JPH08314974A

JPH08314974A - Automatic keyword extraction device and document retrieval device

Info

Publication number: JPH08314974A
Application number: JP7145211A
Authority: JP
Inventors: Naohiko Noguchi; 直彦野口; Hirofumi Shinoki; 裕文篠木; Chuichi Kikuchi; 忠一菊池; Terukazu Kiryu; 輝一桐生; Tetsuya Otsuka; 哲也大塚
Original assignee: MAINICHI SHINBUNSHA KK; Matsushita Electric Industrial Co Ltd
Current assignee: MAINICHI SHINBUNSHA KK; Panasonic Holdings Corp
Priority date: 1995-05-22
Filing date: 1995-05-22
Publication date: 1996-11-29
Anticipated expiration: 2017-09-24
Also published as: JP3328104B2

Abstract

PURPOSE: To provide an automatic keyword extraction device that, when automatically extracting keywords from a document to be searched, extracts only keywords that accurately express the content of the document, with a minimum of human intervention. CONSTITUTION: The automatic keyword extraction device automatically extracts keywords from a document to be searched using a dictionary 12 or a thesaurus 13 that describes the broader-narrower relationships between keyword candidate words. The device comprises keyword candidate word segmentation means 14, which segments keyword candidate words out of the document to be searched; keyword candidate word selection means 15, which, when a segmented candidate word is listed at multiple locations in the thesaurus, confirms the user's intention before selecting that candidate word as a keyword; and input/output means 16, which presents information to the user and accepts selection input from the user. When a segmented candidate word is listed at multiple locations in the thesaurus, that is, when it has multiple meanings, the device asks the user through the input/output means which meaning the candidate word has, and stores the candidate word as a keyword with the meaning the user selected.

Description

Detailed Description of the Invention

[0001]

【Field of Industrial Application】 The present invention relates to a document retrieval device that retrieves a desired document from among documents accumulated in a database or in the storage of an ordinary word processor, office computer, or the like, and to an automatic keyword extraction device, used by this document retrieval device, that automatically extracts the keywords characterizing each document. In particular, the invention enables highly accurate document retrieval.

[0002]

【Prior Art】 In recent years, as large volumes of digitized document information such as electronic mail, electronic catalogs, and electronic publications have begun to circulate, interest has grown in document retrieval devices that retrieve only the desired documents from that information.

[0003] Such document retrieval devices have conventionally relied on a technique called keyword search, which retrieves documents using keywords assigned to each document. In keyword search, keywords representing the content of each stored document are assigned manually in advance, and an inverted file is built over those keywords. At search time, when the user enters a desired keyword, the documents containing that keyword are retrieved at high speed using the inverted file.
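The inverted file mentioned above can be pictured with a minimal Python sketch. It is an illustration only, not the structure used by any particular device: the function names and the sample keywords are hypothetical.

```python
from collections import defaultdict

def build_inverted_file(doc_keywords):
    """Turn {document number: [keywords]} into {keyword: [document numbers]}."""
    inverted = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for keyword in keywords:
            inverted[keyword].add(doc_id)
    return {keyword: sorted(ids) for keyword, ids in inverted.items()}

def keyword_search(inverted_file, keyword):
    """Return the documents carrying the keyword, without scanning any text."""
    return inverted_file.get(keyword, [])

# Hypothetical manually assigned keywords.
doc_keywords = {1: ["painting", "exhibition"], 2: ["exhibition"], 3: ["painting"]}
inverted_file = build_inverted_file(doc_keywords)
print(keyword_search(inverted_file, "painting"))  # [1, 3]
```

The lookup cost depends on the number of distinct keywords rather than on the total amount of text, which is why the search is fast once the keywords have been assigned.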

[0004] Because a person examines the content of each document before assigning keywords, keyword search can retrieve documents with the content the user wants with high accuracy. On the other hand, it has been pointed out that manual keyword assignment cannot keep pace with the growth in stored documents.

[0005] To solve this problem, devices that automatically extract keywords from documents have been proposed (for example, Haruo Kimoto, "Automatic Keyword Extraction Device", JP-A-63-136224). As shown in FIG. 19, this conventional automatic keyword extraction device comprises a document storage unit 191 that stores the documents to be searched, a dictionary 192 that is consulted during keyword extraction, a thesaurus 193 that describes broader-narrower relationships and the like between words, an automatic keyword extraction unit 194 that extracts keywords from documents read from the document storage unit 191, and a keyword extraction result storage unit 195 that stores the extracted keywords in an inverted file format or similar so that they are easy to use in later searches.

[0006] The automatic keyword extraction unit 194 of this device first reads a document to be searched from the document storage unit 191 and divides it into words using the dictionary 192, by processing such as the longest-match method (which takes the longest character string that matches a dictionary word as a word) or morphological analysis (which divides the character string using part-of-speech information, connection information, and the like). For example, dividing the document shown in FIG. 20 (assumed to have document number 20) into words yields the word sequence shown in FIG. 21.
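As a rough illustration of the longest-match method described above, the following Python sketch segments a string greedily from left to right. The dictionary contents and the sample phrase are hypothetical (they are not the text of FIG. 20), and emitting a single character when nothing matches is an assumption made for this sketch.

```python
def longest_match_segment(text, dictionary):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word that starts there; fall back to a single character."""
    max_len = max((len(word) for word in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here (assumption: emit one character).
            words.append(text[i])
            i += 1
    return words

# Hypothetical dictionary and phrase.
dictionary = {"両国", "の", "名画", "名", "画", "展"}
print(longest_match_segment("両国の名画展", dictionary))
# ['両国', 'の', '名画', '展']
```

Because the longer entry "名画" is tried before the single characters "名" and "画", the phrase is not split more finely than the dictionary allows.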

[0007] Next, the automatic keyword extraction unit 194 selects, from the word sequence thus obtained, the words to assign to the document as keywords. For keyword search it is preferable to extract, as far as possible, only words that accurately represent the content of the document. Keyword selection therefore uses measures such as preparing a stop-word dictionary and not selecting words registered in it, preparing a keyword dictionary and selecting only words registered in it, or performing frequency calculations and selecting only the words judged important. For example, removing function words such as particles from the word sequence of FIG. 21 as unnecessary words leaves only the words shown in FIG. 22.
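The three selection measures named above (a stop-word dictionary, a keyword dictionary, and a frequency criterion) can be combined in one small filter. This is a sketch under assumptions: the parameter names, the simple count threshold standing in for the frequency calculation, and the sample words are all hypothetical.

```python
from collections import Counter

def select_keywords(words, stop_words=frozenset(), keyword_dict=None, min_count=1):
    """Keep each distinct word that is not a stop word, is in the keyword
    dictionary (when one is given), and occurs at least min_count times."""
    counts = Counter(words)
    selected = []
    for word in dict.fromkeys(words):  # distinct words in first-occurrence order
        if word in stop_words:
            continue
        if keyword_dict is not None and word not in keyword_dict:
            continue
        if counts[word] < min_count:
            continue
        selected.append(word)
    return selected

# Hypothetical word sequence; "の" is a particle treated as a stop word.
words = ["両国", "の", "名画", "展", "の", "名画"]
print(select_keywords(words, stop_words={"の"}))               # ['両国', '名画', '展']
print(select_keywords(words, stop_words={"の"}, min_count=2))  # ['名画']
```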

[0008] Finally, the keywords selected by the automatic keyword extraction unit 194 are stored in the keyword extraction result storage unit 195 as the keywords of this document. FIG. 23 shows the contents of the keyword extraction result storage unit 195: for each word selected in FIG. 22, document number 20, the document containing that word, is recorded.

[0009] At search time, the device accepts a keyword from the user and searches the keyword extraction result storage unit 195 for documents containing it. For example, if the user enters "ひまわり" (sunflower), the keyword extraction result storage unit 195 is searched and, from the table of FIG. 23, the document with document number 20, which includes "ひまわり" as a keyword, is obtained as the search result.

[0010] Full-text search has also been proposed as a way to avoid manual keyword assignment from a different angle. In full-text search, a character string supplied by the user serves as the search condition; it is matched against all the character strings that make up the documents to be searched, and the documents satisfying the condition are returned. This method therefore requires no keywords to be assigned to documents in advance.
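In its simplest form, full-text search of this kind is plain substring matching over every document. The sketch below assumes that form; the sample documents are hypothetical English stand-ins.

```python
def full_text_search(documents, query):
    """Return the numbers of all documents whose text contains the query string."""
    return sorted(doc_id for doc_id, text in documents.items() if query in text)

# Hypothetical documents.
documents = {
    1: "The full moon rose over the bay.",
    2: "Sales rose again last month.",
    3: "The monthly report is late.",
}
print(full_text_search(documents, "month"))  # [2, 3]
```

No keywords need to be prepared, but every document containing the string is returned regardless of the sense in which the string is used (here "monthly" matches a query for "month").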

[0011]

【Problems to Be Solved by the Invention】 However, because the conventional automatic keyword extraction device extracts keywords from the documents entirely automatically, it sometimes takes in as keywords words that cannot be said to represent the content of the document appropriately. In addition, the number of keywords assigned is considerably larger than with ordinary manual assignment. As a result, keyword search over documents whose keywords were assigned by an automatic keyword extraction device suffers from reduced accuracy: both missed documents and search noise (irrelevant hits) increase.

[0012] To give a concrete example, when keywords are extracted automatically from the document of FIG. 20, superfluous words that hardly seem to express the document's content, such as "月" ("month" or "moon"), "日" ("day"), and "両国" ("both countries", or the place name Ryogoku), are also registered as keywords, as FIG. 22 shows, and search accuracy suffers when the document is searched by keyword. For example, if a user who wants documents about the place name "両国" searches with the keyword "両国", the document with document number 20 is retrieved by mistake. Likewise, a user who wants documents about the "月" that is a satellite (the moon) and searches with the keyword "月" also retrieves document number 20 by mistake. And a user who wants documents about famous films and searches with the keyword "名画" ("famous film" or "famous painting") again retrieves document number 20 by mistake, because that document has been assigned the keyword "名画". Thus, assigning keywords automatically increases the number of documents retrieved in error (search noise) and degrades search accuracy.

[0013] Conventional full-text search, for its part, outputs every document containing the character string the user entered as the search condition. It is well suited to exhaustive searches, but it too produces a great deal of search noise and has poor search accuracy.

[0014] The present invention solves these conventional problems. Its objects are to provide an automatic keyword extraction device that, by involving a person to the minimum extent necessary when automatically extracting keywords from documents to be searched, extracts only keywords that accurately express the content of each document, and to provide a document retrieval device that, by likewise involving a person to the minimum extent when searching documents whose keywords were extracted automatically, performs accurate and efficient keyword search.

[0015]

【Means for Solving the Problems】 Accordingly, in the present invention, an automatic keyword extraction device that automatically extracts keywords from documents to be searched, using a dictionary or a thesaurus describing the broader-narrower relationships between keyword candidate words, is provided with: keyword candidate word segmentation means for segmenting keyword candidate words out of a document to be searched; keyword candidate word selection means for, when a segmented keyword candidate word is listed at multiple locations in the thesaurus, confirming the user's intention before selecting that candidate word as a keyword; and input/output means for presenting information to the user and accepting selection input from the user.

[0016] In addition, among the keyword candidate words listed in the thesaurus, those that require care when being selected as keywords are marked in advance as caution words. When a keyword candidate word segmented by the keyword candidate word segmentation means is a caution word, the keyword selection means confirms the user's intention before selecting that candidate word as a keyword.

[0017] Further, a document retrieval device that automatically extracts keywords from documents to be searched using a dictionary or a thesaurus describing the broader-narrower relationships between keyword candidate words, stores the extraction results in keyword extraction result storage means, and, taking a search string entered by the user as a keyword, retrieves documents having that keyword from the keyword extraction result storage means, is provided with: keyword search means for, when the search string is listed at multiple locations in the thesaurus, confirming the user's intention and retrieving from the keyword extraction result storage means the documents having the keyword the user intends; and input/output means for presenting information to the user and accepting selection input from the user.

[0018] As the means for automatically extracting keywords from the documents to be searched, this device is provided with keyword candidate word segmentation means for segmenting keyword candidate words out of a document to be searched, and keyword candidate word selection means for, when a segmented keyword candidate word is listed at multiple locations in the thesaurus, confirming the user's intention before selecting that candidate word as a keyword.

[0019] Full-text search means is also provided, which retrieves documents containing the search string by character string matching, so that a search by the keyword search means or by the full-text search means can be selected through the input/output means.

[0020] Also provided are a synonym dictionary describing the correspondence between groups of synonymous words and keyword candidate words of the thesaurus, and search string conversion means for converting the search string entered by the user using the synonym dictionary. The thesaurus keyword candidate word produced by the search string conversion means is given to the keyword search means as the search string.

[0021] Furthermore, the search string conversion means converts the search string entered by the user into a group of synonymous words using the synonym dictionary, and the string the user selects from this group is given to the full-text search means as the search string.

[0022]

【Operation】 In the automatic keyword extraction device of the present invention, when a segmented keyword candidate word appears at multiple locations in the thesaurus, that is, when the candidate word has multiple meanings, the device asks the user through the input/output means which meaning is intended and stores the candidate word as a keyword having the meaning the user selected. Because the user is given the opportunity to select the correct meaning of an ambiguous candidate word, highly accurate keyword extraction becomes possible.

[0023] Keyword candidate words that are prone to extraction errors carry a caution word mark in the thesaurus. When such a caution word is segmented out as a candidate word, it too is stored as a keyword only after the user has made a selection.

[0024] In the document retrieval device of the present invention, when the search string entered by the user is listed at multiple locations in the thesaurus, that is, when the search string has multiple meanings, the device asks the user through the input/output means which meaning is intended and retrieves the documents having the keyword the user intends. The search can therefore proceed with high accuracy in line with the user's intention, and retrieval as a whole becomes efficient.
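A possible shape for this search-time disambiguation is sketched below in Python. Word numbers 200 and 201 and document number 20 follow the example given in the first embodiment; word number 301, document number 7, the data layout, and the `choose` callback (standing in for the input/output means) are hypothetical.

```python
# Keyword extraction results: word number -> document numbers.
EXTRACTION_RESULTS = {200: [20], 201: [20], 301: [7]}

# Thesaurus locations of each string: (word number, broader term).
THESAURUS_LOCATIONS = {
    "絵画": [(200, None)],                     # painting
    "名画": [(301, "映画"), (201, "絵画")],    # famous film / famous painting
}

def keyword_search(query, choose):
    """Search by word number; ask the user only when the query is ambiguous."""
    locations = THESAURUS_LOCATIONS.get(query, [])
    if not locations:
        return []
    if len(locations) == 1:
        number = locations[0][0]
    else:
        broader_terms = [broader for _, broader in locations]
        number = locations[choose(query, broader_terms)][0]
    return EXTRACTION_RESULTS.get(number, [])

# A stand-in for the user picking the "絵画" (painting) sense.
pick_painting = lambda query, broader_terms: broader_terms.index("絵画")
print(keyword_search("名画", pick_painting))  # [20]
```

Keying the stored results by word number rather than by surface string is what lets the two senses of "名画" retrieve different documents.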

[0025] When the keyword extraction means of this document retrieval device is implemented by the automatic keyword extraction device of the present invention described above, highly accurate document retrieval with few missed documents and little search noise becomes possible.

[0026] A document retrieval device that also includes full-text search means can perform complementary searches, for example choosing as appropriate between keyword search, which is relatively accurate, and full-text search, which is relatively exhaustive, or narrowing down by keyword search the documents gathered by full-text search. Searches can thus be matched to the user's needs.
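One way to read "narrowing down by keyword search the documents gathered by full-text search" is as a set intersection, as in the sketch below. The function name and the sample data are hypothetical.

```python
def complementary_search(documents, keyword_index, text_query, keyword):
    """Gather documents by substring match, then keep only those that also
    carry the given keyword. Returns (gathered, narrowed) as sorted lists."""
    gathered = {doc_id for doc_id, text in documents.items() if text_query in text}
    narrowed = gathered & set(keyword_index.get(keyword, ()))
    return sorted(gathered), sorted(narrowed)

# Hypothetical documents and keyword index.
documents = {
    1: "moon landing anniversary",
    2: "a painting of the moon festival",
    3: "monthly sales",
}
keyword_index = {"painting": [2, 4]}
print(complementary_search(documents, keyword_index, "moon", "painting"))
# ([1, 2], [2])
```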

[0027] In a document retrieval device equipped with a synonym dictionary, the keywords stored for each document include, in addition to the extracted keyword candidate words, the broader terms of those candidate words in the thesaurus. At keyword search time, the broader term in the thesaurus of the entered search string is looked up in the synonym dictionary, and the documents are searched using this broader term as the keyword. This is extremely efficient, because the desired documents are obtained by searching on the broader term alone, without searching individually on each synonym that is a narrower concept of that broader term.
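The effect described above can be sketched as follows. For brevity a single hypothetical table plays both roles, the thesaurus at indexing time and the synonym dictionary at search time; the English words and document numbers are illustrative assumptions.

```python
# Hypothetical table: synonym -> broader term in the thesaurus.
SYNONYM_TO_BROADER = {
    "movie": "cinema",
    "film": "cinema",
    "motion picture": "cinema",
    "fresco": "painting",
}

def stored_keywords(candidates):
    """Keywords kept for a document: the candidates plus their broader terms."""
    broader = {SYNONYM_TO_BROADER[w] for w in candidates if w in SYNONYM_TO_BROADER}
    return set(candidates) | broader

# Hypothetical extraction results (document number -> stored keywords).
INDEX = {
    1: stored_keywords(["movie"]),
    2: stored_keywords(["film"]),
    3: stored_keywords(["fresco"]),
}

def search_by_broader_term(query):
    """Convert the query to its broader term, then run a single keyword search."""
    term = SYNONYM_TO_BROADER.get(query, query)
    return sorted(doc_id for doc_id, keywords in INDEX.items() if term in keywords)

print(search_by_broader_term("motion picture"))  # [1, 2]
```

The query "motion picture" appears in neither document, yet one search on the broader term "cinema" finds both, with no separate searches on "movie" and "film".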

[0028] At full-text search time, the synonyms of the entered search string are looked up in the synonym dictionary and presented to the user through the input/output means, and full-text search is executed using as search strings the one or more synonyms the user selects from among them. This makes it possible to search efficiently in line with the user's needs.
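A minimal sketch of this synonym-assisted full-text search is given below. The synonym group, the documents, and the `select` callback (standing in for the user's choice through the input/output means) are hypothetical.

```python
# Hypothetical synonym dictionary: groups of mutually synonymous words.
SYNONYM_GROUPS = [{"movie", "film", "motion picture"}]

def synonyms_of(query):
    """Return the synonym group containing the query, or the query alone."""
    for group in SYNONYM_GROUPS:
        if query in group:
            return sorted(group)
    return [query]

def full_text_search_with_synonyms(documents, query, select):
    """Offer the synonyms to the user, then match any selected string."""
    chosen = select(synonyms_of(query))
    return sorted(
        doc_id for doc_id, text in documents.items()
        if any(s in text for s in chosen)
    )

documents = {1: "a classic film", 2: "a new movie", 3: "a motion picture score", 4: "a painting"}
# A stand-in for a user who keeps every offered synonym except "motion picture".
select = lambda offered: [s for s in offered if s != "motion picture"]
print(full_text_search_with_synonyms(documents, "movie", select))  # [1, 2]
```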

[0029]

【Embodiments】

(First Embodiment) The first embodiment is an automatic keyword extraction device. As shown in FIG. 1, the device comprises a document storage unit 11 that stores the documents to be searched; a dictionary 12 that is consulted to segment keyword candidate words out of those documents; a thesaurus 13 that describes broader-narrower relationships and the like between keyword candidate words; a keyword candidate word segmentation unit 14 that segments keyword candidate words out of the documents using the dictionary 12 and the thesaurus 13; a keyword candidate word selection unit 15 that selects, from the segmented candidate words and through interaction with the user, the keywords suited to each document; an input/output unit 16 that accepts input from the user and presents information to the user; and a keyword extraction result storage unit 17 that records the correspondence between the documents to be searched and the selected keywords.

[0030] As illustrated in FIG. 3, the thesaurus 13 defines the relationships between broader terms, which represent broader concepts, and the narrower terms they semantically include. Keyword candidate words that have multiple meanings and are easily mistaken, such as "月" (the moon as a satellite, and the month of a calendar date), carry a caution word mark (＊). The number following each word is its word number.
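A thesaurus of this kind could be represented as a list of numbered entries, each pointing to its broader term and optionally flagged as a caution word. The sketch below is one such representation, not the layout of FIG. 3: only word numbers 200 and 201 and the caution mark on "月" come from the text, while the other entries, numbers, and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entry:
    word: str
    number: int                 # word number
    broader: Optional[int]      # word number of the broader term, None at the top
    caution: bool = False       # True for entries carrying the caution mark

THESAURUS = [
    Entry("絵画", 200, None),               # painting
    Entry("名画", 201, 200),                # famous painting (numbers from the text)
    Entry("映画", 300, None),               # film (hypothetical number)
    Entry("名画", 301, 300),                # famous film (hypothetical number)
    Entry("天体", 400, None),               # celestial body (hypothetical entry)
    Entry("月", 401, 400, caution=True),    # moon, marked as a caution word
]
BY_NUMBER = {entry.number: entry for entry in THESAURUS}

def locations(word):
    """All thesaurus entries for a word; more than one means it is ambiguous."""
    return [entry for entry in THESAURUS if entry.word == word]

def broader_word(entry):
    """The broader term of an entry, or None for a top-level entry."""
    return BY_NUMBER[entry.broader].word if entry.broader is not None else None

print([broader_word(e) for e in locations("名画")])  # ['絵画', '映画']
print(locations("月")[0].caution)                    # True
```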

[0031] In this automatic keyword extraction device, the keyword candidate word segmentation unit 14 first reads the documents stored in the document storage unit 11 one after another, divides each document into words using the longest-match method, morphological analysis, or the like while consulting the dictionary 12, and deletes unnecessary words from the result to obtain a sequence of keyword candidate words.

[0032] Alternatively, the words that can become keywords may be registered in the dictionary 12 or the thesaurus 13 in advance, and the keyword candidate word segmentation unit 14 may match the full text of each document it reads against the dictionary 12 or the thesaurus 13 and extract only the matching words to obtain the keyword candidate word sequence.

[0033] In this way, a keyword candidate word sequence such as that of FIG. 22 is obtained from, for example, the document with document number 20 shown in FIG. 20.

[0034] Next, the keyword candidate word selection unit 15 looks up each word of this candidate word sequence in the thesaurus 13 in turn and obtains its broader term. If a word is registered at multiple locations in the thesaurus 13, all the pairs of that word and its broader terms are first displayed on the input/output unit 16, and the user is asked to select one. For example, among the keyword candidate words of FIG. 22, "名画" is registered at two locations in the thesaurus 13 of FIG. 3, so this word is displayed on the input/output unit 16 together with its broader terms. FIG. 4 shows an example of the display screen at this point.

[0035] The user checks the displayed document and, on judging that the word "名画" in this document means a famous painting, selects "2 絵画" (2 Painting) on the display screen. On receiving this input, the keyword candidate word selection unit 15 stores in the keyword extraction result storage unit 17, as keywords of this document (document number 20), "名画" in the "絵画" (painting) sense (word number 201) and its broader term "絵画" (200).
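The exchange just described, a numbered menu of broader terms followed by storage of the chosen sense and its broader term, might look like the following sketch. Word numbers 200 and 201, document number 20, and the "2 絵画" option follow the text; the numbers for the film sense, the menu format, and the function names are hypothetical.

```python
# Thesaurus locations of the ambiguous word, in display order.
SENSES = {
    "名画": [
        {"number": 301, "broader": ("映画", 300)},  # famous film (hypothetical numbers)
        {"number": 201, "broader": ("絵画", 200)},  # famous painting (numbers from the text)
    ],
}

def menu_lines(word):
    """One numbered option per thesaurus location, labelled by broader term."""
    return [f"{i} {sense['broader'][0]}" for i, sense in enumerate(SENSES[word], start=1)]

def store_choice(store, doc_id, word, choice):
    """Record the chosen sense and its broader term as keywords of the document."""
    sense = SENSES[word][choice - 1]
    broader_word, broader_number = sense["broader"]
    store.setdefault(doc_id, []).extend(
        [(word, sense["number"]), (broader_word, broader_number)]
    )
    return store

print(menu_lines("名画"))                 # ['1 映画', '2 絵画']
print(store_choice({}, 20, "名画", 2))    # {20: [('名画', 201), ('絵画', 200)]}
```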

[0036] Also, if, while looking up each word of the candidate word sequence in the thesaurus 13 in turn to obtain its broader term, the keyword candidate word selection unit 15 finds a word that is listed in the thesaurus 13 as a caution word, it first displays on the input/output unit 16 the fact that the word is a caution word, to alert the user, and asks the user to choose whether or not to register the word as a keyword. Among the keyword candidate words of FIG. 22, "月" is described as a caution word in the thesaurus 13, as FIG. 3 shows, so this is indicated on the input/output unit 16. FIG. 5 shows an example of the display screen at this point.

[0037] The user checks the displayed document, confirms that the word "月" in this document does not mean the moon as a satellite, and selects "2 選択しない" (2 Do not select) on the display screen. On receiving this input, the keyword candidate word selection unit 15 decides not to register "月" as a keyword.

[0038] The operating procedure of this automatic keyword extraction device is described below with reference to FIG. 2.

[0039] Step 1: The number N of documents to be searched that are stored in the document storage unit 11 is recorded, and the document number i is set to its initial value (1).

[0040] Step 2: The keyword candidate word segmentation unit 14 reads the document with document number i from the document storage unit 11.
Step 3: It divides this document into words using, for example, the dictionary 12, deletes unnecessary words, and obtains a keyword candidate word sequence of length K.

[0041] Step 4: The position j within the keyword candidate word sequence is set to its initial value (1).

[0042] Step 5: As long as j does not exceed K, processing continues with the following steps.
Step 6: The keyword candidate word selection unit 15 checks whether the j-th word of the candidate word sequence is registered at multiple locations in the thesaurus 13.
Step 7: If it is, that word is displayed on the screen of the input/output unit 16 together with its broader terms in the thesaurus.

[0043] Step 8: The user, looking at the screen, selects the appropriate word and broader term.
Step 12: That word and broader term are stored as keywords in the keyword extraction result storage unit 17.

[0044] If Step 6 finds that the j-th word of the candidate word sequence is not registered at multiple locations in the thesaurus 13:
Step 9: The unit checks whether the word is marked as a caution word in the thesaurus.
Step 10: If it is a caution word, the word is displayed on the screen of the input/output unit 16 together with its broader term in the thesaurus.
Step 11: The user, looking at the screen, decides whether or not to select the word as a keyword.

[0045] Step 12: Unless the word was rejected in Step 11, the keyword candidate word selection unit 15 stores the word and its broader term as keywords in the keyword extraction result storage unit 17.

【００４６】Step 13: The index j of the keyword candidate word is incremented by one, and steps 5 to 12 are repeated. Step 5: When j exceeds K, that is, when keyword selection has finished for every word of the keyword candidate word string obtained from the document with document number i, Step 14: the document number is incremented by one, and steps 2 to 13 are repeated. Step 15: When the document number i exceeds N, that is, when keyword extraction has been completed for all documents stored in the document storage unit 11, the process ends.
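
The per-document selection loop of steps 5 to 13 can be sketched as follows. This is a minimal illustration, not the patented implementation: the thesaurus layout (each word mapped to one broader term per place it is registered), the callback names `choose_broader` and `confirm`, and the handling of words with no thesaurus entry are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical layout: each word maps to one broader term per place
# it is registered in the thesaurus.
Thesaurus = Dict[str, List[str]]
Keyword = Tuple[str, Optional[str]]  # (word, broader term)


def select_keywords(
    candidates: List[str],
    thesaurus: Thesaurus,
    caution_words: Set[str],
    choose_broader: Callable[[str, List[str]], str],
    confirm: Callable[[str, Optional[str]], bool],
) -> List[Keyword]:
    """Run steps 5 to 13 for the candidate word string of one document."""
    selected: List[Keyword] = []
    for word in candidates:  # steps 5 and 13: j runs over the K candidates
        broader_terms = thesaurus.get(word, [])
        if len(broader_terms) > 1:  # step 6: registered in several places
            # steps 7 and 8: show the word with its broader terms, user picks one
            selected.append((word, choose_broader(word, broader_terms)))  # step 12
            continue
        # Assumption: a word without a thesaurus entry is stored with no broader term.
        broader = broader_terms[0] if broader_terms else None
        if word in caution_words and not confirm(word, broader):  # steps 9 to 11
            continue  # the user declined the caution word
        selected.append((word, broader))  # step 12
    return selected
```

The two callbacks stand in for the input/output unit 16, so the loop itself stays free of any user-interface code.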

【００４７】As described above, in the automatic keyword extraction device of the first embodiment, keywords are registered under their correct meaning after being checked by the user. Consequently, retrieval of wrong documents is suppressed during document search, and search accuracy improves.

【００４８】(Second Embodiment) The second embodiment is a document search device. As shown in FIG. 6, this device comprises: a document storage unit 61 that stores the documents to be searched; a dictionary 62 that is referred to when keyword candidate words are extracted from those documents; a thesaurus 63 that describes, among other things, the broader-narrower relations among keyword candidate words; an automatic keyword extraction unit 64 that extracts keywords from the documents using the dictionary 62 and the thesaurus 63; a keyword extraction result storage unit 65 that records the correspondence between documents and keywords; an input/output unit 67 that accepts input from the user and presents information to the user; and a keyword search unit 66 that searches for target documents using a character string entered by the user as a keyword.

【００４９】The thesaurus 63 is shown in FIG. 8; it is the same as the thesaurus of the first embodiment (FIG. 3).

【００５０】The automatic keyword extraction unit 64 of this device corresponds to the keyword candidate word cutout unit 14 and the keyword candidate word selection unit 15 of the first embodiment (FIG. 1). It reads the documents stored in the document storage unit 61 in order, automatically extracts the keywords of each document using the dictionary 62 and the thesaurus 63 while interacting with the user, and stores the extraction results in the keyword extraction result storage unit 65. For example, for the document shown in FIG. 20 (document number 20), the automatic keyword extraction unit 64 extracts the keywords of FIG. 22, and for the document shown in FIG. 24 (document number 24), it extracts the keywords of FIG. 25. In doing so, as described in the first embodiment, for keywords registered in multiple locations in the thesaurus 63, the keyword with the correct meaning is extracted through interaction with the user.

【００５１】As shown in FIG. 9, the extracted keywords are registered in the keyword extraction result storage unit 65 in the form of the keyword, its word number, the total number of documents containing the keyword, and the document number of each of those documents. Here, two keywords "masterpiece" (名画) are registered: "masterpiece" (word number 101) is the "masterpiece" in the sense of a movie, which is a keyword of the document of FIG. 24, and "masterpiece" (word number 201) is the "masterpiece" in the sense of a painting, which is a keyword of the document of FIG. 20.
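
The storage format described for FIG. 9 (keyword, word number, document count, document numbers) can be sketched as a small in-memory structure. The class and method names below are hypothetical; only the word numbers 101 and 201 and the document numbers 24 and 20 come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KeywordEntry:
    keyword: str
    word_number: int  # distinguishes senses that share one surface form
    doc_numbers: List[int] = field(default_factory=list)

    @property
    def doc_count(self) -> int:
        """Total number of documents containing this keyword sense."""
        return len(self.doc_numbers)


class KeywordStore:
    """Illustrative stand-in for the keyword extraction result storage unit."""

    def __init__(self) -> None:
        self._by_number: Dict[int, KeywordEntry] = {}

    def register(self, keyword: str, word_number: int, doc_number: int) -> None:
        entry = self._by_number.setdefault(
            word_number, KeywordEntry(keyword, word_number)
        )
        if doc_number not in entry.doc_numbers:
            entry.doc_numbers.append(doc_number)

    def senses(self, keyword: str) -> List[KeywordEntry]:
        """Return every registered sense of a surface form, by word number."""
        hits = [e for e in self._by_number.values() if e.keyword == keyword]
        return sorted(hits, key=lambda e: e.word_number)


store = KeywordStore()
store.register("masterpiece", 101, 24)  # movie sense, document of FIG. 24
store.register("masterpiece", 201, 20)  # painting sense, document of FIG. 20
```

Keying entries by word number rather than by surface form is what lets the two senses of "masterpiece" coexist without their document lists being merged.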

【００５２】Once the keywords of the documents to be searched have been registered in this way, the document search device performs a document search according to the procedure shown in FIG. 7.

【００５３】Step 21: When the user enters a search keyword, for example "masterpiece", from the input/output unit 67, Step 22: the keyword search unit 66 obtains from the thesaurus 63 the broader terms of this search keyword "masterpiece" ("movie" and "painting").

【００５４】Step 23: If the search keyword has multiple broader terms in the thesaurus, that is, if the search keyword is registered in multiple locations in the thesaurus, Step 24: the number of documents containing this search keyword is obtained from the keyword extraction result storage unit 65, and the search keyword, its broader terms, and the number of documents containing the search keyword are displayed on the screen of the input/output unit 67 (FIG. 10 shows an example of this display screen).

【００５５】Step 25: Viewing the screen, the user selects which broader term the keyword to be searched for belongs to.

【００５６】Step 26: The keyword search unit 66 searches the keyword extraction result storage unit 65 for the document numbers of documents containing the search keyword itself, if the search keyword is not registered in multiple locations in the thesaurus, or containing the selected search keyword, if a selection was made in step 25.

【００５７】Accordingly, if the search keyword entered by the user is "masterpiece", the screen of FIG. 10 is displayed, and the user selects "1 movie" or "2 painting" from this screen according to his or her search intention. For example, if "2 painting" is selected, the keyword search unit 66 responds by searching the keyword extraction result storage unit 65 only for documents having the keyword "masterpiece (201)", and displays document number 20 as the search result.
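
Steps 22 to 26 can be sketched as a single function. This is an illustration under stated assumptions: the index is keyed here by (keyword, broader term) pairs instead of the word numbers of FIG. 9, and the `choose` callback is a hypothetical stand-in for the selection screen of FIG. 10.

```python
from typing import Callable, Dict, List, Optional, Tuple

Thesaurus = Dict[str, List[str]]
# Assumption: senses are identified by (keyword, broader term) pairs.
Index = Dict[Tuple[str, Optional[str]], List[int]]
Option = Tuple[str, int]  # (broader term, number of matching documents)


def keyword_search(
    query: str,
    thesaurus: Thesaurus,
    index: Index,
    choose: Callable[[str, List[Option]], str],
) -> List[int]:
    broader_terms = thesaurus.get(query, [])  # step 22
    if len(broader_terms) > 1:  # step 23: registered in several places
        # step 24: pair each broader term with its document count
        options = [(b, len(index.get((query, b), []))) for b in broader_terms]
        sense = choose(query, options)  # step 25: the user picks a sense
    else:
        sense = broader_terms[0] if broader_terms else None
    return sorted(index.get((query, sense), []))  # step 26
```

With a thesaurus entry `{"masterpiece": ["movie", "painting"]}` and a `choose` callback that returns `"painting"`, only the documents registered under the painting sense are returned.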

【００５８】As described above, when the search-condition character string entered by the user is registered in multiple locations in the thesaurus, that is, when the character string has multiple meanings, the document search device of the second embodiment confirms the user's search intention and only then executes the search. An efficient and highly precise search can therefore be performed.

【００５９】(Third Embodiment) The document search device of the third embodiment allows a choice between keyword search and full-text search. As shown in FIG. 11, this device includes a full-text search unit 117 that matches the full text of each document to be searched against the character string entered from the input/output unit 118 and retrieves the documents containing that character string. The rest of the configuration is the same as that of the document search device of the second embodiment (FIG. 6).

【００６０】In this device, the automatic keyword extraction unit 114 reads the documents stored in the document storage unit 111 in order, automatically extracts keywords, and stores them in the keyword extraction result storage unit 115. This operation is the same as in the second embodiment. Suppose now that the document of FIG. 20 (document number 20), the document of FIG. 24 (document number 24), and the document of FIG. 26 (document number 26) are stored in the document storage unit 111 as documents to be searched, that the keywords of FIG. 22, FIG. 25, and FIG. 27 are extracted from these documents respectively, and that these keywords are registered in the keyword extraction result storage unit 115 in the state shown in FIG. 13.

【００６１】A user performing a document search enters a search character string from the input/output unit 118 and selects either the keyword search mode or the full-text search mode. For example, if the user selects the keyword search mode and enters the search character string "masterpiece", the keyword search unit 116, as described in the second embodiment, interacts with the user as needed via the input/output unit 118 and performs a highly precise search in line with the user's search intention.

【００６２】However, if the user selects the keyword search mode and enters a search character string that is not itself registered as a keyword, such as "travel boom", the search returns zero documents. In that case, if the user then selects the full-text search mode, the full-text search unit 117 matches the character string "travel boom" against each document stored in the document storage unit 111 and finds the document containing that string, namely the document with document number 26. In other words, a user who wants to exhaustively retrieve documents containing the character string "travel boom" can find them by specifying the full-text search mode.

【００６３】The procedure of the search operation of this device is shown in the flowchart of FIG. 12.

【００６４】Step 31: When the user enters the search-condition character string and the search mode from the input/output unit 118, Step 32: if the search mode is the keyword search mode, the procedure of steps 33 to 38 is executed. This procedure is the same as the procedure of the second embodiment (FIG. 7, steps 22 to 26).

【００６５】If, in step 32, the search mode is not the keyword search mode, Step 34: the full-text search unit 117 reads the documents to be searched in order from the document storage unit 111, matches the full text of each document against the search-condition character string, and finds the documents containing that character string.

【００６６】Thus, the document search device of the third embodiment allows the search mode to be selected: the user can select the full-text search mode when an exhaustive search is desired, and the keyword search mode when a highly precise search is desired. The device also supports flexible and efficient searching. For example, the user can first run a search in the full-text search mode and, if there are many results, switch to the keyword search mode to narrow them down; or the user can first run the keyword search mode and, if the number of results is zero or close to zero, select the full-text search mode to find further related documents.
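
The mode selection of the third embodiment can be sketched as a simple dispatch. This is a simplified illustration: the keyword branch omits the sense-disambiguation dialogue of the second embodiment, full-text matching is modelled as plain substring matching, and all names are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class SearchMode(Enum):
    KEYWORD = "keyword"
    FULL_TEXT = "full_text"


def full_text_search(query: str, documents: Dict[int, str]) -> List[int]:
    """Return the numbers of all documents whose full text contains the query."""
    return sorted(number for number, text in documents.items() if query in text)


def search(
    query: str,
    mode: SearchMode,
    documents: Dict[int, str],
    keyword_index: Dict[str, List[int]],
) -> List[int]:
    if mode is SearchMode.KEYWORD:
        # Only strings registered as keywords can match in this mode.
        return sorted(keyword_index.get(query, []))
    return full_text_search(query, documents)
```

A query such as "travel boom" that is not registered as a keyword yields an empty list in keyword mode; re-running the same query in full-text mode then finds any document whose text contains the string.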

【００６７】(Fourth Embodiment) The document search device of the fourth embodiment makes searching by synonyms more efficient. As shown in FIG. 14, this device includes a search character string conversion unit 148 that converts the search-condition character string entered from the input/output unit 150 into synonyms, and a synonym dictionary 149 used for this conversion. The rest of the configuration is the same as that of the device of the third embodiment (FIG. 11).

【００６８】As shown in FIG. 16, the synonym dictionary 149 describes the relationship between each word group of synonyms having the same meaning and the broader term in the thesaurus 143 that corresponds to that word group.

【００６９】In this document search device, when extracting keywords from the documents to be searched, the automatic keyword extraction unit 144 extracts the words cut out from each document together with the broader terms of those words in the thesaurus 143, all as keywords, and stores them in the keyword extraction result storage unit 145.

【００７０】For example, suppose the thesaurus 143 defines "コンピューター" ("computer", written with a trailing long-vowel mark) as the broader term of "電子計算機" ("electronic computer"), "電算機" (an abbreviated form of the same word), and "コンピュータ" ("computer", written without the long-vowel mark). The automatic keyword extraction unit 144 then extracts the keywords of FIGS. 29, 31, and 33 from the document shown in FIG. 28 (document number 28), the document shown in FIG. 30 (document number 30), and the document shown in FIG. 32 (document number 32), respectively. These keywords are registered in the keyword extraction result storage unit 145 in the state shown in FIG. 17.
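
Registering each cut-out word together with its broader terms can be sketched as below. The function name and the thesaurus layout are assumptions for illustration; the three words and their shared broader term follow the example in the text.

```python
from typing import Dict, List


def extract_keywords(words: List[str], thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return each cut-out word followed by its broader terms, without duplicates."""
    keywords: List[str] = []
    for word in words:
        for term in [word, *thesaurus.get(word, [])]:
            if term not in keywords:
                keywords.append(term)
    return keywords


# The three synonymous spellings share one broader term in the thesaurus.
THESAURUS = {
    word: ["コンピューター"] for word in ("電子計算機", "電算機", "コンピュータ")
}
```

Because every document that mentions any of the three spellings also receives the keyword "コンピューター", a later search on that single keyword reaches all of them.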

【００７１】After the keywords have been registered, a user who wishes to search for documents enters a search character string from the input/output unit 150 and selects either the keyword search mode or the full-text search mode. For example, when the keyword search mode is selected and the search character string "電子計算機" ("electronic computer") is entered, the search character string conversion unit 148, because the selected mode is the keyword search mode, looks up in the synonym dictionary 149 the keyword "コンピューター" that is the broader term of the synonym group containing "電子計算機", and passes it to the keyword search unit 146.

【００７２】In response, the keyword search unit 146 searches the keyword extraction result storage unit 145 for documents having the keyword "コンピューター". As shown in FIG. 17, the documents with document numbers 28, 30, and 32 are all registered in the keyword extraction result storage unit 145 as having the keyword "コンピューター", so all three documents are found by the search.

【００７３】By first converting the search character string into the keyword that is its broader term in this way, there is no need to search separately for "電算機", "コンピュータ", and other words synonymous with "電子計算機", and keyword search becomes more efficient.
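
The conversion from a search string to its broader-term keyword can be sketched as follows. The dictionary layout (a list of synonym groups paired with a broader term) is an assumption about FIG. 16, and returning the query unchanged when it belongs to no group is also an assumption, since the text does not cover that case.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed layout: (synonym group, broader term in the thesaurus).
SynonymDictionary = List[Tuple[FrozenSet[str], str]]

SYNONYMS: SynonymDictionary = [
    (frozenset({"電子計算機", "電算機", "コンピュータ"}), "コンピューター"),
]

# Registration state described for FIG. 17.
KEYWORD_INDEX: Dict[str, List[int]] = {"コンピューター": [28, 30, 32]}


def to_keyword(query: str, synonyms: SynonymDictionary) -> str:
    """Map a search string to the broader term of its synonym group."""
    for group, broader in synonyms:
        if query in group:
            return broader
    return query  # assumption: unknown strings are searched as entered


def synonym_keyword_search(
    query: str, synonyms: SynonymDictionary, index: Dict[str, List[int]]
) -> List[int]:
    return sorted(index.get(to_keyword(query, synonyms), []))
```

Any of the three synonymous spellings resolves to the same keyword, so one lookup replaces three separate searches.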

【００７４】On the other hand, when the full-text search mode is selected and the search character string "電子計算機" is entered, the search character string conversion unit 148, because the selected mode is the full-text search mode, looks up in the synonym dictionary 149 the synonym group containing "電子計算機" and presents the words belonging to this group to the user via the input/output unit 150. FIG. 18 shows an example of the display screen at this point.

【００７５】The user selects one or more of the displayed words. For example, if the word "コンピュータ" is selected, the search character string conversion unit 148 passes the selected character string to the full-text search unit 147, which matches the full text of each document read from the document storage unit 141 against the word "コンピュータ" and finds the documents containing it. As a result, the document with document number 32, which contains the word "コンピュータ", is found, but the documents with document numbers 28 and 30 are not.
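
The full-text path with synonym selection can be sketched as follows. The document texts are invented for illustration (the source states only that document 32 contains "コンピュータ" and that documents 28 and 30 do not), and treating a multi-word selection as "match any selected word" is an assumption.

```python
from typing import Dict, FrozenSet, List

SYNONYM_GROUPS: List[FrozenSet[str]] = [
    frozenset({"電子計算機", "電算機", "コンピュータ"}),
]

# Hypothetical document texts, chosen only to match the stated outcome.
DOCUMENTS: Dict[int, str] = {
    28: "電子計算機の利用",
    30: "電算機の導入",
    32: "コンピュータの普及",
}


def synonym_group(query: str, groups: List[FrozenSet[str]]) -> List[str]:
    """Words to present to the user for selection."""
    for group in groups:
        if query in group:
            return sorted(group)
    return [query]  # assumption: unknown strings are offered as entered


def full_text_search(selected: List[str], documents: Dict[int, str]) -> List[int]:
    """Return documents whose text contains any of the selected words."""
    return sorted(
        number
        for number, text in documents.items()
        if any(word in text for word in selected)
    )
```

Selecting only "コンピュータ" finds document 32 alone, which mirrors the outcome described above; selecting further words from the group widens the result set.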

【００７６】FIG. 15 shows the operating procedure of the document search device that performs these operations. Step 41: When the user enters the search-condition character string and the search mode from the input/output unit 150, Step 42: if the search mode is the keyword search mode, Step 43: the search character string conversion unit 148 obtains from the synonym dictionary 149 the broader term of the synonym word group to which the search character string belongs, and Steps 44 to 48: the keyword search unit 146 searches for documents having this broader term as a keyword. The procedure of steps 44 to 48 is the same as the procedure of the second embodiment (FIG. 7, steps 22 to 26).

【００７７】If, in step 42, the search mode is not the keyword search mode, Step 49: the search character string conversion unit 148 obtains from the synonym dictionary 149 the synonym word group to which the search character string belongs, and Step 50: the words contained in this word group are displayed on the screen.

【００７８】Step 51: When the user viewing the screen selects the words to be used for the full-text search, Step 52: the full-text search unit 147 matches the documents to be searched, read from the document storage unit 141, against the selected words and finds all documents containing those words.

【００７９】As described above, in keyword search the document search device of the fourth embodiment uses the broader term of the search-condition character string as the keyword, which makes efficient searching possible. In full-text search, the user is given the opportunity to select the search-condition character string from among the synonym group, so a search that meets the user's needs can be executed.

【００８０】In the document search devices of the second, third, and fourth embodiments, the automatic keyword extraction unit is preferably configured to correspond to the keyword candidate word cutout unit and the keyword candidate word selection unit of the automatic keyword extraction device of the first embodiment. However, other configurations that can extract appropriate keywords from the documents to be searched using a dictionary and a thesaurus may also be adopted.

【００８１】

[Effects of the Invention] As is apparent from the above description of the embodiments, the automatic keyword extraction device of the present invention confirms the user's intention before extracting, as a keyword, a word that has multiple meanings or a word that is easily mistaken. It can therefore extract appropriate keywords that match the content of the document, and as a result the precision of keyword search can be improved.

【００８２】In addition, when the entered search keyword has multiple meanings, the document search device of the present invention clarifies the user's search intention before executing the search, so highly precise search results can be obtained efficiently.

【００８３】Furthermore, a document search device in which the keyword search mode or the full-text search mode can be selected enables flexible and highly precise searching in line with the user's intention.

【００８４】Moreover, a document search device provided with a synonym dictionary can make keyword search by synonyms more efficient. In full-text search, the synonym to be used as the search character string can be selected, so a full-text search that meets the user's needs is possible.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the automatic keyword extraction device according to the first embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the automatic keyword extraction device of the first embodiment.

FIG. 3 is a conceptual diagram of the thesaurus in the automatic keyword extraction device of the first embodiment.

FIG. 4 is a display example of the automatic keyword extraction device of the first embodiment (when a word has multiple meanings).

FIG. 5 is a display example of the automatic keyword extraction device of the first embodiment (when a word is a caution word).

FIG. 6 is a block diagram showing the configuration of the document search device according to the second embodiment of the present invention.

FIG. 7 is a flowchart showing the operation of the document search device of the second embodiment.

FIG. 8 is a conceptual diagram of the thesaurus in the document search device of the second embodiment.

FIG. 9 is a conceptual diagram of the keyword extraction result storage unit in the document search device of the second embodiment.

FIG. 10 is a display example of the document search device of the second embodiment (when the search condition has multiple meanings).

FIG. 11 is a block diagram showing the configuration of the document search device according to the third embodiment of the present invention.

FIG. 12 is a flowchart showing the operation of the document search device of the third embodiment.

FIG. 13 is a conceptual diagram of the keyword extraction result storage unit in the document search device of the third embodiment.

FIG. 14 is a block diagram showing the configuration of the document search device according to the fourth embodiment of the present invention.

FIG. 15 is a flowchart showing the operation of the document search device of the fourth embodiment.

FIG. 16 is a conceptual diagram of the synonym dictionary in the document search device of the fourth embodiment.

FIG. 17 is a conceptual diagram of the keyword extraction result storage unit in the document search device of the fourth embodiment.

FIG. 18 is a display example of the document search device of the fourth embodiment (when synonyms are displayed).

FIG. 19 is a block diagram showing the configuration of a conventional automatic keyword extraction device.

FIG. 20 is an example of a document to be searched (document number 20).

FIG. 21 is an example in which the document to be searched (document number 20) has been divided into words.

FIG. 22 is an example of the keywords extracted from the document to be searched (document number 20).

FIG. 23 is a conceptual diagram of the keyword extraction result storage unit in the conventional automatic keyword extraction device.

FIG. 24 is an example of a document to be searched (document number 24).

FIG. 25 is an example of the keywords extracted from the document to be searched (document number 24).

FIG. 26 is an example of a document to be searched (document number 26).

FIG. 27 is an example of the keywords extracted from the document to be searched (document number 26).

FIG. 28 is an example of a document to be searched (document number 28).

FIG. 29 is an example of the keywords extracted from the document to be searched (document number 28).

FIG. 30 is an example of a document to be searched (document number 30).

FIG. 31 is an example of the keywords extracted from the document to be searched (document number 30).

FIG. 32 is an example of a document to be searched (document number 32).

FIG. 33 is an example of the keywords extracted from the document to be searched (document number 32).

[Explanation of symbols]

11, 61, 111, 141, 191: Document storage unit
12, 62, 112, 142, 192: Dictionary
13, 63, 113, 143, 193: Thesaurus
14: Keyword candidate word cutout unit
15: Keyword candidate word selection unit
16, 67, 118, 150: Input/output unit
17, 65, 115, 145, 195: Keyword extraction result storage unit
64, 114, 144, 194: Automatic keyword extraction unit
66, 116, 146: Keyword search unit
117, 147: Full-text search unit
148: Search character string conversion unit
149: Synonym dictionary

Continuation of front page: (72) Inventor: Chuichi Kikuchi, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture. (72) Inventor: Terukazu Kiryu, c/o Mainichi Shinbunsha KK, 1-1-1 Hitotsubashi, Chiyoda-ku, Tokyo. (72) Inventor: Tetsuya Otsuka, c/o Mainichi Shinbunsha KK, 1-1-1 Hitotsubashi, Chiyoda-ku, Tokyo.

Claims

1. An automatic keyword extraction device that automatically extracts keywords from a document to be searched using a dictionary or a thesaurus describing the broader-narrower relations among keyword candidate words, the device comprising: keyword candidate word cutout means for cutting out keyword candidate words from the document to be searched; keyword candidate word selection means for, when a cut-out keyword candidate word is described in multiple locations in the thesaurus, confirming the user's intention and then selecting the keyword candidate word as a keyword; and input/output means for presenting information to the user and receiving selection input from the user.

2. The automatic keyword extraction device according to claim 1, wherein, among the keyword candidate words described in the thesaurus, keyword candidate words that require attention when selected as keywords are marked in advance as caution words, and when a keyword candidate word cut out by the keyword candidate word cutout means is such a caution word, the keyword candidate word selection means confirms the user's intention and then selects the keyword candidate word as a keyword.

3. A document search device that automatically extracts keywords from documents to be searched using a dictionary or a thesaurus describing the broader-narrower relations among keyword candidate words, stores the extraction results in keyword extraction result storage means, and retrieves from the keyword extraction result storage means documents having, as a keyword, a search character string entered by a user, the device comprising: keyword search means for, when the search character string is described in multiple locations in the thesaurus, confirming the user's intention and retrieving from the keyword extraction result storage means the documents having the keyword intended by the user; and input/output means for presenting information to the user and receiving selection input from the user.

4. The document search device according to claim 3, comprising, as means for automatically extracting keywords from the documents to be searched: keyword candidate word cutout means for cutting out keyword candidate words from the documents to be searched; and keyword candidate word selection means for, when a cut-out keyword candidate word is described in multiple locations in the thesaurus, confirming the user's intention and then selecting the keyword candidate word as a keyword.

5. The document search device according to claim 3, further comprising full-text search means for retrieving, by character string matching, the documents to be searched that contain the search character string, wherein a search by the keyword search means or a search by the full-text search means can be selected from the input/output means.

6. The document search device according to claim 3 or 5, further comprising: a synonym dictionary describing the correspondence between word groups of synonyms and keyword candidate words of the thesaurus; and search character string conversion means for converting a search character string entered by the user using the synonym dictionary, wherein the keyword candidate word of the thesaurus obtained through conversion by the search character string conversion means is supplied to the keyword search means as the search character string.

7. The document search device according to claim 6, wherein the search character string conversion means converts the search character string entered by the user into a word group of synonyms using the synonym dictionary, and a character string selected by the user from that word group is supplied to the full-text search means as the search character string.