JPH10149368A

JPH10149368A - Document retrieval device

Info

Publication number: JPH10149368A
Application number: JP8309412A
Authority: JP
Inventors: Hiroshi Masuichi; 博増市
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1996-11-20
Filing date: 1996-11-20
Publication date: 1998-06-02

Abstract

PROBLEM TO BE SOLVED: To provide a document retrieval device that improves recall without lowering retrieval precision. SOLUTION: An explanatory word dictionary creation part 3 creates an explanatory word dictionary from the contents of a Japanese-language dictionary 1 and a synonym dictionary 2 stored in a dictionary storage part 10. Each entry of this dictionary pairs a word with a set of words that explain it. Part 3 stores the dictionary in an explanatory word dictionary storage part 4. An expansion part 6 expands a search term entered at a search term input part 5 into synonyms and a logical expression, based on the contents of the synonym dictionary 2 and of the explanatory word dictionary stored in the storage part 4. A retrieval part 8 returns as search results all documents whose content contains the search term entered at the input part 5 or any of its synonyms. When the expansion part 6 has expanded the search term, all documents having a paragraph that satisfies the expanded logical expression are also returned. The search results are output to a display part 9.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a retrieval apparatus for electronic documents.

[0002]

2. Description of the Related Art

Keyword-based search is the method generally used to search very large document collections. When an arbitrary keyword (search term) is input into the search system as a search condition, every document whose content contains the search term is returned as a search result. Searching in this way is called full-text search.

Another widely used method attaches search keywords to each document in advance. It then returns, as search results, the documents whose attached keywords match the input search term. The keywords attached to each document may be extracted from the document automatically or chosen manually.

[0003]

In these search systems, a search can only return documents of two kinds:

- documents that contain a word exactly matching the search term entered by the user;
- documents to which a word exactly matching the user's keyword has been attached as a search keyword.

Because such systems require an exact match between search term and keyword, they cannot retrieve every document the user wants. In other words, they have relatively high precision but low recall. Here, precision is the proportion of the returned documents that the user wants. Recall is the proportion of all documents the user wants that are actually returned.
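Both measures can be computed from two sets: the documents a search returns and the documents the user actually wants. The following is a minimal Python sketch of these definitions. The function name and the document IDs are hypothetical and serve only as illustration.

```python
from __future__ import annotations


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a set of retrieved and a set of wanted documents."""
    hits = retrieved & relevant  # documents that were returned AND are wanted
    # Precision: share of the returned documents that the user wants.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the wanted documents that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical exact-match search: it returns one document, which is relevant,
# but the user actually wants four documents.
p, r = precision_recall({"d1"}, {"d1", "d3", "d4", "d5"})
print(p, r)  # 1.0 0.25
```

The output shows the pattern described above: every returned document is relevant (high precision), but most relevant documents are missed (low recall).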

[0004]

One method for improving recall while keeping precision high uses synonyms to expand either the search term entered by the user or a keyword to be attached to a document.

For example, suppose the user enters the search term "辞書" (dictionary). A synonym dictionary expands it into near-synonyms such as "辞典", "字書", "字典", and "事典". All of these mean roughly "dictionary"; "事典" is closer to "encyclopedia". Documents containing "辞書" or any of the expanded words are then retrieved.

Alternatively, suppose the keyword to be attached to a document is "辞書". Then "辞書" and all of its expanded words are attached as keywords, so the document is retrieved whichever of these words is entered as a search term. Either way, the aim is to improve recall.
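The search-side variant of this synonym expansion can be sketched as follows. The synonym table, function names, and documents are hypothetical. Substring matching is an assumption made because Japanese text has no spaces between words.

```python
from __future__ import annotations

# Hypothetical synonym dictionary: each set holds mutually interchangeable words.
SYNONYM_GROUPS: list[set[str]] = [
    {"辞書", "辞典", "字書", "字典", "事典"},
]


def expand_with_synonyms(term: str, groups: list[set[str]]) -> set[str]:
    """Return the term together with every word listed as its synonym."""
    expanded = {term}
    for group in groups:
        if term in group:
            expanded |= group
    return expanded


def full_text_search(term: str, documents: dict[str, str],
                     groups: list[set[str]]) -> list[str]:
    """Return IDs of documents containing the term or any of its synonyms.

    Plain substring matching stands in for word-level matching here.
    """
    variants = expand_with_synonyms(term, groups)
    return [doc_id for doc_id, text in documents.items()
            if any(v in text for v in variants)]


docs = {"a": "この事典は図版が多い。", "b": "明日の天気予報。"}
print(full_text_search("辞書", docs, SYNONYM_GROUPS))  # ['a']
```

Document "a" never contains "辞書" itself. It is found only through the synonym "事典".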

[0005]

Tetsuro Ito, "Information Retrieval" (情報検索), Shokodo (1985), notes that what matters in this method is expansion into interchangeable words (synonyms). Two words are interchangeable when either can replace the other in the contexts where they appear without changing the meaning of the context.

[0006]

The above method makes it possible to improve recall without lowering precision. However, any given word has only a limited vocabulary of interchangeable words, so methods aiming to improve recall further have been proposed.

For example, the method described in Japanese Patent Application Laid-Open No. 7-210568 expands the user's search term with a similar-word dictionary (thesaurus) before searching. If "pulp" is entered as a search term and the search fails, the thesaurus expands it into broader terms such as "raw material", and the search is repeated with the expanded terms. As that publication also states, the expansion is not limited to similar words and broader terms. Narrower terms and antonyms may also be used.
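The fallback behaviour described for JP 7-210568 can be sketched as follows. The broader-term table and documents are hypothetical. The retry-on-empty loop is an illustrative reading of the description above, not the cited publication's actual procedure.

```python
from __future__ import annotations

# Hypothetical thesaurus fragment: word -> list of broader terms.
BROADER: dict[str, list[str]] = {"pulp": ["raw material"]}


def search_with_fallback(term: str, documents: dict[str, str],
                         broader: dict[str, list[str]]) -> tuple[list[str], str | None]:
    """Search for `term`; if nothing is found, retry with its broader terms.

    Returns the matching document IDs and the word that produced them.
    """
    for word in [term] + broader.get(term, []):
        hits = [doc_id for doc_id, text in documents.items() if word in text]
        if hits:
            return hits, word
    return [], None


docs = {"d1": "prices of raw material rose", "d2": "new paper mills opened"}
print(search_with_fallback("pulp", docs, BROADER))  # (['d1'], 'raw material')
```

Here "d1" is returned even though it may have nothing to do with pulp. This is the loss of precision discussed in [0008].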

[0007]

Japanese Patent Application Laid-Open No. 6-274541 describes another way to improve recall. The user's search term is expanded into related words based on word co-occurrence, and the search is run with those words as well. For example, when "mathematics" is entered as a search condition, it is expanded through co-occurrence into related (associated) words such as "equation" and "geometry". The expanded words are then also used for the search.
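JP 6-274541 is described here only as expanding terms "based on co-occurrence". The sketch below assumes one simple reading of that:

- count how many documents contain both words of each pair;
- keep the partners whose count reaches a threshold `min_count`.

The threshold, function names, and keyword lists are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def cooccurrence_counts(doc_terms: list[list[str]]) -> Counter:
    """Count, for each unordered word pair, the documents containing both words."""
    counts: Counter = Counter()
    for terms in doc_terms:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def related_terms(term: str, counts: Counter, min_count: int = 2) -> list[str]:
    """Words co-occurring with `term` in at least `min_count` documents,
    most frequent first (ties broken alphabetically)."""
    related: dict[str, int] = {}
    for (a, b), c in counts.items():
        if c >= min_count and term in (a, b):
            related[b if a == term else a] = c
    return sorted(related, key=lambda w: (-related[w], w))


# Hypothetical keyword lists, one per document.
DOC_TERMS = [
    ["mathematics", "equation", "geometry"],
    ["mathematics", "equation"],
    ["mathematics", "geometry", "history"],
    ["history", "war"],
]
print(related_terms("mathematics", cooccurrence_counts(DOC_TERMS)))  # ['equation', 'geometry']
```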

[0008]

However, expansion based on a thesaurus or on co-occurrence often produces words whose meaning differs greatly from the word the user entered, that is, words with no interchangeability at all. In the example above, "pulp" and "raw material" are entirely different in meaning. The search results therefore include many documents the user did not intend. Recall improves, but precision drops markedly.

[0009]

SUMMARY OF THE INVENTION

The present invention has been made in view of the above circumstances. Its object is to provide a document retrieval apparatus that improves recall without lowering retrieval precision.

[0010]

Means for Solving the Problem

According to a first aspect of the present invention, a document retrieval apparatus comprises:

- document storage means for storing a plurality of documents;
- search term input means for inputting at least a search term for retrieving a desired document from the document storage means;
- dictionary storage means for storing the contents of a synonym dictionary, which describes synonymy relations between words, and of a word dictionary, which describes word senses;
- dictionary creation means for creating an explanatory word dictionary from the contents of the synonym dictionary and the word dictionary in the dictionary storage means, the explanatory word dictionary describing pairs each consisting of a word and a plurality of words that explain it;
- expansion means for expanding the search term input by the search term input means, using the explanatory word dictionary created by the dictionary creation means, into a set of words expressing a meaning similar to that of the search term; and
- search means for obtaining a set of documents from the document storage means based on the set of words produced by the expansion means.

[0011]

According to a second aspect of the present invention, a document retrieval apparatus comprises:

- keyword acquisition means for obtaining, for each of a plurality of documents, the keywords corresponding to that document;
- dictionary storage means for storing the contents of a synonym dictionary, which describes synonymy relations between words, and of a word dictionary, which describes word senses;
- dictionary creation means for creating an explanatory word dictionary from the contents of the synonym dictionary and the word dictionary in the dictionary storage means, the explanatory word dictionary describing pairs each consisting of a word and a plurality of words that explain it;
- expansion means for expanding each keyword obtained by the keyword acquisition means, using the explanatory word dictionary created by the dictionary creation means, into a set of keywords expressing a similar meaning;
- document storage means for storing, for each document, the keywords obtained by the keyword acquisition means and the set of keywords obtained by the expansion means, paired with that document;
- search term input means for inputting at least a search term for retrieving a desired document from the document storage means; and
- search means for obtaining a set of documents from the document storage means by comparing the search term input by the search term input means with the keywords or sets of keywords stored in the document storage means.

[0012]

According to a third aspect of the present invention, in the document retrieval apparatus of the first or second aspect, the set produced by the expansion means is a logical expression composed of words or keywords.

[0013]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 is a block diagram showing a first embodiment of the document retrieval apparatus of the present invention. The reference numerals in the figure are:

- 1: Japanese-language dictionary
- 2: synonym dictionary
- 3: explanatory word dictionary creation unit
- 4: explanatory word dictionary storage unit
- 5: search term input unit
- 6: expansion unit
- 7: document storage unit
- 8: search unit
- 9: display unit
- 10: dictionary storage unit

The dictionary storage unit 10 stores the contents of the Japanese-language dictionary 1 and of the synonym dictionary 2.

The Japanese-language dictionary 1 is the text of an electronic Japanese-language dictionary itself. For each headword, it gives an explanation equivalent in meaning to the headword, written with words other than the headword. Here the explanations are assumed to be in natural language. The Japanese-language dictionary 1 need not be a general-purpose dictionary. A dictionary for a specialized field, such as a dictionary of scientific and technical terms or of economic terms, may be used instead, or several dictionaries may be combined.

The synonym dictionary 2 is a list of sets of interchangeable words, stored as electronic text.

[0014]

The explanatory word dictionary creation unit 3 creates an explanatory word dictionary from the contents of the Japanese-language dictionary 1 and the synonym dictionary 2 stored in the dictionary storage unit 10. The explanatory word dictionary storage unit 4 stores the dictionary created by unit 3. Here, an explanatory word dictionary is a dictionary whose entries each pair a word with a set of words that explain it.

FIG. 2 illustrates the explanatory word dictionary of the present invention, and FIG. 3 illustrates a conventional similar-word dictionary. As shown in FIGS. 3(A) and 3(B), a conventional similar-word dictionary (thesaurus) defines binary relations between pairs of words. Taken together, these relations form a tree or network of words.

In contrast, as shown in FIG. 2, the explanatory word dictionary of the present invention defines correspondences between a word and a set of words. It contains no binary relations between pairs of words. For example, as shown in FIG. 2(C), "類似語" (similar word) is synonymous only with the word set {"言葉" (word), "意味" (meaning), "同じ" (same)}. No pairwise relation is defined between "類似語" and "言葉", between "類似語" and "意味", or between "言葉" and "意味".
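The structural difference between FIG. 3 and FIG. 2 can be sketched as two Python data structures:

- a thesaurus as a set of word pairs;
- an explanatory word dictionary as a map from a word to a frozen set of words.

All entries and function names below are hypothetical illustrations, not the contents of the figures.

```python
from __future__ import annotations

# Hypothetical thesaurus in the style of FIG. 3: binary relations between word pairs.
THESAURUS_EDGES: set[tuple[str, str]] = {("類似語", "言葉"), ("言葉", "意味")}

# Hypothetical explanatory word dictionary in the style of FIG. 2:
# each entry maps one word to the *set* of words that explains it.
EXPLANATORY: dict[str, frozenset[str]] = {
    "類似語": frozenset({"言葉", "意味", "同じ"}),
}


def thesaurus_related(a: str, b: str) -> bool:
    """Pairwise lookup, which is meaningful only in a thesaurus."""
    return (a, b) in THESAURUS_EDGES or (b, a) in THESAURUS_EDGES


def explained_by(word: str, words: set[str]) -> bool:
    """True only if `words` covers the whole explanatory set of `word`.

    No single member of the set is related to `word` on its own.
    """
    entry = EXPLANATORY.get(word)
    return entry is not None and entry <= words


print(thesaurus_related("類似語", "言葉"))             # True
print(explained_by("類似語", {"言葉"}))                # False
print(explained_by("類似語", {"言葉", "意味", "同じ"}))  # True
```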

[0015]

As described above, a search term or attached keyword can be expanded using binary relations. Apart from expansions into interchangeable words, this produces words that are not synonymous, and precision falls.

Expanding the search term or attached keyword into a set of words instead preserves a synonymy relation that no single word can express. Recall can therefore be improved without lowering precision. The first embodiment covers expansion of a search term into a set of words. The second embodiment, described later, covers expansion of an attached keyword into a set of words.

[0016]

Expansion with the explanatory word dictionary must not simply split a word into its parts and use those parts as the set. For example, "登板" (a pitcher taking the mound) is not split into "板" (board) and "登る" (to climb). Instead, it is associated with a set of words such as "野球" (baseball), "投手" (pitcher), and "出場" (taking part). A substantial improvement in recall is achieved only by expanding into a set whose members are semantically distant from the original word.

[0017]

Returning to FIG. 1, the search term input unit 5 is a user interface through which the user enters a search condition. The condition is either a word (search term) or a search expression containing such words.

The expansion unit 6 expands the search term entered at unit 5 into synonyms and a logical expression. It does so using the synonym dictionary 2 stored in the dictionary storage unit 10 and the explanatory word dictionary stored in the explanatory word dictionary storage unit 4. The search term, together with the synonyms and the logical expression obtained by expansion, is passed to the search unit 8.

[0018]

The document storage unit 7 stores the documents to be searched as electronic text.

The search unit 8 identifies which documents in unit 7 form the search result. It bases this on the search term, the words synonymous with it, and the logical expression obtained from the expansion unit 6.

The display unit 9 is a user interface that shows the documents identified by the search unit 8 to the user as the search result.

[0019]

FIG. 4 is a flowchart showing an example of the processing performed by the explanatory word dictionary creation unit 3. Unit 3 associates each headword in the Japanese-language dictionary 1 with an explanatory vocabulary. That vocabulary is the set of content words (自立語, independent words) that appear in the headword's explanation.

A Japanese-language dictionary explains each headword in terms equivalent in meaning to it, using words other than the headword. It is therefore especially well suited as data for obtaining sets of words synonymous with a headword (explanatory vocabularies). In the figure, M is a real constant satisfying 0 < M < 1.

[0020]

First, in S21, the headwords and their explanations are extracted from the Japanese-language dictionary 1. In S22, the first headword of dictionary 1 is taken as A.

In S23, the process checks whether the explanation for headword A has already been morphologically analyzed. If it has, its explanatory vocabulary was already obtained while another headword was being processed. Processing then goes to S30 and moves on to the next headword. If it has not, S24 applies morphological analysis to the explanation for headword A and extracts its content words.

[0021]

In S25, the process checks for words that the synonym dictionary 2 lists as synonymous with headword A and that are also headwords of the Japanese-language dictionary 1. Call any such headwords A1, A2, ..., An.

If they exist, S26 applies morphological analysis to the explanations of all of A1, A2, ..., An and extracts their content words. Then, in S27, the process considers the content words of A extracted in S24 together with those of A1, A2, ..., An extracted in S26. Every content word that appears (n+1)×M times or more becomes the explanatory vocabulary of each of A, A1, A2, ..., An.

Content words that are synonyms of one another are counted as the same word. Any one of those synonyms may then be used as the member of the explanatory vocabulary.

[0022]

S25 may find no usable synonym: either the synonym dictionary 2 lists no word synonymous with headword A, or the listed synonyms are not headwords of the Japanese-language dictionary 1. In that case, S28 takes the set of content words obtained in S24 as the explanatory vocabulary of headword A. Here too, if some extracted content words are synonyms of one another, any one of them may be used as a single member of the explanatory vocabulary.

[0023]

In S29, the process checks whether the Japanese-language dictionary 1 has a headword after A. If it does, S30 makes that next headword the new headword A and processing returns to S23. This repeats until the last headword has been processed.
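Steps S21 to S30 can be sketched in Python as follows. The sketch rests on several assumptions:

- No morphological analysis is performed. Each headword's explanation is supplied as an already-extracted list of content words.
- `min(group)` is a hypothetical deterministic way of choosing "any one" synonym as the representative.
- Every occurrence counts toward the (n+1)×M threshold, which is a literal reading of S27.

The sample data is hypothetical and only loosely modelled on FIGS. 5 and 6, whose exact contents are not reproduced here.

```python
from __future__ import annotations

from collections import Counter


def build_explanatory_dictionary(
    definitions: dict[str, list[str]],
    synonym_groups: list[set[str]],
    m: float = 0.8,
) -> dict[str, frozenset[str]]:
    """Sketch of the FIG. 4 procedure.

    `definitions` maps each headword to the content words of its explanation
    and stands in for the morphological analysis of S24 and S26.
    """

    def canonical(word: str) -> str:
        # Synonyms count as the same word; one member represents the group.
        for group in synonym_groups:
            if word in group:
                return min(group)
        return word

    def synonym_headwords(word: str) -> list[str]:
        # S25: synonyms of `word` that are also headwords of the dictionary.
        found: set[str] = set()
        for group in synonym_groups:
            if word in group:
                found |= {w for w in group if w != word and w in definitions}
        return sorted(found)

    result: dict[str, frozenset[str]] = {}
    for head in definitions:               # S22, S29, S30: visit every headword
        if head in result:                 # S23: already covered via a synonym
            continue
        group = [head] + synonym_headwords(head)
        if len(group) > 1:                 # S26, S27
            counts = Counter(canonical(w) for h in group for w in definitions[h])
            threshold = len(group) * m     # (n + 1) * M
            vocab = frozenset(w for w, c in counts.items() if c >= threshold)
        else:                              # S28
            vocab = frozenset(canonical(w) for w in definitions[head])
        for h in group:
            result[h] = vocab
    return result


SYNONYMS = [{"同義語", "類似語", "類義語", "類語"}, {"意味", "意義"},
            {"言葉", "語"}, {"同等", "等しい", "同じ"}]
DEFINITIONS = {  # hypothetical content words per headword
    "同義語": ["意味", "同等", "言葉", "シノニム"],
    "類似語": ["言葉", "意義", "同じ"],
    "類義語": ["言葉", "意味", "等しい"],
    "類語": ["言葉", "意味", "同じ", "似る"],
    "登板": ["野球", "投手", "出場"],
}
table = build_explanatory_dictionary(DEFINITIONS, SYNONYMS)
print(sorted(table["類似語"]))  # ['同じ', '意味', '言葉']
```

Because all four synonymous headwords are processed in one pass, they receive the same explanatory vocabulary. "登板" has no synonymous headword, so it takes the S28 branch and keeps its own content words.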

[0024]

An example of the operation of the first embodiment of the present invention is now described with a concrete example. FIG. 5 shows example contents of the synonym dictionary. FIG. 6 shows example results of morphologically analyzing the contents of the Japanese-language dictionary.

As shown in FIG. 5, the synonym dictionary 2 lists the following groups as synonyms:

- "同義語" (synonym), "類似語" (similar word), "類義語" (word of similar meaning), and "類語" (related word);
- "意味" and "意義" (both: meaning);
- "言葉" and "語" (both: word);
- "同等", "等しい", and "同じ" (equivalent, equal, same).

The Japanese-language dictionary 1 is assumed to contain the headwords "同義語", "類似語", "類義語", and "類語", each with an explanation. FIG. 6 shows the content words extracted from these explanations. M is assumed to be 0.8.

[0025]

The explanatory word dictionary creation unit 3 runs the processing of FIG. 4. When "同義語" is selected as headword A, S24 morphologically analyzes its explanation. This extracts content words such as "意味" (meaning), "同等" (equivalent), "言葉" (word), and "シノニム" (synonym, as a loanword), as shown in the first line of FIG. 6.

In S25, the synonym dictionary 2 of FIG. 5 gives "類似語", "類義語", and "類語" as words synonymous with "同義語". As FIG. 6 shows, these are headwords of the Japanese-language dictionary 1. S26 therefore analyzes their explanations, giving the content words shown in lines 2 to 4 of FIG. 6.

Collecting the content words that appear (3+1)×0.8 = 3.2 times or more yields "言葉". "意味" and "意義" are synonymous according to FIG. 5, so their occurrences together total four, and either "意味" or "意義" is obtained. Likewise, "同等", "等しい", and "同じ" are synonymous; their occurrences together total four, and one of them is obtained.

As a result, each of "同義語", "類似語", "類義語", and "類語" receives the explanatory vocabulary {"言葉", "意味", "同じ"}. Equivalent choices such as {"言葉", "意味", "等しい"} or {"言葉", "意義", "同じ"} are equally valid.
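The threshold arithmetic of this example is written out below. Here c(w) denotes the number of occurrences of w across the four explanations, and only the merged totals stated above are used. Occurrence counts are integers, so a threshold of 3.2 in practice requires at least four occurrences.

```latex
\begin{aligned}
(n+1)\,M &= (3+1)\times 0.8 = 3.2\\
c(\text{言葉}) &\ge 3.2\\
c(\text{意味}) + c(\text{意義}) &= 4 \ge 3.2\\
c(\text{同等}) + c(\text{等しい}) + c(\text{同じ}) &= 4 \ge 3.2
\end{aligned}
```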

[0026]

The expansion unit 6 expands the search term entered at the search term input unit 5 into synonyms and a logical expression. It uses the synonym dictionary 2 and the explanatory word dictionary created as described above. The notation is:

- $W$: the search term entered at the search term input unit 5;
- $W_1, W_2, \ldots, W_n$: the explanatory vocabulary of $W$;
- $W_{11}, W_{12}, \ldots$: the synonyms of $W_1$;
- $W_{21}, W_{22}, \ldots$: the synonyms of $W_2$;
- and so on up to $W_{n1}, W_{n2}, \ldots$: the synonyms of $W_n$.

The search term $W$ is then expanded into

$$
(W_1 \;\mathrm{or}\; W_{11} \;\mathrm{or}\; W_{12} \;\mathrm{or}\; \cdots) \;\mathrm{and}\; (W_2 \;\mathrm{or}\; W_{21} \;\mathrm{or}\; W_{22} \;\mathrm{or}\; \cdots) \;\mathrm{and}\; \cdots \;\mathrm{and}\; (W_n \;\mathrm{or}\; W_{n1} \;\mathrm{or}\; W_{n2} \;\mathrm{or}\; \cdots)
$$

If $W$ has no explanatory vocabulary, no expansion is performed.

[0027]

In the concrete example above, suppose "類似語" is entered as the search term. Its explanatory vocabulary is, for example, "言葉", "意味", and "同じ". Adding the synonyms of each word from the synonym dictionary 2 shown in FIG. 5 expands the term into (言葉 or 語) and (意味 or 意義) and (同等 or 同じ or 等しい).

[0028]

The search unit 8 returns as search results all documents whose content contains the search term $W$ entered at the search term input unit 5. It also returns all documents whose content contains any synonym of $W$.

If the expansion unit 6 has expanded the search term, the search unit also returns every document that contains a predetermined unit of text satisfying the expanded logical expression. The predetermined unit may be, for example, a sentence, a paragraph, or a chapter or section.

For example, take a paragraph as the unit, and suppose the expansion unit 6 has expanded the search term $W$ into

$$
(W_1 \;\mathrm{or}\; W_{11} \;\mathrm{or}\; W_{12} \;\mathrm{or}\; \cdots) \;\mathrm{and}\; (W_2 \;\mathrm{or}\; W_{21} \;\mathrm{or}\; W_{22} \;\mathrm{or}\; \cdots) \;\mathrm{and}\; \cdots \;\mathrm{and}\; (W_n \;\mathrm{or}\; W_{n1} \;\mathrm{or}\; W_{n2} \;\mathrm{or}\; \cdots)
$$

Then every document is returned that has a paragraph containing all of the following at once:

- one of $W_1, W_{11}, W_{12}, \ldots$;
- one of $W_2, W_{21}, W_{22}, \ldots$;
- and so on, through one of $W_n, W_{n1}, W_{n2}, \ldots$.

If a search expression containing several search terms is entered at the search term input unit 5, this search is performed for each search term. The logical operations given in the expression are then applied to obtain the search results.
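The search unit's two tests can be sketched at paragraph granularity as follows. The first test is a direct match on the term or its synonyms anywhere in the document. The second is a paragraph that satisfies every OR-clause. The clauses are hard-coded from the example in [0027], and the function names and documents are hypothetical. Splitting paragraphs on blank lines and matching by substring are assumptions for illustration; with substring matching, a one-character word such as "語" can over-match.

```python
from __future__ import annotations


def paragraph_satisfies(paragraph: str, clauses: list[frozenset[str]]) -> bool:
    # Every OR-clause must have at least one of its words in this one paragraph.
    return all(any(w in paragraph for w in clause) for clause in clauses)


def search(term: str, documents: dict[str, str], synonym_groups: list[set[str]],
           clauses: list[frozenset[str]] | None) -> list[str]:
    """Return IDs of documents that contain the term or a synonym anywhere,
    or that contain a paragraph satisfying the expanded expression."""
    variants = {term}
    for group in synonym_groups:
        if term in group:
            variants |= group
    hits = []
    for doc_id, text in documents.items():
        if any(v in text for v in variants):
            hits.append(doc_id)
        elif clauses and any(paragraph_satisfies(p, clauses)
                             for p in text.split("\n\n")):
            hits.append(doc_id)
    return hits


SYNONYMS = [{"同義語", "類似語", "類義語", "類語"}]
CLAUSES = [frozenset({"言葉", "語"}), frozenset({"意味", "意義"}),
           frozenset({"同等", "同じ", "等しい"})]
docs = {
    "a": "序文。\n\n本節では、言葉の分類について考える。"
         "１．１では、等しい意味を持っていることを条件として分類を行なった。",
    "b": "今日は晴れ。",
    "c": "類義語の一覧",
}
print(search("類似語", docs, SYNONYMS, CLAUSES))  # ['a', 'c']
```

Document "c" is found through a synonym of the search term. Document "a" is found only through its second paragraph, which is the case described in [0029].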

[0029]

In the concrete example above, the expansion unit 6 expanded the search term "類似語" into (言葉 or 語) and (意味 or 意義) and (同等 or 同じ or 等しい). The search unit 8 retrieves documents containing "類似語" or its synonyms "同義語", "類義語", and "類語". It also retrieves documents containing a paragraph that satisfies the expanded logical expression.

For example, the search retrieves a document containing the following paragraph: "This section considers the classification of words (言葉). In 1.1, classification was performed on the condition of having equal (等しい) meaning (意味). ..." This document contains none of the words "類似語", "同義語", and so on, yet it is in substance about similar words. In this way, recall can be improved.

[0030]

The explanatory word dictionary creation unit 3 may create the explanatory word dictionary in advance for all headwords of the Japanese-language dictionary 1, as described above. Alternatively, it may be configured to build entries only for the words actually needed, based on the search term, at the moment a search term is entered.

[0031]

FIG. 7 is a block diagram showing a second embodiment of the document retrieval apparatus of the present invention. Parts similar to those in FIG. 1 have the same reference numerals, and their description is omitted. Reference numeral 11 denotes a keyword acquisition unit. The first embodiment expands a search term into a set of words. This second embodiment instead expands the keywords attached to documents into sets of words.

【００３２】キーワード取得部１１は、文書格納部７に
格納されている文書をユーザに提示し、付加すべきキー
ワードをユーザから受け取る。あるいは、文書を解析
し、自動的にキーワードを抽出する。この場合でも、ユ
ーザがキーワードを追加あるいは削除可能に構成しても
よい。展開部６は、辞書格納部１０に格納されている同
義語辞書２の内容と、説明語辞書格納部４に格納されて
いる説明語辞書の内容を基に、キーワード取得部１１か
ら得られたキーワードを同義語および論理式へと展開す
る。キーワード取得部１１から得られたキーワードおよ
び展開結果として得られた同義語、論理式を、対応する
文書と対にして文書格納部７に格納する。文書格納部７
は、検索対象となる複数の文書を電子化されたテキスト
の形式で格納する。このとき各文書には、展開部６から
得られるキーワードとその同義語および論理式が付加さ
れる。検索部８は、検索語入力部５から得られる検索式
中の検索語あるいは論理式と、文書格納部７に格納され
た各文書に付加されているキーワードとその同義語およ
び論理式を比較して、検索結果となる文書を特定する。The keyword acquisition unit 11 presents the documents stored in the document storage unit 7 to the user and receives from the user the keywords to be attached. Alternatively, it analyzes each document and extracts keywords automatically. In that case as well, the user may be allowed to add or delete keywords. The expansion unit 6 expands the keywords obtained by the keyword acquisition unit 11 into synonyms and logical expressions. It does so on the basis of the synonym dictionary 2 stored in the dictionary storage unit 10 and the explanatory word dictionary stored in the explanatory word dictionary storage unit 4. The keywords from the keyword acquisition unit 11, together with the synonyms and logical expressions produced by the expansion, are stored in the document storage unit 7 paired with the corresponding documents. The document storage unit 7 stores the documents to be searched as digitized text. Each document carries the keywords obtained from the expansion unit 6, their synonyms, and the logical expressions. The search unit 8 compares the search words or logical expressions in the search expression obtained from the search word input unit 5 with the keywords, synonyms, and logical expressions attached to each document in the document storage unit 7. From this comparison it identifies the documents that make up the search result.
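The indexing side of the second embodiment can be sketched as below. The dictionary contents come from the examples in this description: the synonyms of 類似語 and its expansion. The class and function names are illustrative assumptions, as is representing each stored logical expression as a list of frozensets. None of these is the patent's data format.

```python
from dataclasses import dataclass, field

OrGroup = frozenset[str]          # (W11 or W12 or ...)
Expression = list[OrGroup]        # group1 and group2 and ... and group_n


@dataclass
class StoredDocument:
    """One record of the document storage unit: text plus attached index data."""
    text: str
    keywords: set[str]
    synonyms: set[str] = field(default_factory=set)
    expressions: list[Expression] = field(default_factory=list)


def index_document(text: str, keywords: list[str],
                   synonym_dict: dict[str, set[str]],
                   explanatory_dict: dict[str, Expression]) -> StoredDocument:
    """Attach keywords, their synonyms, and their explanatory expansions."""
    doc = StoredDocument(text=text, keywords=set(keywords))
    for kw in keywords:
        syns = synonym_dict.get(kw, set())
        doc.synonyms |= syns
        # A keyword and its synonyms may share one expansion; store it once.
        for word in [kw, *sorted(syns)]:
            expansion = explanatory_dict.get(word)
            if expansion and expansion not in doc.expressions:
                doc.expressions.append(expansion)
    return doc


SYNONYMS = {"類似語": {"同義語", "類義語", "類語"}}
SIMILAR_WORD_EXPANSION = [frozenset({"言葉", "語"}), frozenset({"意味", "意義"}),
                          frozenset({"同等", "同じ", "等しい"})]
EXPLANATORY = {w: SIMILAR_WORD_EXPANSION for w in ["類似語", "同義語", "類義語", "類語"]}

doc1 = index_document("(text of document 1)", ["類似語"], SYNONYMS, EXPLANATORY)
print(sorted(doc1.synonyms), len(doc1.expressions))
```

Because 類似語 and its three synonyms share one expansion here, document 1 ends up with a single stored expression, as in the FIG. 8 description.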

【００３３】図８は、本発明の文書検索装置の第２の実
施の形態における文書格納部のデータ構造の一例の説明
図である。この第２の実施の形態では、展開部６がキー
ワード取得部１１で取得したキーワードを上述の第１の
実施の形態で説明したように論理式に展開し、対応する
文書と対にして文書格納部７に格納する。したがって、
各文書には、通常付加されるキーワードおよびその同義
語の他に、（Ｗ１１ｏｒＷ１２ｏｒ・・・）ａｎｄ
（Ｗ２１ｏｒＷ２２ｏｒ・・・）ａｎｄ・
・・ａｎｄ（Ｗｎ１ｏｒＷｎ２ｏｒ・・・）の形式で表現される論理式も付加される。FIG. 8 is an explanatory diagram of an example of the data structure of the document storage unit in the second embodiment of the document search apparatus of the present invention. In the second embodiment, the expansion unit 6 expands each keyword acquired by the keyword acquisition unit 11 into a logical expression, as described for the first embodiment. The expression is stored in the document storage unit 7 paired with the corresponding document. Therefore, in addition to the ordinary keywords and their synonyms, each document also carries logical expressions of the form (W11 or W12 or ...) and (W21 or W22 or ...) and ... and (Wn1 or Wn2 or ...).

【００３４】図９に示した例では、文書１において、
「類似語」あるいは、「同義語」、「類義語」、「類
語」が「（言葉ｏｒ語）ａｎｄ（意味ｏｒ
意義）ａｎｄ（同等ｏｒ同じｏｒ等しい）」
へと展開され、キーワードの１つとして格納されてい
る。同様に、文書３において、「逸材」が「（優秀ｏ
ｒ優等ｏｒ秀逸ｏｒ優れた）ａｎｄ（才能
ｏｒ才気ｏｒ才知）」へと展開され、格納され
ている。In the example shown in FIG. 8, document 1 has the keyword 類似語 ("similar word"). This keyword, together with its synonyms 同義語, 類義語, and 類語, is expanded into (言葉 or 語) and (意味 or 意義) and (同等 or 同じ or 等しい), and the result is stored as one of the document's keywords. Similarly, document 3 has the keyword 逸材 ("person of exceptional talent"). It is expanded into (優秀 or 優等 or 秀逸 or 優れた) and (才能 or 才気 or 才知), roughly ("excellent" or "superior" or "outstanding" or "superb") and ("talent" or "flair" or "wit"), and stored.

【００３５】検索部８は、検索語入力部５から入力され
た検索式中の語あるいは論理式と、文書格納部７に格納
されている各文書に付加されているキーワード（キーワ
ードの同義語）を通常の検索方式で比較し、該当する文
書を全て検索結果とする。また、検索式中に論理式が含
まれている場合、その論理式をド・モルガンの法則に従
って展開した上で、語のａｎｄ接続の集合をｏｒ接続し
た形式、すなわち、（ｗ１１ａｎｄｗ１２ａｎｄ・・・）ｏｒ
（ｗ２１ａｎｄｗ２２ａｎｄ・・・）ｏｒ・
・・ｏｒ（ｗｍ１ａｎｄｗｍ２ａｎｄ・・
・）へと変換する。The search unit 8 compares each word or logical expression in the search expression input from the search word input unit 5 with the keywords (and keyword synonyms) attached to each document stored in the document storage unit 7. It uses an ordinary search method for this comparison and takes all matching documents as the search result. When the search expression contains a logical expression, the search unit first expands it according to De Morgan's laws. It then converts the result into an or-connection of and-connected word sets, that is, (w11 and w12 and ...) or (w21 and w22 and ...) or ... or (wm1 and wm2 and ...).
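For a query built only from and and or, this conversion reduces to distributing and over or. De Morgan's laws matter only when negation is allowed, and this sketch assumes it is not. The nested-tuple query format and the name `to_dnf` are hypothetical choices made for the example.

```python
from itertools import product
from typing import Union

# A query is a word (str) or a node ("and" | "or", [subqueries]).
Query = Union[str, tuple]


def to_dnf(query: Query) -> list[frozenset[str]]:
    """Rewrite a negation-free and/or query as (w11 and w12 ...) or (w21 and ...) ...

    Each frozenset in the returned list is one and-connected word set.
    """
    if isinstance(query, str):
        return [frozenset({query})]
    op, children = query
    child_dnfs = [to_dnf(child) for child in children]
    if op == "or":
        # or of disjunctions: concatenate their terms
        return [term for dnf in child_dnfs for term in dnf]
    if op == "and":
        # distribute and over or: take one term from each child and merge them
        return [frozenset().union(*combo) for combo in product(*child_dnfs)]
    raise ValueError(f"unknown operator: {op!r}")


# The query of paragraph [0038]: (サーチャー or コンピューター) and 優秀 and 才能
query = ("and", [("or", ["サーチャー", "コンピューター"]), "優秀", "才能"])
for conjunction in to_dnf(query):
    print(sorted(conjunction))
```

Applied to the query of [0038], this yields the two word sets (サーチャー and 優秀 and 才能) and (コンピューター and 優秀 and 才能).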

【００３６】変換された論理式のｍ個のａｎｄ接続され
た語の集合（ｗ１１ａｎｄｗ１２ａｎｄ・・・）（ｗ２１ａｎｄｗ２２ａｎｄ・・・）・・・（ｗｍ１ａｎｄｗｍ２ａｎｄ・・・）に関して、これらのいずれかに注目して、以下の条件を
満たす文書Ａが存在する時、文書Ａを検索結果とする。
いま、文書格納部７中の文書Ａに付加されている論理式
の１つが（Ｗ１１ｏｒＷ１２ｏｒ・・・）ａｎｄ
（Ｗ２１ｏｒＷ２２ｏｒ・・・）ａｎｄ・
・・ａｎｄ（Ｗｎ１ｏｒＷｎ２ｏｒ・・・）であるとし、変換されたｍ個の論理式中のｉ番目の語の
集合（ｗｉ１ａｎｄｗｉ２ａｎｄ・・・）に注目しているとする。文書Ａに付加されている論理式
中のｎ個の集合（Ｗ１１，Ｗ１２，・・・）（Ｗ２１，Ｗ２２，・・・）・・・（Ｗｎ１，Ｗｎ２，・・・）の全てに、ｗｉ１，ｗｉ２，・・・のいずれかの語が要
素として存在するとともに、もし、ｗｉ１，ｗｉ２，・
・・のうち文書Ａに付加されている論理式中の上記ｎ個
の集合のいずれにも含まれていない語が存在する場合
は、それら上記ｎ個の集合に含まれていない語が、文書
Ａに付加されているキーワードかその同義語に等しいと
き、文書Ａを検索結果の１つとする。The converted logical expression consists of m and-connected word sets, (w11 and w12 and ...), (w21 and w22 and ...), ..., (wm1 and wm2 and ...). Taking any one of these sets, a document A is included in the search result if it satisfies the following condition. Suppose that one of the logical expressions attached to document A in the document storage unit 7 is (W11 or W12 or ...) and (W21 or W22 or ...) and ... and (Wn1 or Wn2 or ...). Consider the i-th of the m converted word sets, (wi1 and wi2 and ...). Document A becomes one of the search results when both of the following hold:
(1) each of the n sets {W11, W12, ...}, {W21, W22, ...}, ..., {Wn1, Wn2, ...} in document A's logical expression contains at least one of the words wi1, wi2, ...;
(2) every word among wi1, wi2, ... that appears in none of those n sets equals a keyword attached to document A or a synonym of such a keyword.
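This two-part condition can be written as a predicate over one and-connected query word set and one stored expression. The following is a minimal sketch under three assumptions. Stored expressions are lists of or-groups. A document's keywords and their synonyms are available as one set. The sample data for documents 1 and 3 holds only what paragraphs [0034] and [0039] state, not the full contents of FIG. 8. The function names are hypothetical.

```python
def conjunction_matches(conjunction: frozenset[str],
                        expression: list[frozenset[str]],
                        keywords_and_synonyms: set[str]) -> bool:
    """Condition for (wi1 and wi2 ...) against (W11 or ...) and ... and (Wn1 or ...)."""
    # (1) every or-group of the stored expression contains some query word
    if not all(group & conjunction for group in expression):
        return False
    # (2) query words found in none of the groups must be keywords or synonyms
    covered = frozenset().union(*expression)
    return (conjunction - covered) <= keywords_and_synonyms


def document_matches(dnf: list[frozenset[str]],
                     expressions: list[list[frozenset[str]]],
                     keywords_and_synonyms: set[str]) -> bool:
    """A document is retrieved if any query word set matches any stored expression."""
    return any(conjunction_matches(c, e, keywords_and_synonyms)
               for c in dnf for e in expressions)


# Document 1 ([0034]); keywords limited to those the text names.
DOC1_EXPR = [frozenset({"言葉", "語"}), frozenset({"意味", "意義"}),
             frozenset({"同等", "同じ", "等しい"})]
DOC1_KEYWORDS = {"類似語", "同義語", "類義語", "類語"}

# Document 3 ([0034] and [0039]).
DOC3_EXPR = [frozenset({"優秀", "優等", "秀逸", "優れた"}),
             frozenset({"才能", "才気", "才知"})]
DOC3_KEYWORDS = {"逸材", "コンピューター"}

# Query of [0038] after conversion to and-connected word sets.
DNF = [frozenset({"サーチャー", "優秀", "才能"}),
       frozenset({"コンピューター", "優秀", "才能"})]

print(conjunction_matches(DNF[0], DOC3_EXPR, DOC3_KEYWORDS))               # False
print(conjunction_matches(DNF[1], DOC3_EXPR, DOC3_KEYWORDS))               # True
print(document_matches([frozenset({"言葉"})], [DOC1_EXPR], DOC1_KEYWORDS))  # False
```

The three printed results reproduce the worked examples that follow. The first word set fails on サーチャー. The second succeeds because コンピューター is a keyword of document 3. The single word 言葉 cannot satisfy every or-group of document 1's expression.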

【００３７】具体例として、文書格納部７に格納されて
いるデータが図８に示す内容の場合に、検索語入力部５
に検索語「言葉」が入力されると、論理式ではないので
通常の検索方式によって検索を行ない、「文書２」が検
索結果となる。図８に示す文書Ａの論理式「（言葉ｏ
ｒ語）ａｎｄ（意味ｏｒ意義）ａｎｄ（同
等ｏｒ同じｏｒ等しい）」の中にも検索語「言
葉」が含まれているが、ａｎｄ接続されている「意味」
または「意義」、「同等」または「同じ」または「等し
い」が検索条件に存在しないため、文書１は検索されな
い。As a specific example, suppose the data stored in the document storage unit 7 is as shown in FIG. 8. When the search word 言葉 ("word") is input to the search word input unit 5, it is not a logical expression. The search is therefore performed by the ordinary method, and document 2 is obtained as the search result. The logical expression of document 1 shown in FIG. 8, (言葉 or 語) and (意味 or 意義) and (同等 or 同じ or 等しい), also contains the search word 言葉. However, the search condition contains none of the words and-connected to it (意味 or 意義; 同等, 同じ, or 等しい). Not every or-group of the expression therefore contains a word of the search condition, and document 1 is not retrieved.

【００３８】例えば、検索語入力部５から論理式「（サ
ーチャーｏｒコンピューター）ａｎｄ優秀ａｎ
ｄ才能」が入力された場合、まず同値な論理式（サーチャーａｎｄ優秀ａｎｄ才能）ｏｒ
（コンピューターａｎｄ優秀ａｎｄ才能）へと変換される。変換された論理式のｍ個のａｎｄ接続
された語の集合（サーチャーａｎｄ優秀ａｎｄ才能）（コンピューターａｎｄ優秀ａｎｄ才能）について、まず、語集合（サーチャー，優秀，才能）に
注目すると、文書に付加されている論理式中のｎ個の集
合の全てに、「サーチャー」，「優秀」，「才能」のい
ずれかの語が要素として存在するものとして、文書３の
論理式「（優秀ｏｒ優等ｏｒ秀逸ｏｒ優れ
た）ａｎｄ（才能ｏｒ才気ｏｒ才知）」が該
当する。すなわち、文書３の論理式のうち、（優秀ｏ
ｒ優等ｏｒ秀逸ｏｒ優れた）には、注目してい
る（サーチャー，優秀，才能）中の「優秀」が存在し、
かつ、（才能ｏｒ才気ｏｒ才知）には「才能」
が存在する。しかしこの論理式は「サーチャー」を含ま
ないので、文書３に付加されているキーワードかその同
義語に等しいか否かを判定する。しかし、「サーチャ
ー」は文書３のキーワードにも、その同義語にも等しく
ないので、この時点では文書３を検索文書とはしない。For example, suppose the logical expression (サーチャー or コンピューター) and 優秀 and 才能, i.e., ("searcher" or "computer") and "excellent" and "talent", is input from the search word input unit 5. It is first converted into the equivalent expression (サーチャー and 優秀 and 才能) or (コンピューター and 優秀 and 才能). Of the and-connected word sets of the converted expression, consider first the set {サーチャー, 優秀, 才能}. Among the logical expressions attached to documents, the expression of document 3 is one in which every one of its n sets contains サーチャー, 優秀, or 才能. That expression is (優秀 or 優等 or 秀逸 or 優れた) and (才能 or 才気 or 才知). The set (優秀 or 優等 or 秀逸 or 優れた) contains 優秀 from the word set under consideration, and the set (才能 or 才気 or 才知) contains 才能. However, this expression does not contain サーチャー. The device therefore checks whether サーチャー equals a keyword attached to document 3 or a synonym of one. Because it equals neither, document 3 is not yet taken as a retrieved document at this point.

【００３９】次の語集合（コンピューター，優秀，才
能）に注目すると、この場合も文書３の論理式が該当す
る。この文書３の論理式は「コンピューター」を含まな
いが、この語は文書３のキーワードとして登録されてい
たので、文書３が条件を満たすことになる。したがっ
て、検索結果として文書３が得られる。Turning to the next word set {コンピューター, 優秀, 才能}, the logical expression of document 3 again qualifies. It does not contain コンピューター ("computer"), but this word is registered as a keyword of document 3, so document 3 satisfies the condition. Document 3 is therefore obtained as a search result.

【００４０】[0040]

【発明の効果】以上の説明から明らかなように、本発明
によれば、ユーザが入力した検索語あるいは文書に付加
すべきキーワードを、説明語辞書を用いることによって
単語の集合（論理式）へと展開し、検索の適合率を低下
させることなく、再現率を向上させることができるとい
う効果がある。As is apparent from the above description, the present invention uses the explanatory word dictionary to expand either a search word input by the user or a keyword to be attached to a document. The expansion produces a set of words (a logical expression). The effect is that recall can be improved without lowering the precision of the search.

【図面の簡単な説明】[Brief description of the drawings]

【図１】本発明の文書検索装置の第１の実施の形態を
示す構成図である。FIG. 1 is a configuration diagram showing a first embodiment of a document search device of the present invention.

【図２】本発明における説明語辞書の説明図である。FIG. 2 is an explanatory diagram of an explanatory word dictionary according to the present invention.

【図３】従来の類似語辞書の説明図である。FIG. 3 is an explanatory diagram of a conventional similar word dictionary.

【図４】説明語辞書作成部３の処理手順の一例を示す
フローチャートである。FIG. 4 is a flowchart illustrating an example of a processing procedure of an explanatory word dictionary creating unit 3.

【図５】同義語辞書の内容の一例の説明図である。FIG. 5 is an explanatory diagram of an example of the contents of a synonym dictionary.

【図６】国語辞書の内容の形態素解析結果の一例の説
明図である。FIG. 6 is an explanatory diagram of an example of a morphological analysis result of the contents of a Japanese language dictionary.

【図７】本発明の文書検索装置の第２の実施の形態を
示す構成図である。FIG. 7 is a configuration diagram showing a second embodiment of the document search device of the present invention.

【図８】本発明の文書検索装置の第２の実施の形態に
おける文書格納部のデータ構造の一例の説明図である。FIG. 8 is an explanatory diagram illustrating an example of a data structure of a document storage unit according to the second embodiment of the document search device of the present invention.

[Explanation of symbols]

１…国語辞書、２…同義語辞書、３…説明語辞書作成
部、４…説明語辞書格納部、５…検索語入力部、６…展
開部、７…文書格納部、８…検索部、９…表示部、１０
…辞書格納部、１１…キーワード取得部。DESCRIPTION OF SYMBOLS 1 ... Japanese language dictionary, 2 ... Synonym dictionary, 3 ... Explanatory word dictionary creation unit, 4 ... Explanatory word dictionary storage unit, 5 ... Search word input unit, 6 ... Expansion unit, 7 ... Document storage unit, 8 ... Search unit, 9 ... Display unit, 10 ... Dictionary storage unit, 11 ... Keyword acquisition unit.

Claims

[Claims]

1. A document search apparatus comprising: document storage means for storing a plurality of documents; search word input means for inputting at least a search word for retrieving a desired document from the document storage means; dictionary storage means for storing a synonym dictionary that describes synonym relations between words and a word dictionary that describes the meanings of words; dictionary creation means for creating, on the basis of the contents of the synonym dictionary and the word dictionary in the dictionary storage means, an explanatory word dictionary that describes pairs of each word and a plurality of words explaining that word; expansion means for expanding the search word input by the search word input means, using the explanatory word dictionary created by the dictionary creation means, into a set of words expressing a meaning similar to that of the search word; and search means for obtaining a set of documents from the document storage means on the basis of the set of words expanded by the expansion means.

2. A document search apparatus comprising: keyword acquisition means for obtaining a keyword corresponding to each of a plurality of documents; dictionary storage means for storing a synonym dictionary that describes synonym relations between words and a word dictionary that describes the meanings of words; dictionary creation means for creating, on the basis of the contents of the synonym dictionary and the word dictionary in the dictionary storage means, an explanatory word dictionary that describes pairs of each word and a plurality of words explaining that word; expansion means for expanding the keyword obtained by the keyword acquisition means, using the explanatory word dictionary created by the dictionary creation means, into a set of keywords expressing a meaning similar to that of the keyword; document storage means for storing, for each document, the keyword obtained from the keyword acquisition means and the set of keywords obtained from the expansion means, paired with that document; search word input means for inputting at least a search word for retrieving a desired document from the document storage means; and search means for comparing the search word input by the search word input means with the keywords or sets of keywords stored in the document storage means to obtain a set of documents.

3. The document search apparatus according to claim 1, wherein the set expanded by the expansion means is a logical expression composed of words or keywords.