JP2003242170A

JP2003242170A - Document search device, document search method, and recording medium

Info

Publication number: JP2003242170A
Application number: JP2002038931A
Authority: JP
Inventors: Hideo Ito; 秀夫伊東
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2002-02-15
Filing date: 2002-02-15
Publication date: 2003-08-29
Anticipated expiration: 2022-02-15
Also published as: JP4118571B2

Abstract

PROBLEM TO BE SOLVED: To provide a document search device, a document search method, and a recording medium capable of suppressing quality deterioration of document ranking regardless of the quality of the search request.
SOLUTION: From a document group stored in document storage means such as a hard disk 4, documents in which an initial search word group stored in search word storage means such as the hard disk 4 appears are ranking-searched to acquire a first ranking. An expanded word group is generated from the documents included in the first ranking. Documents in which the initial search word group and the expanded word group appear are ranking-searched to acquire a second ranking. A control part, such as a CPU 22 and a memory 3, adjusts the second ranking using the first ranking to acquire a new third ranking.
COPYRIGHT: (C)2003, JPO

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a document search device, a document search method, and a recording medium for searching a document group for documents in which a specified search word group appears and ordering those documents.

[0002]

2. Description of the Related Art
Conventionally, various document search devices have been proposed (Japanese Patent Laid-Open Nos. 2000-322416 and H10-49549) that, when documents are searched using a plurality of words (search words, i.e., keywords) designated as a search condition, obtain for each document in a document group the degree to which it satisfies the search condition (hereinafter also called the goodness of fit), and then order and output the documents in descending order of goodness of fit (hereinafter also called ranking).

[0003] In this type of document search device, a search request is expressed as a search word group {Ti}, and the goodness of fit between a document D and the search request is often defined as, for example, the sum over the search words Ti of the scores obtained for document D. Here, a score is a numerical value indicating the importance of document D, obtained by quantifying the relationship between document D and the search condition according to a predetermined criterion. This quantification can also include weighting by the sentences or words being searched.

[0004] When the results of such a document search are output, the documents in the retrieved document group are ranked for the search request according to their goodness of fit, and the quality of the document ranking is evaluated by measures such as the average precision (AVP). Here, the average precision is obtained by computing, for r = 1, 2, ..., N, the proportion of relevant documents (documents that satisfy the search request) among the top r documents of the ranking, and averaging those N values.
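The AVP definition in [0004] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation; the function name `avp_over_cutoffs` and its arguments are hypothetical. As worded in [0004], the measure averages precision over every cutoff r = 1..N, which differs from the more common form of average precision that averages precision only at the ranks where relevant documents occur.

```python
from typing import AbstractSet, Sequence


def avp_over_cutoffs(ranking: Sequence[int], relevant: AbstractSet[int], n: int) -> float:
    """Average of precision@r for r = 1..n, following the wording of [0004].

    ranking:  document numbers ordered best-first
    relevant: document numbers that satisfy the search request
    n:        number of cutoffs N (1 <= n <= len(ranking))
    """
    if not 1 <= n <= len(ranking):
        raise ValueError("n must satisfy 1 <= n <= len(ranking)")
    hits = 0
    total = 0.0
    for r, doc in enumerate(ranking[:n], start=1):
        if doc in relevant:
            hits += 1
        total += hits / r  # proportion of relevant documents among the top r
    return total / n
```

For the ranking [3, 1, 4, 2] with relevant documents {3, 4} and N = 4, the four precisions are 1, 1/2, 2/3, and 1/2, so the average is 2/3.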

[0005] One method for obtaining a higher-quality document ranking is the pseudo-relevance feedback method. In this method, a search (initial search) is performed using the search word group that expresses the search request (initial search word group), and new search words (expanded word group) are generated from the documents ranked highest in the search result (seed document group). The result of a search using both the initial search word group and the expanded word group (expanded search) is then taken as the final output.
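A minimal sketch of the conventional pseudo-relevance feedback flow described in [0005], assuming a `search` callable that returns (document number, score) pairs best-first and an `expand` callable that turns seed document numbers into candidate words. Both callables, the toy corpus `DOCS`, and the toy scoring rule (number of matching query words) are hypothetical stand-ins, not part of the patent.

```python
from typing import Callable, List, Sequence, Tuple

Ranking = List[Tuple[int, float]]  # (document number, score), best first


def pseudo_relevance_feedback(
    initial_terms: Sequence[str],
    search: Callable[[Sequence[str]], Ranking],
    expand: Callable[[List[int]], List[str]],
    n_seed: int,
) -> Ranking:
    """Conventional pseudo-relevance feedback: initial search, seed documents, expanded search."""
    initial = search(initial_terms)                      # initial search
    seeds = [doc for doc, _ in initial[:n_seed]]         # seed document group
    # Expanded word group (dropping words already in the query is an assumption).
    expansion = [t for t in expand(seeds) if t not in initial_terms]
    return search(list(initial_terms) + expansion)       # expanded search result is the final output


# Hypothetical toy corpus and helpers, for illustration only.
DOCS = {1: "hysteresis polarization", 2: "hysteresis loop", 3: "polarization dielectric"}


def toy_search(terms: Sequence[str]) -> Ranking:
    # Score = number of query words the document contains; ties broken by document number.
    hits = []
    for doc, text in DOCS.items():
        words = text.split()
        score = float(sum(t in words for t in terms))
        if score > 0:
            hits.append((doc, score))
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


def toy_expand(seed_docs: List[int]) -> List[str]:
    # Distinct words of the seed documents, in first-seen order.
    return list(dict.fromkeys(w for d in seed_docs for w in DOCS[d].split()))
```

With `pseudo_relevance_feedback(["hysteresis"], toy_search, toy_expand, 1)`, document 3 enters the final ranking only through the expanded word "polarization". This shows both the benefit of expansion and the risk that off-topic seed documents pull in unrelated words.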

[0006] Separately, a function called related word search has been proposed. Related word search obtains, for a given search word group, another group of search words related to it (related words). For example, for the search word "hysteresis", related words such as "dielectric constant" and "polarization" are obtained. Related word search is often used to support document search, for example by presenting the searcher with a new search word group for the search words the searcher entered. Related word search can be implemented in a way similar to pseudo-relevance feedback: the expanded word group is simply output as the related word group.

[0007]

Problems to be Solved by the Invention
However, when such a conventional document search device uses the pseudo-relevance feedback method, the expanded word group is generated from the initial search word group and both are used for the expanded search. Consequently, if the given search request (initial search word group) is inappropriate, the initial search results are of low quality, the expanded words obtained from the seed documents are often unrelated to the search request, and the quality of the expanded search results deteriorates. Moreover, when pseudo-relevance feedback is used, the quality of the search request is determined by the searcher, and it is very difficult for the document search device to judge whether that quality is good or poor, so the expanded search may yield results of lower quality than the initial search. Furthermore, when related word search is implemented by a method similar to pseudo-relevance feedback, the obtained expanded word group is treated, so to speak, as a set of candidates for the final related word group, but it is not easy to select only the appropriate candidates by evaluating the quality of the related words.

[0008] The present invention has been made to solve these problems, and provides a document search device, a document search method, and a recording medium capable of suppressing quality deterioration of the document ranking regardless of the quality of the search request.

[0009]

Means for Solving the Problems
A document search device according to claim 1 of the present invention includes: document storage means that stores a document group to be ranking-searched; search word storage means that stores a first search word group used for ranking search; and ranking acquisition means that ranking-searches the document group stored in the document storage means for documents in which the first search word group stored in the search word storage means appears to acquire a first ranking, generates a second search word group from the document group included in the first ranking, ranking-searches the stored document group for documents in which the first search word group and the second search word group appear to acquire a second ranking, and adjusts the second ranking according to the first ranking to acquire a third ranking.

[0010] With this configuration, the ranking of documents in which the initial search word group appears (first ranking) is obtained, an expanded word group is generated from the documents included in the first ranking, the ranking of documents in which the initial search word group and the expanded word group appear (second ranking) is obtained, and the second ranking is adjusted on the basis of the first and second rankings. Therefore, even when the quality of the search request given by the initial search word group is low, pseudo-relevance feedback with less quality deterioration than before can be realized. In general, in pseudo-relevance feedback, the parameters used in ranking search are tuned so that the initial search gives especially good quality in the top part of the ranking, while the expanded search gives good quality over the ranking as a whole. Adjusting the second ranking by the first ranking therefore merges the two, producing a result whose quality is good both at the top and overall.

[0011] A document search method according to claim 2 of the present invention includes: a first step of ranking-searching a prestored document group for documents in which a preset first search word group appears, to acquire a first ranking; a second step of generating a second search word group from the document group included in the first ranking; a third step of ranking-searching the prestored document group for documents in which the first search word group and the second search word group appear, to acquire a second ranking; and a fourth step of adjusting the second ranking according to the first ranking to acquire a third ranking.

[0012] Because this method includes a step of acquiring a new ranking (third ranking) from the first ranking, acquired using the initial search word group, and the second ranking, acquired using the initial search word group and the expanded word group, pseudo-relevance feedback with less quality deterioration than before can be realized even when the quality of the search request given by the initial search word group is low, as described above.

[0013] A recording medium according to claim 3 of the present invention records a program for causing a computer to execute: a first step of ranking-searching a prestored document group for documents in which a preset first search word group appears, to acquire a first ranking; a second step of generating a second search word group from the document group included in the first ranking; a third step of ranking-searching the prestored document group for documents in which the first search word group and the second search word group appear, to acquire a second ranking; and a fourth step of adjusting the second ranking according to the first ranking to acquire a third ranking.

[0014] With this configuration, a program that acquires a new ranking (third ranking) from the first ranking, acquired using the initial search word group, and the second ranking, acquired using the initial search word group and the expanded word group, can be obtained easily, and, as described above, pseudo-relevance feedback with less quality deterioration than before can be realized even when the quality of the search request given by the initial search word group is low.

[0015] A document search device according to claim 4 of the present invention includes: document storage means that stores a document group to be ranking-searched; search word storage means that stores a first search word group used for ranking search; and related word evaluation means that ranking-searches the document group stored in the document storage means for documents in which the first search word group stored in the search word storage means appears to acquire a first ranking, generates, from the document group included in the first ranking, a related word group related to the first search word group, ranking-searches the document group stored in the document storage means for documents in which related words included in the related word group appear to acquire a second ranking, and evaluates each related word included in the related word group on the basis of the first ranking and the second ranking.

[0016] With this configuration, in related word search by a method similar to pseudo-relevance feedback, each related word is evaluated from the first ranking, acquired using the initial search word group, and the second ranking, acquired using the related word group, for example by calculating the average precision AVP using the initial search result. The quality of the related words can therefore be evaluated appropriately.
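[0016] gives only an example of how each related word might be evaluated (computing AVP using the initial search result). One possible reading is sketched below under explicit assumptions: the top k documents of the first ranking are treated as pseudo-relevant, each related word has its own ranking, and that ranking is scored with the AVP of [0004]. The parameter `k`, the per-word rankings, and all function names are hypothetical; this is not the procedure of the second embodiment.

```python
from typing import AbstractSet, Dict, Sequence


def avp(ranking: Sequence[int], relevant: AbstractSet[int], n: int) -> float:
    # Mean of precision@r for r = 1..n (the AVP wording of [0004]);
    # cutoffs past the end of a short ranking add no new relevant documents.
    if n < 1:
        raise ValueError("n must be at least 1")
    hits, total = 0, 0.0
    for r in range(1, n + 1):
        if r <= len(ranking) and ranking[r - 1] in relevant:
            hits += 1
        total += hits / r
    return total / n


def evaluate_related_words(
    first_ranking: Sequence[int],
    per_word_rankings: Dict[str, Sequence[int]],
    k: int,
    n: int,
) -> Dict[str, float]:
    """Score each related word by how well its own ranking reproduces the top of R1.

    ASSUMPTION: the top-k documents of the first ranking are treated as relevant.
    """
    pseudo_relevant = set(first_ranking[:k])
    return {word: avp(ranking, pseudo_relevant, n) for word, ranking in per_word_rankings.items()}


scores = evaluate_related_words([1, 2, 3, 4], {"polarization": [1, 2], "loop": [4, 1]}, k=2, n=2)
best_first = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, "polarization" retrieves exactly the top two initial documents and scores 1.0, while "loop" ranks an off-topic document first and scores 0.25, so candidates can be filtered or ordered by this value.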

[0017] A document search method according to claim 5 of the present invention includes: a first step of ranking-searching a prestored document group for documents in which a preset first search word group appears, to acquire a first ranking; a second step of generating, from the document group included in the first ranking, a related word group related to the first search word group; a third step of ranking-searching the prestored document group for documents in which related words included in the related word group appear, to acquire a second ranking; and a fourth step of evaluating each related word included in the related word group on the basis of the first ranking and the second ranking.

[0018] Because this method includes a step of evaluating each related word from the first ranking, acquired using the initial search word group, and the second ranking, acquired using the related word group, as described above, the quality of related words can be evaluated appropriately in related word search by a method similar to pseudo-relevance feedback.

[0019] A recording medium according to claim 6 of the present invention records a program for causing a computer to execute: a first step of ranking-searching a prestored document group for documents in which a preset first search word group appears, to acquire a first ranking; a second step of generating, from the document group included in the first ranking, a related word group related to the first search word group; a third step of ranking-searching the prestored document group for documents in which related words included in the related word group appear, to acquire a second ranking; and a fourth step of evaluating each related word included in the related word group on the basis of the first ranking and the second ranking.

[0020] With this configuration, a program that evaluates each related word from the first ranking, acquired using the initial search word group, and the second ranking, acquired using the related word group, as described above, can be obtained easily, and the quality of related words can be evaluated appropriately in related word search by a method similar to pseudo-relevance feedback.

[0021]

BEST MODE FOR CARRYING OUT THE INVENTION
Embodiments of the present invention are described below with reference to the drawings.
[First Embodiment] FIG. 1 shows the hardware configuration of a document search device according to the first embodiment of the present invention. In FIG. 1, the document search device 100 includes: a CPU 22 that controls the entire device; a memory 3, consisting of ROM, RAM, and the like, that stores programs and necessary data for implementing various functions under the control of the CPU 22; a hard disk 4 that stores documents to be searched, search conditions, search results, and the like; an input unit 5 for entering necessary instructions and data with a keyboard or a pointing device such as a mouse; an output unit 6 consisting of a CRT, a liquid crystal display, or the like; a flexible disk drive 7 that writes (updates) data to and reads data from a flexible disk; and a CD-ROM drive 8 that reads data from a compact disc read-only memory (CD-ROM). The units 3 to 8 and 22 are connected by a bus 9.

[0022] FIG. 2 shows the functional configuration of the document search device according to the first embodiment of the present invention. In FIG. 2, a document storage unit 14 stores, for each document to be searched, an attribute group that includes the document and the document number assigned to it; this function is implemented by the hard disk 4 or the like. A search word buffer 11 stores the search word group entered through the input unit 5; this function is implemented by the hard disk 4 or the like. An index unit 15 stores, for each index word entered through the input unit 5, the attribute groups, including document numbers, of the documents stored in the document storage unit 14 in which that index word appears; this function is implemented by the hard disk 4 or the like. For the search word group entered through the input unit 5, a search unit 13 obtains a document number/score pair for each document stored in the document storage unit 14 in which the search word group appears, and generates a list whose elements are these pairs; this function is implemented by the CPU 22, the memory 3, and the like. Ranking buffers 1 and 2 store the lists of document number/score pairs generated by the search unit 13; this function is implemented by the hard disk 4 or the like.

[0023] A control unit 12 (a) causes the search unit 13 to perform a ranking search over the document group stored in the document storage unit 14 using the search word group stored in the search word buffer 11, obtaining a list of document number/score pairs for the documents in which the search word group appears, and stores this list in the ranking buffer 1; (b) retrieves from the document storage unit 14 the documents corresponding to the document numbers in the elements of the list stored in the ranking buffer 1, and generates a new search word group from the retrieved documents; (c) combines this new search word group with the search word group stored in the search word buffer 11 to form a further search word group, causes the search unit 13 to obtain a list of document number/score pairs using it, and stores that list in the ranking buffer 2; and (d) adjusts the list stored in the ranking buffer 2 using the list stored in the ranking buffer 1, and causes the output unit 6 to output the adjusted list. This function is implemented by the CPU 22, the memory 3, and the like.

[0024] Next, the document search method according to this embodiment is described with reference to FIG. 3. The control unit 12 acquires the search word group (initial search word group) stored in the search word buffer 11 (step S101) and causes the search unit 13 to execute a document ranking search using the initial search word group, acquiring as the initial search result a list whose elements are document number/score pairs. The resulting ranking R1 (first ranking: initial search result) is stored in the ranking buffer 1 (step S102).

[0025] Next, the control unit 12 sorts the ranking stored in the ranking buffer 1 by score, obtains the top n document numbers, retrieves the documents corresponding to these document numbers (seed documents) one by one from the document storage unit 14, and generates a search word group (expanded word group) from the seed documents (step S103). Specifically, the seed documents are divided into words and the document frequency of each word is obtained, where the document frequency of a word w is the number of seed documents in which w appears. The words are then ranked by document frequency, and the desired number of words are selected from the top as search words (expanded words).
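Step S103 can be sketched as below. This is a minimal sketch, assuming whitespace tokenization (Japanese text would instead need morphological analysis) and a `docs` mapping from document number to text; the function name and parameters are hypothetical. The patent does not say whether initial search words are excluded from the expanded words, so this sketch does not filter them.

```python
from collections import Counter
from typing import Dict, List, Sequence


def expansion_words(ranking: Sequence[int], docs: Dict[int, str], n_seed: int, n_words: int) -> List[str]:
    """Pick expanded words by seed-document frequency (step S103 of [0025]).

    ranking: document numbers already sorted by score, best first
    docs:    document number -> text (whitespace tokenization is an assumption)
    """
    df: Counter = Counter()
    for doc_no in ranking[:n_seed]:                            # the top-n seed documents
        words_once = list(dict.fromkeys(docs[doc_no].split()))  # each word counted once per document
        df.update(words_once)                                  # df[w] = number of seed documents containing w
    # most_common orders equal counts by first occurrence, so ties are deterministic.
    return [word for word, _ in df.most_common(n_words)]
```

Counting each word at most once per seed document is what makes the count a document frequency rather than a term frequency.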

[0026] Next, the search unit 13 is made to execute a document ranking search using the initial search words and the expanded words, acquiring as the expanded search result a list whose elements are document number/score pairs. The resulting ranking R2 (second ranking: expanded search result) is stored in the ranking buffer 2 (step S104).

[0027] Next, the control unit 12 acquires a new ranking R3 (third ranking) from the ranking R1 stored in the ranking buffer 1 and the ranking R2 stored in the ranking buffer 2 (step S105). This operation is performed using the ranking buffer 2. Specifically, the rankings R1 and R2 are each sorted by score. Elements e are then taken from the ranking R2 in order and processed as follows. When an extracted element e is the $r_1$-th element from the top of the ranking R1 and the $r_2$-th element from the top of the ranking R2, its score S(e) is obtained by the following equation (1):

$$S(e) = \alpha\, r_1 + (1 - \alpha)\, r_2 \qquad (1)$$

where α is a constant satisfying 0 ≤ α ≤ 1 that serves as a mixing parameter between the rankings.

[0028] Next, the score of each element e in the ranking R2 is replaced with S(e), and the ranking R2 is sorted in ascending order of these new scores. The new ranking R3 is obtained in this way. For the new ranking R3 thus obtained, the output unit 6 outputs the document numbers, or the document numbers and scores, stored in the ranking buffer 2 (step S106).
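Equation (1) and the re-sort of [0028] can be sketched as follows. This is a minimal sketch, not the patent's code; the function names are hypothetical. Ranks are 1-based positions after sorting each ranking by descending score. The text defines S(e) only for elements that appear in both R1 and R2; giving documents absent from R1 the rank |R1| + 1 is an assumption made here so that every element of R2 receives a score on the same rank scale.

```python
from typing import Dict, List, Tuple

Ranking = List[Tuple[int, float]]  # (document number, score)


def rank_positions(ranking: Ranking) -> Dict[int, int]:
    # 1-based rank of each document after sorting by score, highest first (stable for ties).
    ordered = sorted(ranking, key=lambda pair: pair[1], reverse=True)
    return {doc: pos for pos, (doc, _) in enumerate(ordered, start=1)}


def merge_rankings(r1: Ranking, r2: Ranking, alpha: float) -> Ranking:
    """Build R3 by rescoring each element of R2 with S(e) = alpha*r1 + (1 - alpha)*r2."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rank1, rank2 = rank_positions(r1), rank_positions(r2)
    missing = len(r1) + 1  # ASSUMPTION: rank assigned to documents absent from R1
    r3 = [(doc, alpha * rank1.get(doc, missing) + (1 - alpha) * pos2) for doc, pos2 in rank2.items()]
    return sorted(r3, key=lambda pair: pair[1])  # smaller S(e) ranks higher
```

For example, with R1 = [(10, 5.0), (20, 3.0)], R2 = [(20, 9.0), (30, 8.0), (10, 1.0)] and α = 0.5, the new scores are 1.5 for document 20, 2.0 for document 10, and 2.5 for document 30, so R3 orders them 20, 10, 30. Setting α = 0 reproduces the order of R2, and α = 1 follows R1 with expansion-only documents appended.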

[0029] When the search unit 13 performs a document ranking search as described above, it uses the contents stored in the index unit 15 to obtain the document number and the score S(T, D) of each document D in which a search word T given by the control unit 12 (the initial search words, or the initial search words and the expanded words) appears. The score S(T, D) is the value Tf/Df, where Tf is the frequency of the search word T in the document D (the number of times T appears in D) and Df is the document frequency of T in the collection (the number of documents that contain T).
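A minimal sketch of the Tf/Df scoring of [0029] over a small inverted index. The index layout (index word to {document number: Tf}) is an assumed stand-in for the index unit 15, and summing S(T, D) over the search words follows the goodness-of-fit definition mentioned in [0003]; whether the embodiment combines per-word scores exactly this way is an assumption. Function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Index = Dict[str, Dict[int, int]]  # index word -> {document number: Tf}


def build_index(docs: Dict[int, str]) -> Index:
    """Hypothetical stand-in for index unit 15 (whitespace tokenization assumed)."""
    index: Dict[str, Dict[int, int]] = defaultdict(dict)
    for doc_no, text in docs.items():
        for word, tf in Counter(text.split()).items():
            index[word][doc_no] = tf
    return dict(index)


def ranking_search(terms: Sequence[str], index: Index) -> List[Tuple[int, float]]:
    """Score each document by summing S(T, D) = Tf / Df over the search words T."""
    fit: Dict[int, float] = defaultdict(float)
    for term in terms:
        postings = index.get(term, {})
        df = len(postings)                  # Df: number of documents containing the term
        for doc_no, tf in postings.items():
            fit[doc_no] += tf / df          # Tf / Df for this term and document
    return sorted(fit.items(), key=lambda pair: pair[1], reverse=True)
```

With documents {1: "a a b", 2: "a c", 3: "b b b"} and the query ["a", "b"], both terms have Df = 2, so document 1 scores 2/2 + 1/2 = 1.5, document 3 scores 3/2 = 1.5, and document 2 scores 1/2 = 0.5.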

[0030] As described above, the document search device 100 according to the first embodiment of the present invention is provided with the control unit 12, which ranking-searches the plurality of documents stored in the document storage unit 14 for documents in which the initial search word group stored in the search word buffer 11 appears to acquire a first ranking, generates an expanded word group from the documents included in the first ranking, ranking-searches for documents in which the initial search word group and the expanded word group appear to acquire a second ranking, and adjusts the second ranking according to the first ranking to acquire a new third ranking. Therefore, even when the quality of the search request given by the initial search word group is low, pseudo-relevance feedback with less quality deterioration than before can be realized.

[0031] Likewise, the document search method according to the first embodiment of the present invention includes step S102 of ranking-searching a plurality of prestored documents for documents in which a preset initial search word group appears to acquire a first ranking, step S103 of generating an expanded word group from the documents included in the first ranking, step S104 of ranking-searching the stored documents for documents in which the initial search word group and the expanded word group appear to acquire a second ranking, and step S105 of adjusting the second ranking according to the first ranking to acquire a third ranking. Therefore, even when the quality of the search request given by the initial search word group is low, pseudo-relevance feedback with less quality deterioration than before can be realized.

[0032] The first embodiment describes the case in which the program for implementing the above document search method is stored in the memory 3 of the control unit 12. Alternatively, the same effect can be obtained by attaching to the document search device 100 a computer-readable recording medium on which the program is recorded (including a CD-ROM, FD, magneto-optical disk (MO), MiniDisc (MD), or rewritable compact disc (CD-RW)), reading the recording medium with the CD-ROM drive 8 or the like, and executing the program. With this configuration, the program can be easily updated by moving or exchanging the recording medium.

[0033] Furthermore, the first embodiment describes the case in which the program that implements the above document search method is stored in the memory 3 of the control unit 12, but the present invention is not limited to this. The same effect can be obtained by providing the document search device 100 with a network interface and communication means. The program is then downloaded to the control unit 12 from an external device on a network such as a LAN and executed. In this way, the program can easily be updated via the network.

[0034] [Second Embodiment] The hardware configuration of the document search device according to the second embodiment of the present invention is substantially the same as that of the first embodiment. FIG. 1 is therefore used, the same components are given the same reference numerals, and their description is omitted. The functional configuration of this embodiment is also substantially the same as that of the first embodiment, except for the related word buffer 16. The same components are therefore given the same reference numerals, and their description is omitted.

[0035] FIG. 4 shows the functional configuration of the document search device according to the second embodiment of the present invention. It differs from the first embodiment in the following points:

- A related word buffer 16 is additionally provided.
- The device performs a ranking search for documents in which the initial search word group appears, and obtains a first ranking.
- It generates a related word group from the documents included in the first ranking.
- For each related word in the group, it performs a ranking search for documents in which that word appears, and obtains a second ranking for that word.
- It evaluates each related word using the first and second rankings, and stores the related words in the related word buffer 16 according to their evaluation values.

[0036] In FIG. 4, the control unit 12 operates as follows:

- It causes the search unit 13 to execute a ranking search for the search word group stored in the search word buffer 11. The search unit 13 obtains a list of (document number, score) pairs for the documents in which the search words appear, and the list is stored in the ranking buffer 1.
- Using the document numbers in that list, it acquires the corresponding documents from the document storage unit 14. From these documents it generates a related word group related to the search word group.
- It causes the search unit 13 to execute a ranking search for each related word in the related word group. The search unit 13 obtains a list of (document number, score) pairs for the documents in which that related word appears, and the list is stored in the ranking buffer 2.
- It evaluates the related words using the lists stored in the ranking buffer 1 and the ranking buffer 2. It stores the related words in the related word buffer 16 according to the evaluation results and causes the output unit 6 to output the related word group.

These functions of the control unit 12 are realized by the CPU 22, the memory 3, and the like. The related word buffer 16 stores the evaluated related word group, and is realized by the hard disk 4 or the like.

[0037] Next, the document search method according to this embodiment is described with reference to FIG. 5. The control unit 12 acquires the search word group (initial search words) stored in the search word buffer 11 (step S201). It then causes the search unit 13 to execute a document ranking search using the initial search words. The search returns a list whose elements are (document number, score) pairs for the documents in which the initial search words appear. The control unit 12 stores this ranking (the first ranking) in the ranking buffer 1 (step S202).
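Step S202 produces the first ranking as a list of (document number, score) pairs. This passage does not say how the search unit 13 computes scores. The sketch below therefore assumes a plain TF-IDF sum purely for illustration. `ranking_search` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def ranking_search(documents, terms):
    """Return (document number, score) pairs for documents containing any term, best first.

    The TF-IDF scoring is an assumption made for this sketch.
    """
    n = len(documents)
    df = Counter()
    for words in documents.values():
        df.update(set(words))  # document frequency of each word in the collection
    ranking = []
    for doc_no, words in documents.items():
        tf = Counter(words)
        matched = [t for t in terms if tf[t] > 0]
        if not matched:
            continue  # only documents in which a search word appears are ranked
        score = sum(tf[t] * math.log(n / df[t]) for t in matched)
        ranking.append((doc_no, score))
    ranking.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranking


# Hypothetical documents keyed by document number (already split into words).
documents = {10: ["patent", "search", "search"], 11: ["search", "ranking"], 12: ["cooking"]}
ranking_buffer_1 = ranking_search(documents, ["search"])  # first ranking
```

Storing the result as a list of (document number, score) pairs matches the description of the ranking buffer 1 and is all the later steps need.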

[0038] Next, the ranking stored in the ranking buffer 1 is sorted by score and the top n document numbers are obtained. The documents corresponding to these numbers (the seed documents) are acquired one by one from the document storage unit 14. A word group, here a group of related word candidates, is generated from the seed documents (step S203). Specifically, each seed document is divided into words, and the document frequency of each word is obtained. The document frequency of a word w is the number of seed documents in which w appears. The words are then ranked by document frequency, and the desired number of top-ranked words w are selected as related word candidates.
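Step S203 can be sketched as follows. The sketch assumes the documents are already divided into words; Japanese text would first need morphological analysis. It also assumes that ties in document frequency are broken alphabetically, which this document does not specify. The function name and parameters `n` and `k` are illustrative.

```python
from collections import Counter


def related_word_candidates(ranking, documents, n, k):
    """Step S203 sketch: candidate related words from the top-n seed documents.

    ranking   -- first ranking as (document number, score) pairs
    documents -- document number -> list of words (word division assumed done)
    n         -- number of seed documents; k -- number of candidates to keep
    """
    seeds = sorted(ranking, key=lambda pair: -pair[1])[:n]
    df = Counter()
    for doc_no, _ in seeds:
        # Document frequency: each word counts at most once per seed document.
        df.update(set(documents[doc_no]))
    # Highest document frequency first; alphabetical tie-break is an assumption.
    ordered = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ordered[:k]]


ranking = [(1, 0.2), (2, 0.9), (3, 0.5)]
documents = {1: ["x", "y"], 2: ["a", "b", "a"], 3: ["a", "c"]}
candidates = related_word_candidates(ranking, documents, n=2, k=2)
```

Here documents 2 and 3 are the seeds. "a" appears in both, so it ranks first. The repeated "a" in document 2 does not raise its document frequency.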

[0039] Next, the candidates are taken one at a time from the related word candidate group. Each candidate that satisfies the following condition is stored in the related word buffer 16 (steps S204 to S207):

1. The ranking in the ranking buffer 1 is sorted by score, and the top x documents are regarded as the set R of documents relevant to the initial search words.
2. A related word candidate c taken from the candidate group is used as the search word to obtain a document ranking C (the second ranking).
3. The average precision (AVP) of the document ranking C is computed with respect to the relevant document set R.
4. If this average precision exceeds a predetermined threshold, the related word candidate c is stored in the related word buffer 16.

When all related word candidates c have been evaluated (step S208), the output unit 6 outputs the related word group stored in the related word buffer 16 (step S209).
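Steps S204 to S208 can be sketched as below. This passage does not give the AVP formula, so the sketch assumes the common non-interpolated average precision. It sums precision@k over every rank k at which a document from R is retrieved, and divides by |R|. The `search` callable stands in for the search unit 13, and the other names are hypothetical.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision of a ranking against the relevant set R (assumed definition)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, (doc_no, _score) in enumerate(ranking, start=1):
        if doc_no in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant)


def select_related_words(first_ranking, candidates, search, x, threshold):
    """Keep each candidate c whose ranking C has AVP above the threshold."""
    top = sorted(first_ranking, key=lambda pair: -pair[1])[:x]
    relevant = {doc_no for doc_no, _ in top}  # set R, regarded as relevant
    related_word_buffer = []                  # stands in for related word buffer 16
    for c in candidates:
        ranking_c = search([c])               # second ranking C for candidate c
        if average_precision(ranking_c, relevant) > threshold:
            related_word_buffer.append(c)
    return related_word_buffer


# Hypothetical rankings a search unit might return for two candidates.
rankings = {"good": [(1, 1.0), (2, 0.9), (5, 0.1)], "bad": [(5, 1.0), (6, 0.9), (1, 0.1)]}
first_ranking = [(1, 3.0), (2, 2.0), (3, 1.0), (4, 0.5)]
kept = select_related_words(first_ranking, ["good", "bad"],
                            lambda terms: rankings[terms[0]], x=2, threshold=0.5)
```

With x = 2 the set R is {1, 2}. "good" retrieves both relevant documents at ranks 1 and 2, so AVP = 1.0. "bad" retrieves only document 1, at rank 3, so AVP = (1/3)/2 = 1/6. Only "good" passes the 0.5 threshold.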

[0040] As described above, the document search device 100 according to the second embodiment of the present invention includes the control unit 12, which performs the following steps:

- It performs a ranking search over the documents stored in the document storage unit 14 for documents in which the initial search word group stored in the search word buffer 11 appears, and obtains a first ranking.
- From the documents included in the first ranking, it generates a group of candidate related words related to the initial search word group.
- It performs a ranking search over the stored documents for documents in which a candidate from the group appears, and obtains a second ranking.
- It evaluates each candidate in the group using the first and second rankings.

The quality of each related word (here, each related word candidate) can therefore be evaluated appropriately, for example by computing the average precision AVP against the initial search results.

[0041] Further, the document search method according to the second embodiment of the present invention includes the following steps:

- Step S202 performs a ranking search over a plurality of previously stored documents for documents in which a preset initial search word group appears, and obtains a first ranking.
- Step S203 generates, from the documents included in the first ranking, a group of candidate related words related to the initial search word group.
- Step S204 performs a ranking search over the stored documents for documents in which a candidate from the group appears, and obtains a second ranking.
- Steps S205 and S206 evaluate each candidate in the group using the first and second rankings.

As described above, the quality of each related word (here, each related word candidate) can therefore be evaluated appropriately.

[0042] In the second embodiment, the program that implements the above document search method is stored in the memory 3 of the control unit 12. The same effect can also be obtained in another way. A computer-readable recording medium on which the program is recorded is mounted on the document search device 100. Such media include a CD-ROM, FD, magneto-optical disk (MO), MiniDisc (MD), and rewritable CD (CD-RW). The medium is read by the CD-ROM drive 8 or the like, and the program is executed. With this configuration, the program can easily be updated by moving or exchanging the recording medium.

[0043] Furthermore, the second embodiment describes the case in which the program that implements the above document search method is stored in the memory 3 of the control unit 12, but the present invention is not limited to this. The same effect can be obtained by providing the document search device 100 with a network interface and communication means. The program is then downloaded to the control unit 12 from an external device on a network such as a LAN and executed. In this way, the program can easily be updated via the network.

[0044]

[Effects of the Invention] As described above, the present invention obtains a new third ranking from two rankings. The first ranking is obtained using the initial search word group. The second ranking is obtained using the initial search word group together with the expanded word group. The invention thereby provides a document search device, a document search method, and a recording medium that realize a pseudo-relevance feedback method with less quality deterioration than conventional methods, even when the search request is of low quality.

[0045] Further, the present invention applies to related word search performed by a method similar to pseudo-relevance feedback. It evaluates each related word (related word candidate) using two rankings. The first ranking is obtained using the initial search word group. The second ranking is obtained using the related word group (the group of related word candidates). The invention thereby provides a document search device, a document search method, and a recording medium that can appropriately evaluate the quality of related words.

[Brief description of drawings]

FIG. 1 is a block diagram showing the hardware configuration of the document search device according to the first embodiment of the present invention.

FIG. 2 is a block diagram showing the functional configuration of the document search device according to the first embodiment of the present invention.

FIG. 3 is a flowchart showing the document search method according to the first embodiment of the present invention.

FIG. 4 is a block diagram showing the functional configuration of the document search device according to the second embodiment of the present invention.

FIG. 5 is a flowchart showing the document search method according to the second embodiment of the present invention.

[Explanation of symbols]

1, 2 Ranking buffer
3 Memory
4 Hard disk
5 Input unit
6 Output unit
7 FDD
8 CD-ROM drive
9 Bus
11 Search word buffer (search word storage means)
12 Control unit (ranking acquisition means, related word evaluation means)
13 Search unit
14 Document storage unit (document storage means)
15 Index unit
16 Related word buffer
22 CPU
100 Document search device (computer)

Claims

1. A document search device comprising the following means:

- document storage means for storing a document group to be ranking-searched;
- search word storage means for storing a first search word group used for the ranking search; and
- ranking acquisition means that performs a ranking search over the document group stored in the document storage means for documents in which the first search word group stored in the search word storage means appears, to obtain a first ranking; generates a second search word group from the documents included in the first ranking; performs a ranking search over the stored document group for documents in which the first search word group and the second search word group appear, to obtain a second ranking; and adjusts the second ranking according to the first ranking to obtain a third ranking.

2. A document search method comprising the following steps:

- a first step of performing a ranking search over a previously stored document group for documents in which a preset first search word group appears, to obtain a first ranking;
- a second step of generating a second search word group from the documents included in the first ranking;
- a third step of performing a ranking search over the stored document group for documents in which the first search word group and the second search word group appear, to obtain a second ranking; and
- a fourth step of adjusting the second ranking according to the first ranking to obtain a third ranking.

3. A computer-readable recording medium on which is recorded a program for causing a computer to execute the following steps:

- a first step of performing a ranking search over a previously stored document group for documents in which a preset first search word group appears, to obtain a first ranking;
- a second step of generating a second search word group from the documents included in the first ranking;
- a third step of performing a ranking search over the stored document group for documents in which the first search word group and the second search word group appear, to obtain a second ranking; and
- a fourth step of adjusting the second ranking according to the first ranking to obtain a third ranking.

4. A document search device comprising the following means:

- document storage means for storing a document group to be ranking-searched;
- search word storage means for storing a first search word group used for the ranking search; and
- related word evaluation means that performs a ranking search over the document group stored in the document storage means for documents in which the first search word group stored in the search word storage means appears, to obtain a first ranking; generates, from the documents included in the first ranking, a related word group related to the first search word group; performs a ranking search over the document group stored in the document storage means for documents in which a related word included in the related word group appears, to obtain a second ranking; and evaluates each related word included in the related word group using the first ranking and the second ranking.

5. A document search method comprising the following steps:

- a first step of performing a ranking search over a previously stored document group for documents in which a preset first search word group appears, to obtain a first ranking;
- a second step of generating, from the documents included in the first ranking, a related word group related to the first search word group;
- a third step of performing a ranking search over the stored document group for documents in which a related word included in the related word group appears, to obtain a second ranking; and
- a fourth step of evaluating each related word included in the related word group using the first ranking and the second ranking.

6. A computer-readable recording medium on which is recorded a program for causing a computer to execute the following steps:

- a first step of performing a ranking search over a previously stored document group for documents in which a preset first search word group appears, to obtain a first ranking;
- a second step of generating, from the documents included in the first ranking, a related word group related to the first search word group;
- a third step of performing a ranking search over the stored document group for documents in which a related word included in the related word group appears, to obtain a second ranking; and
- a fourth step of evaluating each related word included in the related word group using the first ranking and the second ranking.