JP4223756B2 - Document search method, document search program, and document search system

Info

Publication number: JP4223756B2
Application number: JP2002213929A
Authority: JP
Inventors: 直幸富田; 健二朗川戸
Original assignee: Fujitsu Semiconductor Ltd
Current assignee: Fujitsu Semiconductor Ltd
Priority date: 2002-07-23
Filing date: 2002-07-23
Publication date: 2009-02-12
Anticipated expiration: 2022-07-23
Also published as: JP2004054757A

Description

【０００１】
【発明の属する技術分野】
本発明は、電子化された多数の文書から、検索条件として所定のキーワードを指定して必要とする文書を検索する文書検索方法、文書検索プログラム及び文書検索システムに関するものである。
【０００２】
近年、コンピュータシステムにおける保管文書は、データベースの大型化やネットワーク技術の導入等により増加しており、同システムにおける多数の文書の中から所望の文書を効率よく検索する技術が望まれている。
【０００３】
【従来の技術】
文書検索システムは、検索者が指定した検索条件に適合する文書を、任意の範囲から検索し、その所在をリストとして提示するものである。具体的には、例えば、所定のパソコンに保管してある文書の中から所望の文書を検索する、といった検索範囲が比較的狭く、文書の数、及び、文書データの総量が大きくない場合、文書検索システムは、条件が指定される毎にすべての文書データを実際に調べてリストを作成している。しかし、検索範囲がネットワークを介すなどして広範囲に及ぶ場合は、この方式では、一回の検索の応答時間が非常に長くなってしまう上に、システム全体への処理負荷も大きく、実用的では無い。このため、ネットワーク上の文書などの大量広範囲の文書を対象とする検索システムにおいて、実用的な応答速度を実現する場合、定期的に検索範囲の文書を調査し、検索条件として指定されるキーワードなどの情報と、文書の所在場所（アドレス）とを対応づけるインデックステーブルを作成している。そして、検索要求があった時点で、このインデックステーブルを使用して検索情報との対比を行い、対応する文書のアドレスのリストを返すという方法が採用されている。
【０００４】
一般的な文書検索システムにおいて、検索範囲における文書収集、インデックスの作成、及び検索要求に応じたインデックスの集計と応答は、一台のコンピュータで行われている場合もあれば、クラスタ化された複数のコンピュータにより行われている場合もあるが、基本には一個所で集中的に行われている。
【０００５】
検索対象であるネットワーク上のリソースは継続して増大しているため、集中方式の検索システムでは、インデックステーブルの肥大化が将来に及ぶ問題となっている。このため、検索範囲を複数のサーバで分割し、これらが協調して検索を行うシステムがいくつか提案されている。また、特開平８−２５５１７８号公報には、文書の関連性を評価して、それを記録し、関連の深い文書をたどれるようにすることによって、ユーザが一つの文書から、関連の深い文書を効果的に閲覧できるようにするといった手法も開示されている。
【０００６】
【発明が解決しようとする課題】
ところで、例えば、特定の分野でその分野固有の意味に使用されているキーワードについて、広く知られていて、話題に上りやすい別の意味が存在した場合、上記従来の文書検索システムでは、その別の意味で使用されている文書（必要のない文書）が大量にヒットする。そのため、本来探したい意味のキーワードを有する文書を探すためには、分野を絞り込むためのキーワードを別に指定するなどの工夫が必要となっている。また、文書検索システムのアルゴリズムによっては、キーワードに対する文書が大量にヒットした場合、広く知られているメジャーなキーワードについての凡庸な文書（ページ）と判定されたものは、その検索結果から除外されてしまう可能性もある。
【０００７】
本発明は上記問題点を解決するためになされたものであって、その目的は、キーワード検索を的確に行うことができる文書検索方法、文書検索プログラム及び文書検索システムを提供することにある。
【０００８】
【課題を解決するための手段】
上記目的を達成するため、請求項１，４，５に記載の発明によれば、検索キーワードと、検索を開始する開始セグメントと、該開始セグメントを起点としたセグメントの探索段数とが入力され、セグメントの探索段数にて指定される検索範囲内のセグメントに対してインデックステーブルの近隣セグメント保有情報に基づくキーワード検索が行われる。そして、各セグメントのインデックステーブルから検索キーワードの該当文書の所在情報が抽出され検索結果リストが生成される。つまり、開始セグメントと検索範囲を指定することにより、検索キーワードに関するインデックスの検索範囲が適切に調整される。従って、関連する文書の管理単位でセグメントを設定することにより、同一のキーワードを含む無関係な文書を検索結果リストから排除することが可能となり、キーワードの該当文書が的確に検索される。
【０００９】
請求項２に記載の発明によれば、セグメント単位で文書の収集が行われ、文書中に存在するキーワードと、該キーワードを含む該当文書の所在情報とを記録したローカルインデックステーブルが作成される。そして、ローカルインデックステーブルに存在するキーワードのリストについて、セグメント自身のキーワードのリストと、近隣のセグメントのキーワードのリストとの総和から、交換先となるセグメントのキーワードのリストのみが排除されて腕インデックステーブルが作成される。さらに、セグメント間における交換にて得られた腕インデックステーブルとローカルインデックステーブルとに基づいて、キーワードとセグメント内に保管された該当文書の所在情報と交換相手のセグメントの保有情報とを記録したグローバルインデックステーブルが作成される。このグローバルインデックステーブルが参照されることでキーワードを含む該当文書の所在情報が抽出され検索結果リストが生成される。
【００１０】
請求項３に記載の発明によれば、セグメント単位でサーバが設けられるので、キーワードの検索処理がセグメント毎に分散して行われる。また、この場合、使用されるネットワーク資源は、セグメントの対象範囲に限定され、さらにセグメントの対象範囲が一台のコンピュータ内のみである場合には、ネットワークを使用することなく行えるので、その文書の収集によるネットワーク負荷を低減することが可能となる。
【００１１】
【発明の実施の形態】
以下、本発明を具体化した一実施形態を図面に従って説明する。
図１は、文書検索システムの概略構成を示すブロック図である。同システムにおいて、複数のセグメントＳａ〜Ｓｄが接続され、各セグメントＳａ〜Ｓｄは相互にデータのやり取りを行うことができるようになっている。具体的には、セグメントＳａは、セグメントＳｂ，Ｓｃ，Ｓｄと接続されている。また、セグメントＳｂ，Ｓｃ，Ｓｄは、セグメントＳａ以外に図示しない他のセグメントと接続されている。セグメントＳｂ〜Ｓｄは、セグメントＳａに対する近隣セグメントとして設定されている。
【００１２】
各セグメントＳａ〜Ｓｄは、電子化された多数の文書を有するコンピュータシステム上での任意の区分として設定される。セグメントＳａ〜Ｓｄは、例えば、単一のコンピュータにおけるディレクトリ毎、ローカルネットワークに接続されたコンピュータ毎、広域ネットワーク上のドメイン毎、或いは社内組織における管理部門毎のように、設定することができる。但し、実用的な文書検索システムを構築する場合、セグメント内の各文書が関連のある文書となるよう区分するとよい。
【００１３】
セグメントＳａは、ローカルインデックステーブルＬＩａ、グローバルインデックステーブルＧＩａ、腕インデックステーブルＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄを含む。同セグメントＳａの近隣セグメントＳｂ〜Ｓｄも、同様に、ローカルインデックステーブルＬＩｂ〜ＬＩｄ、グローバルインデックステーブルＧＩｂ〜ＧＩｄ、腕インデックステーブルＡＩｂ−ａ，ＡＩｂ−ｘ，ＡＩｃ−ａ，ＡＩｃ−ｘ，ＡＩｄ−ａ、ＡＩｄ−ｘを含む。
【００１４】
ローカルインデックステーブルＬＩａ〜ＬＩｄは、キーワードと、文書の所在情報（アドレス）とを対応づけて示す一覧表（リスト）である。ローカルインデックステーブルＬＩａ〜ＬＩｄの生成時には、セグメントＳａ〜Ｓｄ内に保管されている複数の文書が収集され、各文書からキーワードが抽出される。そして、各キーワードについて、キーワードと、文書の所在を示すインデックス情報とがローカルインデックステーブルＬＩａ〜ＬＩｄに順次登録される。なお、文書の収集とローカルインデックステーブルＬＩａ〜ＬＩｄの生成は、セグメント単位で定期的に実施される。
【００１５】
ここで、日本語などの、単語を区切らずに書く言語の文書からキーワードを抽出するには、文書を単語（文節）に分割する必要がある。そのため、先ず、公知の形態素解析の手法により、文書が複数の単語（文節）に分割される。そして、分割された単語のうちの所望の単語がキーワードとして抽出され、インデックステーブルＬＩａ〜ＬＩｄに登録される。なお、形態素解析を行うものとしては、例えば、奈良先端科学技術大学院大学自然言語処理学講座からリリースされている日本語形態素解析システムの「茶筌」等がある。
【００１６】
また、分割された各単語において、どの単語をキーワードとしてインデックステーブルに登録するかについては、次のような方法が挙げられる。
（ａ）文書のデータ形式を利用して文書タイトルなどの有為と思われる構成要素に基づいてそれに関連する単語をキーワードとして登録する方法。
（ｂ）あらかじめ有為とみなされるキーワードの辞書を用意し、適合する単語のみをキーワードとして登録する方法。
（ｃ）（ｂ）とは逆に、接続詞などの頻出してキーワードとして意味の無い単語の辞書を用意し、それに合致する単語をキーワードから排除する方法。
（ｄ）収集範囲中の出現頻度などの統計的な評価によって、基準値以上の評価が得られた単語をキーワードとして登録する方法。
（ｅ）上記（ａ）〜（ｄ）を組み合わせた方法。
【００１７】
文書の所在を示すインデックス情報（所在情報）の一例として、UNIX(R)，Windows(R)などにおけるファイルシステムのパス、URL(Uniform Resource Locator)、URC(Uniform Resource Characteristic)などが挙げられる。なお、ローカルインデックステーブルＬＩａ〜ＬＩｄは、各セグメントＳａ〜Ｓｄ内の文書のみを対象として生成されるインデックステーブルである。
【００１８】
腕インデックステーブルＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄ，ＡＩｂ−ａ，ＡＩｂ−ｘ，ＡＩｃ−ａ，ＡＩｃ−ｘ，ＡＩｄ−ａ、ＡＩｄ−ｘは、グローバルインデックステーブルＧＩａ〜ＧＩｄを生成するために近隣セグメント同士が交換するインデックステーブルである。これら腕インデックステーブルＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄ，ＡＩｂ−ａ，ＡＩｂ−ｘ，ＡＩｃ−ａ，ＡＩｃ−ｘ，ＡＩｄ−ａ、ＡＩｄ−ｘは、ローカルインデックステーブルＬＩａ〜ＬＩｄに登録されたキーワードに基づいて生成される。
【００１９】
腕インデックステーブルＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄ，ＡＩｂ−ａ，ＡＩｂ−ｘ，ＡＩｃ−ａ，ＡＩｃ−ｘ，ＡＩｄ−ａ、ＡＩｄ−ｘは、各セグメントＳａ〜Ｓｄにおいて、それらセグメントにおける近隣セグメントの数だけ生成される。具体的には、セグメントＳａでは、近隣セグメントＳｂ〜Ｓｄが設定されているため、それら各近隣セグメントＳｂ〜Ｓｄと交換するための３つの腕インデックステーブルＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄが生成される。また、交換先となる近隣セグメントＳｂ〜Ｓｄ側でも腕インデックステーブルＡＩｂ−ａ，ＡＩｃ−ａ，ＡＩｄ−ａが生成される。つまり、腕インデックステーブルＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄ及び腕インデックステーブルＡＩｂ−ａ，ＡＩｃ−ａ，ＡＩｄ−ａは、セグメントＳａと近隣セグメントＳｂ〜Ｓｄとを結ぶ腕（接続部）の両端で作られている。
【００２０】
腕インデックステーブルＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄ，ＡＩｂ−ａ，ＡＩｂ−ｘ，ＡＩｃ−ａ，ＡＩｃ−ｘ，ＡＩｄ−ａ、ＡＩｄ−ｘは、キーワードの一覧表（リスト）であり、彼岸腕インデックステーブルと呼ぶものと、此岸腕インデックステーブルと呼ぶものとに分けられる。具体的には、セグメントＳａにおけるインデックス更新処理で生成した近隣セグメントＳｂに対する腕インデックステーブルＡＩａ−ｂを彼岸腕インデックステーブルと呼ぶ。一方、近隣セグメントＳｂにおけるインデックス更新処理にて生成された、こちらのセグメントＳａに対する腕インデックステーブルＡＩｂ−ａを此岸腕インデックステーブルと呼ぶ。同様に、セグメントＳａの近隣セグメントＳｃ，Ｓｄに対する腕インデックステーブルＡＩａ−ｃ，ＡＩａ−ｄを彼岸腕インデックステーブルと呼び、セグメントＳａに対する腕インデックステーブルＡＩｃ−ａ，ＡＩｄ−ａを此岸腕インデックステーブルと呼ぶ。
【００２１】
近隣セグメントＳｂに対する彼岸腕インデックステーブルＡＩａ−ｂは、セグメントＳｂにおける此岸腕インデックステーブルＡＩｂ−ａ以外の此岸腕インデックステーブルＡＩｃ−ａ，ＡＩｄ−ａと、ローカルインデックステーブルＬＩａに存在するキーワードのリストの総和として生成される。彼岸腕インデックステーブルＡＩａ−ｃは、此岸腕インデックステーブルＡＩｂ−ａ，ＡＩｄ−ａと、ローカルインデックステーブルＬＩａに存在するキーワードのリストの総和として生成される。彼岸腕インデックステーブルＡＩａ−ｄは、此岸腕インデックステーブルＡＩｂ−ａ，ＡＩｃ−ａと、ローカルインデックステーブルＬＩａとに存在するキーワードのリストの総和として生成される。つまり、セグメント自身のキーワードのリスト（ローカルインデックステーブルのキーワードのリスト）と、近隣セグメントにおけるキーワードのリスト（此岸腕インデックステーブル）との総和から、交換先となる近隣セグメントのキーワードのリストのみを排除して彼岸腕インデックステーブルが生成される。
【００２２】
彼岸腕インデックステーブルＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄは、各近隣セグメントＳｂ〜ＳｄのためにセグメントＳａ側で生成する腕インデックステーブルであり、交換対象となる近隣セグメントＳｂ〜Ｓｄ側から見た場合には、此岸腕インデックステーブルとなる。そして、近隣セグメントＳｂ〜Ｓｄ側では、上記と同様に、此岸腕インデックステーブルＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄ等とローカルインデックステーブルＬＩｂ〜ＬＩｄとに存在するキーワードのリストの総和として彼岸腕インデックステーブルＡＩｂ−ａ，ＡＩｃ−ａ，ＡＩｄ−ａ等が生成される。
【００２３】
このように彼岸腕インデックステーブルを生成した場合、一方のセグメントで生成した他方のセグメントのための彼岸腕インデックステーブルは、その一方のセグメントと、該一方のセグメントを経由して他方のセグメントに接続する他の複数のセグメントとにおけるキーワードのリストになる。例えば、セグメントＳａにおける近隣セグメントＳｂのための彼岸腕インデックステーブルＡＩａ−ｂは、セグメントＳａと、そのセグメントＳａを経由してセグメントＳｂに接続する各セグメントＳｃ，Ｓｄ等におけるローカルインデックステーブルＬＩａ，ＬＩｃ，ＬＩｄ等のキーワードのリストになる。また逆に、セグメントＳｂにおける近隣セグメントＳａのための彼岸腕インデックステーブルＡＩｂ−ａは、セグメントＳｂと、そのセグメントＳｂを経由してセグメントＳａに接続する各セグメント（図示しない）とにおけるローカルインデックステーブルＬＩｂ等のキーワードのリストになる。
【００２４】
セグメントＳａにおけるグローバルインデックステーブルＧＩａは、ローカルインデックステーブルＬＩａに、此岸腕インデックステーブルＡＩｂ−ａ，ＡＩｃ−ａ，ＡＩｄ−ａの内容を追加したものである。同様に、各セグメントＳｂ〜ＳｄのグローバルインデックステーブルＧＩｂ〜ＧＩｄは、ローカルインデックステーブルＬＩｂ〜ＬＩｄに、此岸腕インデックステーブルＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄ等の内容を追加したものである。
【００２５】
グローバルインデックステーブルＧＩａ〜ＧＩｄには、ローカルインデックステーブルＬＩａ〜ＬＩｄの情報（セグメントＳａ〜Ｓｄ内における文書の所在を示すインデックス情報）に加え、近隣セグメント識別コードが登録される。グローバルインデックステーブルＧＩａ〜ＧＩｄにおいて、近隣セグメント識別コードは、近隣セグメントがキーワードの該当文書を保有する旨を示す情報として登録されている。例えば、セグメントＳａにおけるグローバルインデックステーブルＧＩａについて、近隣セグメントＳｂの此岸腕インデックステーブルＡＩａ−ｂのキーワードを登録する場合、そのキーワードとともに近隣セグメントＳｂの識別コードが登録される。そして、キーワードの検索時には、グローバルインデックステーブルＧＩａ〜ＧＩｄに基づいて検索結果リストが生成される。
【００２６】
具体的には、例えば、グローバルインデックステーブルＧＩａから、指定されたキーワードが検索され、該当文書の所在がセグメントＳａ内なら、そのまま文書の所在情報が検索結果リストに追加される。また、グローバルインデックステーブルＧＩａにおいて、近隣セグメントＳｂの識別コードが登録されていれば、近隣セグメントＳｂのグローバルインデックステーブルＧＩｂについても同様に、指定されたキーワードが検索され、その該当文書の所在情報が検索結果リストへ追加される。このように、グローバルインデックステーブルの情報に基づいて近隣セグメント間を辿ってキーワードの検索処理が繰り返し実施される。そして、経由したセグメント数が指定された上限に達した場合、近隣セグメント識別コードで示されるセグメントが検索を開始したセグメントである場合等になるまでキーワードの検索処理が実施される。
【００２７】
図２は、セグメントＳａの概略構成図である。セグメントＳａは、本実施形態ではパーソナルコンピュータ１１により構成されている。尚、セグメントＳａをワークステーション等の汎用的な目的で使用される計算機により構成しても良い。
【００２８】
従って、文書検索システムは複数のパソコン１１を連携させ構築されている。図１における他のセグメントＳｂ〜Ｓｄは、セグメントＳａと同様に構成されているため、図面及び説明を省略する。
【００２９】
パソコン１１は、操作入力部１２、表示部１３、制御部１４、記憶部１５等を備える。操作入力部１２は、キーボード、マウス装置等を含み、文書検索プログラムの起動、キーワードの入力等のユーザからの要求や指示に用いられる。表示部１３は、例えば、ＣＲＴ，ＬＣＤ，ＰＤＰ等により構成され、キーワード入力画面の表示、検索結果の表示等に用いられる。
【００３０】
制御部１４は、パソコン１１を統括的に制御する周知の中央処理装置（ＣＰＵ）、文書検索プログラムを格納した記録装置等により構成される。制御部１４におけるＣＰＵは、文書検索プログラムを実行することにより、ローカルインデックス作成手段２１、彼岸腕インデックス作成手段２２、グローバルインデックス作成手段２３、キーワード検索手段２４、問い合わせ発行手段２５として機能する。
【００３１】
記憶部１５は、パソコン１１に内蔵された磁気ディスク装置（ハードディスク）により構成されている。記憶部１５は、ローカルファイルシステム３１、セグメント定義テーブル３２、文書テーブル３３、ローカルインデックステーブル３４、近隣セグメントテーブル３５、グローバルインデックステーブル３６、此岸腕インデックステーブル３７、彼岸腕インデックステーブル３８、問い合わせ履歴リスト３９、検索結果リスト４０等を含む。なお、記憶部１５としては、光ディスク装置、光磁気ディスク装置を使用してもよい。勿論、パソコン１１に外付けされたディスク装置を用いてもよく、複数のディスク装置により記憶部１５を構成してもよい。
【００３２】
ローカルファイルシステム３１には、文書管理領域として複数のディレクトリが設けられており、電子化された各種文書が文書内容に応じて各ディレクトリに格納されている。セグメント定義テーブル３２は、ローカルファイルシステム３１において検索対象となるディレクトリの所在を示すパス文字列のリストである。
【００３３】
先ず、ローカルインデックステーブル３４の作成方法について説明する。
ローカルインデックス作成手段２１は、セグメント定義テーブル３２からパス文字列を順次取り出し、ローカルファイルシステム３１上において検索対象となるディレクトリのパスを認識する。そして、ローカルインデックス作成手段２１は、それらパスを起点として、各ディレクトリに格納されている文書を収集して文書テーブル３３を作成する。なお、セグメント定義テーブル３２のパス文字列で示されるディレクトリに下位ディレクトリが存在する場合には、そのパス文字列が起点パスとなり、ローカルファイルシステム３１におけるファイル管理情報を利用することで、下位ディレクトリの文書についても収集できるようになっている。
【００３４】
文書テーブル３３は、同一の文書へのパスを文字列として扱う場合の冗長性を回避するために作成されるテーブルであり、文書の所在を示すパス（文書パス文字列）と、その文書に対応させる文書コード（整数値のならび）とからなる一覧表（リスト）である。単に、文書パス文字列のリストとし、リスト上の順番を示す数値を文書コードとしてもよい。なお、文書パス文字列は、パソコン１１のローカルファイルシステム３１におけるパス表現を使う。
【００３５】
ローカルインデックス作成手段２１は、セグメント定義テーブル３２に基づき収集した各文書について、上述した日本語形態素解析システム等を使用して、単語の切り出し、品詞の判定を行った後、固有名詞をキーワードとして抽出する。そして、抽出したキーワードに基づいて、ローカルインデックステーブル３４を作成する。
【００３６】
図３に示すように、ローカルインデックステーブル３４は、キーワードＫＷ（ＫＷ１，ＫＷ２，…）の文字列と、そのキーワードを含む文書の文書コードＣＤ（ＣＤ１，ＣＤ２，…，ＣＤｘ）とを構成要素とするテーブルである。このローカルインデックステーブル３４の作成に際しては、各キーワードＫＷについて、ローカルインデックステーブル３４におけるエントリの有無が探索され、エントリが有ればそのテーブル３４上のキーワードＫＷに対して文書コードＣＤが追加される。エントリが無ければそのキーワードＫＷの項目を新たに追加し、キーワードＫＷに対する文書コードＣＤを書き込む。このキーワードＫＷ及び文書コードＣＤの追加は、各文書から抽出した全キーワードＫＷについて繰り返し実施される。これにより、セグメント定義テーブル３２で定義された範囲内にある各文書のローカルインデックステーブル３４が作成される。なお、このローカルインデックステーブル３４上から任意のキーワードＫＷを高速に検索する手法としては、二分木法などが知られている。
【００３７】
近隣セグメントテーブル３５は、近隣セグメントとするセグメントの設定であって、近隣セグメントとして設定するパソコン１１のネットワークアドレスのリストと、そのパソコン１１に対応付けられた近隣セグメント識別コードとのならびである。なおここで、関連のある文書を保管するパソコン１１同士を近隣セグメントとして設定している。近隣セグメント識別コードは、近隣セグメントを識別するためのコードであり、本実施形態では、文書コードＣＤで使用するコードの一定領域を予約して割り付けている。
【００３８】
次に、腕インデックステーブルの作成方法について説明する。
彼岸腕インデックス作成手段２２は、ローカルインデックステーブル３４、近隣セグメントテーブル３５、此岸腕インデックステーブル３７に基づいて、各近隣セグメントの彼岸腕インデックステーブル３８を生成する。
【００３９】
具体的に、彼岸腕インデックス作成手段２２は、近隣セグメントテーブル３５に基づいて近隣セグメントとして設定された他のパソコン１１を認識し、各パソコン１１から此岸腕インデックステーブル３７を受信して記憶部１５に格納する。また、彼岸腕インデックス作成手段２２は、ローカルインデックステーブル３４に存在するキーワードを抽出してそのキーワードの一覧表（リスト）を作成する。さらに、彼岸腕インデックス作成手段２２は、腕インデックステーブルの交換対象となる近隣セグメントから受信した此岸腕インデックステーブル３７を除く他の此岸腕インデックステーブル３７の内容をキーワードＫＷのリストに追加する。これにより、交換対象となる近隣セグメントに対する彼岸腕インデックステーブル３８が生成される。彼岸腕インデックステーブル３８は、近隣セグメントテーブル３５において近隣セグメントとして設定された各パソコン１１について生成される。
【００４０】
彼岸腕インデックステーブル３８は、各近隣セグメントについて作成する腕インデックステーブルであり、交換対象となる近隣セグメント側から見た場合は、此岸腕インデックステーブル３７となる。つまり、近隣セグメント側での腕インデックステーブルの作成時には、彼岸腕インデックステーブル３８が近隣セグメントに転送され、近隣セグメントでは、その腕インデックステーブル３８が此岸腕インデックステーブルとして使用される。
【００４１】
次に、グローバルインデックステーブル３６の作成方法について説明する。
グローバルインデックス作成手段２３は、文書テーブル３３、ローカルインデックステーブル３４、近隣セグメントテーブル３５、此岸腕インデックステーブル３７に基づいて、グローバルインデックステーブル３６を生成する。図４に示すように、グローバルインデックステーブル３６は、図３のローカルインデックステーブル３４の内容に加え、近隣セグメント識別コードＣＳ（ＣＳｂ，ＣＳｄ、ＣＳｃ,…）が入力されている。つまり、グローバルインデックステーブル３６において、ローカルファイルシステム３１上に保管されている文書は文書コードＣＤによりその所在が示され、一方、近隣セグメントに保管されている文書は、文書コードＣＤではなく近隣セグメント識別コードＣＳによりその所在が示される。
【００４２】
具体的に、グローバルインデックス作成手段２３は、先ず、ローカルインデックステーブル３４の内容をそのままグローバルインデックステーブル３６にコピーする。その後、グローバルインデックス作成手段２３は、各此岸腕インデックステーブル３７について、キーワードＫＷを順に取り出す。ここで、各キーワードＫＷについて、グローバルインデックステーブル３６におけるキーワードＫＷのエントリの有無が探索され、エントリが有ればそのテーブル３６上のキーワードＫＷに対して近隣セグメント識別コードＣＳが追加される。エントリが無ければそのキーワードＫＷの項目を新たに追加して、キーワードＫＷに対する近隣セグメント識別コードＣＳを書き込む。このキーワードＫＷ及び近隣セグメント識別コードＣＳの追加は、近隣セグメントから受信した各此岸腕インデックステーブル３７の全キーワードＫＷについて繰り返し実施される。これにより、グローバルインデックステーブル３６が作成される。
【００４３】
ここで、ローカルファイルシステム３１上に所定のキーワードＫＷを含む文書が複数存在する場合、グローバルインデックステーブル３６には、それら文書の所在を示す複数の文書コードＣＤがキーワードＫＷのインデックス情報として登録される。また、近隣セグメントにおいて所定のキーワードＫＷを含む文書が存在する場合、グローバルインデックステーブル３６には、その近隣セグメントの識別コードＣＳが登録される。なお、近隣セグメントにおいてキーワードＫＷの該当文書が複数存在する場合、グローバルインデックステーブル３６にはそのキーワードＫＷに対して同一の近隣セグメント識別コードＣＳが重複して登録されることはない。
【００４４】
次に、キーワードＫＷの検索方法について説明する。
本実施形態の文書検索システムでは、キーワード検索を開始するパソコン１１（開始セグメント）が指定され、同パソコン１１におけるキーワード検索が実施された後、近隣セグメントとして設定された他のパソコン１１を辿ってキーワード検索が実施される。この文書検索システムにおいて、経由する近隣セグメント（パソコン１１）の数の上限を示す整数（探索段数）により、キーワードの検索範囲が指定される。
【００４５】
詳述すると、問い合わせ発行手段２５は、問い合わせ先パソコン１１、キーワードＫＷ、探索段数の入力を促す入力画面を表示部１３に表示させる。そして、ユーザにより操作入力部１２が操作され、その操作により、問い合わせ先のパソコン１１、検索キーワードＫＷ、探索段数といった入力データが問い合わせ発行手段２５に入力される。問い合わせ発行手段２５は、これら入力データにより問い合わせ識別コードを自動的に生成して、問い合わせ先パソコン１１に対して問い合わせ識別コードとともに上記入力データを発行する。なお、問い合せ識別コードは、問い合せを発行したホストとなるパソコン１１のアドレスと、発行時間とからなる識別コードである。従って、別の時間に同一キーワードＫＷに対する問い合わせが発行された場合には、異なる問い合わせ識別コードが生成される。
【００４６】
キーワード検索手段２４は、問い合わせ発行手段２５からの問い合せ（キーワードＫＷ、探索段数、問い合せ識別コード）に応答してキーワード検索を開始する。キーワード検索手段は、他のパソコン１１からの通信による問い合せに備えて常時待機しており、他のパソコン１１のキーワード検索手段２４や問い合わせ発行手段２５からの問い合せを受信した場合にも検索を開始する。
【００４７】
キーワード検索手段２４は、先ず、問い合せ履歴リスト３９を参照する。問い合せ履歴リスト３９は、過去に処理した問い合わせの識別コードの一覧表であり、新しい問い合せであった場合は、その問い合せ識別コードが問い合わせ履歴リストに追加される。キーワード検索手段２４は、グローバルインデックステーブル３６から、指定されたキーワードＫＷを検索する。ここで、ローカルファイルシステム３１においてそのキーワードＫＷの該当文書が存在した場合（文書コードＣＤが見つかった場合）は、文書テーブル３３を参照して、その文書のローカルパスと、自分のネットワークアドレスとを検索結果リスト４０に追加する。
【００４８】
検索結果リスト４０は、文書が存在していたパソコン１１のネットワークアドレスと、文書の所在を示すパス文字列とからなる。キーワード検索手段２４は、ローカルファイルシステム３１に存在する全ての該当文書に関するパス文字列を検索結果リスト４０に登録した後、問い合せ識別コードに基づいて問い合せを発行したパソコン（ホストパソコン）１１を認識し、同パソコン１１に検索結果リスト４０を返信する。
【００４９】
また、グローバルインデックステーブル３６において、近隣セグメント識別コードＣＳがみつかった場合、キーワード検索手段２４は、対応する近隣セグメント（パソコン１１）に問い合わせを発行する。このとき、探索段数が１減算されて、その探索段数が問い合せ識別コード、キーワードＫＷとともに送られる。
【００５０】
近隣セグメント（パソコン１１）におけるキーワード検索手段２４は、上記と同様に、グローバルインデックステーブル３６から、指定されたキーワードＫＷを検索する。そして、ローカルファイルシステム３１上の該当文書に関する検索結果リスト４０を生成した後、該検索結果リスト４０を問い合せを発行したパソコン１１に返信する。
【００５１】
問い合わせに含まれる探索段数が０であった場合、キーワード検索手段２４は、近隣セグメント（パソコン１１）への問い合せは行わず、グローバルインデックステーブル３６における近隣セグメント識別コードＣＳを無視する。なおこの場合にも、キーワード検索手段２４は、グローバルインデックステーブル３６に基づいてローカルファイルシステム３１における該当文書の所在を探索して検索結果リスト４０を生成した後、該リスト４０を問い合せを発行したパソコン１１に返信する。
【００５２】
また、既に同一の問い合せを処理していた場合、問い合わせ履歴リスト３９にはその問い合わせの識別コードが登録されている。そのため、キーワード検索手段２４は、上記のグローバルインデックステーブル３６に基づく検索処理を実施することなく、空の検索結果リスト４０をホストパソコン１１に返信する。文書検索システムにおける近隣セグメント（パソコン１１）の設定等によっては、近隣セグメントを経由する際に、所定のセグメントに対して同じ問い合わせが繰り返し発行される場合がある。よって、問い合わせ識別コードにより同一の問い合せを認識し、空の検索結果リスト４０を返信することにより、キーワード検索手段２４において、検索処理が重複して実施されることが回避される。
【００５３】
問い合せを発行したホストパソコン１１において、検索結果リスト４０を受信すると、問い合わせ発行手段２５は、検索結果リスト４０に関する表示画面を表示部１３に表示させる。また、問い合せ発行手段２５は、ユーザの選択に応じて、指定された文書を受信して、該文書を表示部１３に表示させる。
【００５４】
以上記述したように、上記実施形態によれば、下記の効果を奏する。
（１）検索キーワードＫＷと、検索を開始する開始セグメント（パソコン１１）と、セグメントの探索段数とが入力され、セグメントの探索段数にて指定される検索範囲内において近隣セグメントとして設定された各他のパソコン１１を辿ってキーワード検索が行われる。つまり、開始セグメント（パソコン１１）と検索範囲を指定することにより、検索キーワードＫＷに関するインデックスの検索範囲を適切に調整できる。従って、同一のキーワードＫＷを含む無関係な文書を検索結果リスト４０から排除することが可能となり、キーワードＫＷの該当文書を的確に検索できる。
【００５５】
（２）グローバルインデックステーブル３６には、近隣のセグメントがキーワードＫＷの該当文書を保有する旨を示す近隣セグメント識別コードＣＳが登録され、近隣セグメント識別コードＣＳに基づいて、近隣セグメントとして設定された各他のパソコン１１を辿ってキーワードの検索が行われる。ここで、近隣セグメントにキーワードＫＷを含む文書が複数あったとしても、グローバルインデックステーブル３６には文書の所在の有無を示す１つの近隣セグメント識別コードＣＳが登録されるだけである。このようにすれば、コンピュータシステムにおける保管文書が増大したとしても、グローバルインデックステーブル３６の肥大化を抑制することができる。
【００５６】
（３）問い合わせ発行手段２５において、入力情報に基づく問い合わせ識別コードが発行され、該問い合わせ識別コードに関するキーワード検索時にその識別コードが問い合わせ履歴リスト３９に登録される。そして、その履歴リスト３９を参照することにより、キーワード検索を重複して行うことが回避できる。
【００５７】
（４）セグメントとして設定されたパソコン１１毎に、記憶部１５、ローカルインデックス作成手段２１、彼岸腕インデックス作成手段２２、グローバルインデックス作成手段２３、キーワード検索手段２４、問い合わせ発行手段２５等が設けられている。この場合、各インデックステーブル３４、３６，３８の更新処理や、キーワードの検索処理が各パソコン１１にて分散して行われるので、パソコン１１間を接続するネットワークの負荷を軽減でき、実用上好ましいものとなる。
【００５８】
（５）本実施形態では、文書の所在を示すパスと、その文書に対応させる文書コードＣＤとからなる文書テーブル３３を作成し、文書テーブル３３における文書コードＣＤを用いて、ローカルインデックステーブル３４やグローバルインデックステーブル３６が作成されている。このようにすれば、同一の文書へのパスを文字列として扱う場合の冗長性を回避することができる。よって、インデックステーブル３４，３６のために必要となる記憶領域の増大を抑制することができる。
【００５９】
上記実施の形態は、次に示すように変更することもできる。
・上記実施形態では個人のパソコン１１を１つのセグメントとして具体化したが、ネットワーク上の適当な構成単位（サブネットやドメイン等）を一つのセグメントとして具体化してもよい。この場合、ローカルインデックス作成手段２１、彼岸腕インデックス作成手段２２、グローバルインデックス作成手段２３、キーワード検索手段２４等を有するサブネット単位で配置する。そして、腕インデックステーブルの交換や近隣セグメントへの問い合わせ等の処理については、サーバ間の通信にて行うようにする。このようにすると、サブネットやドメイン単位で、インデックスの作成処理やキーワード検索処理を分散することができる。またこの場合、サブネット間を接続するネットワークを使用することなく文書の収集が行えるので、その文書の収集によるネットワーク負荷を低減することができる。さらに、ネットワークに接続するサブネットを新たに増設する場合には、そのサブネットのためのサーバを追加すればよく、実用上好ましいものとなる。
【００６０】
また、一台のパソコンが複数のセグメントを扱うような構成にしてもよい。さらに、一台のパソコン内の文書が異なるセグメントに属してもよい。セグメント及びセグメント処理手段の構成は、処理効率と管理の都合によって任意に設定できる。
【００６１】
・上記実施形態において、問い合わせ発行手段２５は、各パソコン１１が持つ構成であるが、これに限定されるものではない。つまり、文書検索システムを構成する全てのパソコン１１に設ける必要はなく、それらパソコン１１のうちの少なくとも一つに設けるようにすればよい。また、ローカルインデックス作成手段２１、彼岸腕インデックス作成手段２２、グローバルインデックス作成手段２３、キーワード検索手段２４を有するパソコン１１とは別に、問い合わせ発行手段２５のみを有するパソコンを文書検索システムに設けてもよい。
【００６２】
・上記実施形態では、文書の所在を示すパスと、その文書に対応させる文書コードＣＤとからなる文書テーブル３３を作成し、文書テーブル３３における文書コードＣＤを用いて、ローカルインデックステーブル３４やグローバルインデックステーブル３６を作成したが、これに限定するものではない。ローカルインデックステーブル３４やグローバルインデックステーブル３６は、文書テーブル３３の文書コードＣＤを用いずに、文書の所在を示すパス（文書パス文字列）を用いて作成してもよい。
【００６３】
以上の様々な実施の形態をまとめると、以下のようになる。
（付記１）コンピュータシステム上に保管されている文書を検索するための文書検索方法であって、
前記コンピュータシステムにおける文書の管理単位としてセグメントが設定され、
前記セグメントは、キーワードと、セグメント内に保管された前記キーワードを含む該当文書の所在情報と、近隣のセグメントが該当文書を保有する旨を示す近隣セグメント保有情報とを記録したインデックステーブルを含み、
前記コンピュータシステムにおいて、検索キーワードと、検索を開始する開始セグメントと、該開始セグメントを起点としたセグメントの探索段数とに基づいて、前記探索段数で指定される検索範囲内のセグメントに対して前記インデックステーブルの近隣セグメント保有情報に基づくキーワード検索を行い、該各セグメントの前記インデックステーブルからキーワードの該当文書の所在情報を抽出して検索結果リストを生成することを特徴とする文書検索方法。
（付記２）前記コンピュータシステムを構成するコンピュータは、
セグメント単位で文書の収集を行い、文書中に存在するキーワードと、該キーワードを含む該当文書の所在情報とを記録したローカルインデックステーブルを作成し、
前記ローカルインデックステーブルに存在するキーワードのリストについて、セグメント自身のキーワードのリストと、近隣のセグメントのキーワードのリストとの総和から、交換先となるセグメントのキーワードのリストのみを排除して腕インデックステーブルを作成し、
前記交換先となるセグメントとの間で腕インデックステーブルを交換し、その交換した腕インデックステーブルと前記ローカルインデックステーブルとに基づいて、前記キーワードと、前記セグメント内に保管された該当文書の所在情報と、交換相手のセグメントの保有情報とを記録したグローバルインデックステーブルを作成し、
前記グローバルインデックステーブルを参照することで前記検索キーワードを含む該当文書の所在情報を抽出して検索結果リストを生成することを特徴とする付記１に記載の文書検索方法。
（付記３）前記コンピュータシステムにおいてセグメント単位でサーバが設けられ、サーバ間の通信によって、前記腕インデックステーブルの交換と、他のセグメントにおけるキーワードの検索を行うようにしたことを特徴とする付記２に記載の文書検索方法。
（付記４）前記文書の所在を示すパスと、その文書に対応させる文書コードとからなる文書テーブルを作成し、該文書テーブルにおける文書コードを用いて、前記インデックステーブルを作成するようにしたことを特徴とする付記１〜３のいずれかに記載の文書検索方法。
（付記５）入力情報に対応する問い合わせ識別コードを発行し、前記問い合わせ識別コードに関するキーワード検索時にその識別コードを問い合わせ履歴リストに登録するようにしたことを特徴とする付記１〜４のいずれかに記載の文書検索方法。
（付記６）コンピュータシステム上に保管されている文書を検索するための文書検索プログラムであって、
コンピュータに、
前記文書の管理単位としてのセグメントを設定し、セグメント単位で文書の収集を行い、キーワードと、前記セグメント内に保管された前記キーワードを含む該当文書の所在情報と、近隣のセグメントが該当文書を保有する旨を示す近隣セグメント保有情報とを記録したインデックステーブルを作成する手段と、
検索キーワードと、検索を開始する開始セグメントと、該開始セグメントを起点としたセグメントの探索段数とを入力情報として取り込む手段と、
前記探索段数で指定される検索範囲内のセグメントに対して前記インデックステーブルの近隣セグメント保有情報に基づくキーワード検索を行い、該各セグメントの前記インデックステーブルから検索キーワードの該当文書の所在情報を抽出して検索結果リストを生成する手段として機能させること
を特徴とする文書検索プログラム。
（付記７）コンピュータシステム上に保管されている文書を検索する文書検索システムであって、
前記コンピュータシステムにおける文書の管理単位としてのセグメントを設定するためのテーブルが記憶手段に記憶され、
前記セグメントは、キーワードと、セグメント内に保管された前記キーワードを含む該当文書の所在情報と、近隣のセグメントが該当文書を保有する旨を示す近隣セグメント保有情報とを記録したインデックステーブルを含み、
検索キーワードと、検索を開始する開始セグメントと、該開始セグメントを起点としたセグメントの探索段数とを入力情報として問い合わせを行う問い合わせ発行手段と、
前記インデックステーブルを参照し、前記検索キーワードの該当文書の所在情報を抽出して検索結果リストを生成するキーワード検索手段と、
を備え、前記探索段数で指定される検索範囲内のセグメントに対し前記インデックステーブルの近隣セグメント保有情報に基づいて各他のセグメントを辿ってキーワード検索を行うようにしたことを特徴とする文書検索システム。
（付記８）セグメント単位で文書の収集を行い、文書中に存在するキーワードと、該キーワードを含む該当文書の所在情報とを記録したローカルインデックステーブルを作成するローカルインデックス作成手段と、
前記ローカルインデックステーブルに存在するキーワードのリストについて、セグメント自身のキーワードのリストと、近隣のセグメントのキーワードのリストとの総和から、交換先となるセグメントのキーワードのリストのみを排除して腕インデックステーブルを作成する腕インデックス作成手段と、
セグメント間における交換により得られた腕インデックステーブルと、前記ローカルインデックステーブルとに基づいて、前記キーワードと、前記セグメント内に保管された該当文書の所在情報と、交換相手のセグメントの保有情報とを記録したグローバルインデックステーブルを作成するグローバルインデックス作成手段と
を備えることを特徴とする付記７に記載の文書検索システム。
【００６４】
【発明の効果】
以上詳述したように、本発明によれば、キーワード検索を的確に行うことができる文書検索方法、文書検索プログラム及び文書検索システムを提供することができる。
【図面の簡単な説明】
【図１】一実施形態の文書検索システムを示す概略構成図である。
【図２】文書検索システムを構成するパソコンの概略構成図である。
【図３】ローカルインデックステーブルの説明図である。
【図４】グローバルインデックステーブルの説明図である。
【符号の説明】
１１セグメントとしてのパソコン
１５記憶手段としての記憶部
２１ローカルインデックス作成手段
２２腕インデックス作成手段としての彼岸腕インデックス作成手段
２３グローバルインデックス作成手段
２４キーワード検索手段
２５問い合わせ発行手段
３２セグメント定義テーブル
３４ローカルインデックステーブル
３５近隣セグメントテーブル
３６グローバルインデックステーブル
３７此岸腕インデックステーブル
３８彼岸腕インデックステーブル
３９問い合わせ履歴リスト
４０検索結果リスト
ＣＳ識別コード
ＫＷキーワード
Ｓａ，Ｓｂ，Ｓｃ，Ｓｄセグメント
ＧＩａ，ＧＩｂ，ＧＩｃ，ＧＩｄグローバルインデックステーブル
ＬＩａ，ＬＩｂ，ＬＩｃ，ＬＩｄローカルインデックステーブル
ＡＩａ−ｂ，ＡＩａ−ｃ，ＡＩａ−ｄ，ＡＩｂ−ａ，ＡＩｂ−ｘ，ＡＩｃ−ａ，ＡＩｃ−ｘ，ＡＩｄ−ａ，ＡＩｄ−ｘ腕インデックステーブル
[0001]
[Technical field to which the invention belongs]
The present invention relates to a document search method, a document search program, and a document search system for retrieving a required document from a large number of digitized documents by specifying a predetermined keyword as a search condition.
[0002]
In recent years, the number of documents stored in computer systems has grown with larger databases and the introduction of network technology, and techniques for efficiently retrieving a desired document from among the many documents in such a system are in demand.
[0003]
[Prior art]
A document search system searches a given range for documents that satisfy the search conditions specified by the searcher and presents their locations as a list. When the search range is relatively narrow and neither the number of documents nor the total volume of document data is large (for example, when searching for a desired document among those stored on a particular personal computer), the system builds the list by actually examining all of the document data each time a condition is specified. When the search range is wide, for example because it extends across a network, this approach makes the response time of a single search very long and places a heavy processing load on the whole system, so it is not practical. Therefore, to achieve a practical response speed, a search system that targets a large number of widely distributed documents, such as documents on a network, periodically surveys the documents in the search range and creates an index table that associates information such as the keywords specified as search conditions with the locations (addresses) of the documents. When a search request arrives, the system compares the search information against this index table and returns a list of the addresses of the matching documents.
[0004]
In a typical document search system, document collection over the search range, index creation, and index aggregation and response to search requests are sometimes performed by a single computer and sometimes by a cluster of computers, but in either case they are basically performed centrally at one location.
[0005]
Because the resources on the networks to be searched keep growing, the ever-increasing size of the index table is a long-term problem for centralized search systems. For this reason, several systems have been proposed in which the search range is divided among a plurality of servers that cooperate to perform a search. In addition, Japanese Patent Application Laid-Open No. 8-255178 discloses a technique that evaluates and records the relevance between documents so that closely related documents can be traced, allowing a user to browse effectively from one document to closely related ones.
[0006]
[Problems to be solved by the invention]
Consider, for example, a keyword that is used in a particular field with a meaning specific to that field but that also has another, widely known meaning that readily comes up as a topic. With the conventional document search systems described above, a large number of documents that use the keyword in that other meaning (unneeded documents) are hit. To find documents that use the keyword in the intended meaning, the searcher must resort to measures such as specifying additional keywords to narrow down the field. Moreover, depending on the algorithm of the document search system, when a large number of documents are hit for a keyword, those judged to be mediocre documents (pages) about a widely known, major keyword may be excluded from the search results.
[0007]
The present invention has been made to solve the above-described problems, and an object of the present invention is to provide a document search method, a document search program, and a document search system that can accurately perform keyword search.
[0008]
[Means for Solving the Problems]
To achieve the above object, according to the inventions of claims 1, 4, and 5, a search keyword, a start segment at which the search begins, and a number of segment search stages counted from the start segment are input, and a keyword search based on the neighboring-segment possession information in the index tables is performed on the segments within the search range specified by the number of search stages. The location information of the documents matching the search keyword is then extracted from the index table of each segment, and a search result list is generated. In other words, specifying the start segment and the search range appropriately adjusts the range of indexes searched for the keyword. Therefore, by defining segments as management units of related documents, unrelated documents that contain the same keyword can be excluded from the search result list, and the documents matching the keyword are retrieved accurately.
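The search over a limited number of stages can be illustrated with a minimal sketch. It assumes a hypothetical in-memory layout in which each segment's index table maps a keyword to local document paths (`docs`) and to the identifiers of neighbors that hold matching documents (`neighbors`). The function name, the dictionary layout, and the shared `seen` set (a stand-in for a per-segment record of queries already handled) are illustrative assumptions, not the patent's data format.

```python
def hop_limited_search(segments, start, keyword, hops, seen=None):
    """Collect (segment, path) hits, following neighbor entries for at most `hops` stages."""
    seen = set() if seen is None else seen
    if start in seen:
        # This segment already handled the query: contribute nothing.
        return []
    seen.add(start)
    empty = {"docs": [], "neighbors": []}
    entry = segments[start].get(keyword, empty)
    results = [(start, path) for path in entry["docs"]]
    if hops > 0:
        # Forward the query with the remaining stage count reduced by one.
        for neighbor in entry["neighbors"]:
            results += hop_limited_search(segments, neighbor, keyword, hops - 1, seen)
    return results


# Hypothetical network: Sa and Sb hold related documents, Sx an unrelated one.
segments = {
    "Sa": {"mask": {"docs": ["/litho/mask.txt"], "neighbors": ["Sb"]}},
    "Sb": {"mask": {"docs": ["/etch/mask.txt"], "neighbors": ["Sa", "Sx"]}},
    "Sx": {"mask": {"docs": ["/costume/mask.txt"], "neighbors": ["Sb"]}},
}
print(hop_limited_search(segments, "Sa", "mask", 1))
```

With one search stage the query started at `Sa` reaches `Sb` but not `Sx`, so the unrelated document that happens to contain the same keyword stays out of the result list.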
[0009]
According to the invention of claim 2, documents are collected segment by segment, and a local index table is created that records the keywords present in the documents and the location information of the documents containing each keyword. Next, an arm index table is created from the keyword lists: it is the union of the segment's own keyword list (from the local index table) and the keyword lists of the neighboring segments, with only the keyword list of the segment that will receive the table excluded. Then, based on the arm index tables obtained through exchange between segments and on the local index table, a global index table is created that records each keyword, the location information of the matching documents stored in the segment, and the possession information of the exchange partner segments. By referring to this global index table, the location information of the documents containing the keyword is extracted and a search result list is generated.
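As a rough illustration of the last step, the sketch below merges a local index table with the arm index tables received from neighbors into a global index table. The entry layout (`docs` and `neighbors`) and the function name are assumptions made for this example. The one property taken from the description is that a neighbor is recorded at most once per keyword, however many matching documents it holds.

```python
def build_global_index(local_index, received_arm_tables):
    """Merge local document locations with 'this neighbor has it' markers."""
    # Start from a copy of the local index table.
    global_index = {
        kw: {"docs": list(docs), "neighbors": []}
        for kw, docs in local_index.items()
    }
    # Add one neighbor identifier per keyword found in each received arm table.
    for neighbor, keywords in received_arm_tables.items():
        for kw in keywords:
            entry = global_index.setdefault(kw, {"docs": [], "neighbors": []})
            if neighbor not in entry["neighbors"]:
                entry["neighbors"].append(neighbor)
    return global_index


local_index = {"mask": ["/litho/mask.txt"]}
received = {"Sb": {"mask", "etch"}, "Sc": {"resist"}}
global_index = build_global_index(local_index, received)
print(global_index["mask"])
```

Because only an identifier is stored for a neighbor, the table grows with the number of keywords and neighbors rather than with the number of remote documents.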
[0010]
According to the invention of claim 3, a server is provided for each segment, so keyword search processing is distributed across the segments. In this case, the network resources used are limited to the scope of each segment. Furthermore, when the scope of a segment lies entirely within a single computer, documents can be collected without using the network at all, so the network load caused by document collection can be reduced.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the schematic configuration of the document search system. In this system, a plurality of segments Sa to Sd are connected so that they can exchange data with one another. Specifically, segment Sa is connected to segments Sb, Sc, and Sd. Segments Sb, Sc, and Sd are also connected to other segments (not shown) besides segment Sa. Segments Sb to Sd are set as the neighboring segments of segment Sa.
[0012]
Each of the segments Sa to Sd is defined as an arbitrary partition of a computer system that holds a large number of digitized documents. A segment can be defined, for example, per directory on a single computer, per computer connected to a local network, per domain on a wide area network, or per administrative department within a company. When building a practical document search system, however, it is advisable to partition the system so that the documents within a segment are related to one another.
[0013]
Segment Sa includes a local index table LIa, a global index table GIa, and arm index tables AIa-b, AIa-c, and AIa-d. The neighboring segments Sb to Sd of segment Sa likewise include local index tables LIb to LId, global index tables GIb to GId, and arm index tables AIb-a, AIb-x, AIc-a, AIc-x, AId-a, and AId-x.
[0014]
The local index tables LIa to LId are lists that associate keywords with document location information (addresses). When the local index tables LIa to LId are generated, the documents stored in the segments Sa to Sd are collected and keywords are extracted from each document. Each keyword is then registered in the local index table together with index information indicating the location of the document. Document collection and generation of the local index tables LIa to LId are performed periodically, segment by segment.
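A local index table of this kind can be sketched as a mapping from keyword to document locations. In the sketch below, the function name, the `extract_keywords` callback, and the whitespace-based extractor used in the example are illustrative assumptions; the description leaves the extraction method open.

```python
def build_local_index(segment_documents, extract_keywords):
    """Map each keyword to the locations of this segment's documents that contain it."""
    index = {}
    for location, text in segment_documents.items():
        for keyword in extract_keywords(text):
            locations = index.setdefault(keyword, [])
            if location not in locations:
                # Register each document location once per keyword.
                locations.append(location)
    return index


# Documents belonging to one segment only; a trivial whitespace extractor stands in
# for real keyword extraction.
segment_documents = {
    "/docs/a.txt": "wafer test wafer",
    "/docs/b.txt": "wafer yield",
}
local_index = build_local_index(segment_documents, str.split)
print(local_index["wafer"])
```

Rerunning `build_local_index` on a schedule would correspond to the periodic, per-segment regeneration mentioned above.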
[0015]
To extract keywords from a document written in a language such as Japanese, in which words are not separated by spaces, the document must first be divided into words (phrases). The document is therefore first divided into a plurality of words (phrases) by a known morphological analysis technique. Desired words among the divided words are then extracted as keywords and registered in the index tables LIa to LId. One example of a morphological analyzer is ChaSen (茶筌), a Japanese morphological analysis system released by the natural language processing laboratory of the Nara Institute of Science and Technology.
[0016]
The following methods can be used to decide which of the divided words are registered in the index table as keywords.
(a) Using the document's data format to identify components considered significant, such as the document title, and registering the words associated with them as keywords.
(b) Preparing in advance a dictionary of keywords considered significant and registering only the words that match it as keywords.
(c) Conversely to (b), preparing a dictionary of frequently occurring words that are meaningless as keywords, such as conjunctions, and excluding the words that match it from the keywords.
(d) Registering as keywords the words whose score in a statistical evaluation, such as frequency of appearance within the collection range, is equal to or higher than a reference value.
(e) A combination of methods (a) to (d) above.
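A combination in the sense of (e) might look like the following sketch, which applies an allow-list as in (b), a stop-word dictionary as in (c), and a frequency threshold as in (d). The function and parameter names are hypothetical, and counting occurrences within a single word list is a simplification of the "collection range" statistics the text mentions.

```python
from collections import Counter


def select_keywords(words, allow=None, stop=frozenset(), min_count=1):
    """Filter candidate words by allow-list (b), stop-list (c), and frequency (d)."""
    counts = Counter(words)
    selected = set()
    for word, count in counts.items():
        if word in stop:
            continue  # (c) drop words listed as meaningless
        if allow is not None and word not in allow:
            continue  # (b) keep only words in the significant-keyword dictionary
        if count < min_count:
            continue  # (d) require a minimum frequency
        selected.add(word)
    return selected


words = ["the", "wafer", "and", "wafer", "yield", "the"]
print(select_keywords(words, stop={"the", "and"}, min_count=2))
```

Passing `allow=None` disables the allow-list, so the same function covers (c) alone, (d) alone, or any mix of the three filters.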
[0017]
Examples of index information (location information) indicating where a document resides include file system paths on UNIX(R), Windows(R), and similar systems, URL (Uniform Resource Locator), and URC (Uniform Resource Characteristic). Note that the local index tables LIa to LId are index tables generated only from the documents within the respective segments Sa to Sd.
[0018]
The arm index tables AIa-b, AIa-c, AIa-d, AIb-a, AIb-x, AIc-a, AIc-x, AId-a, and AId-x are index tables that neighboring segments exchange with each other in order to generate the global index tables GIa to GId. These arm index tables are generated based on the keywords registered in the local index tables LIa to LId.
[0019]
In each of the segments Sa to Sd, as many arm index tables are generated as the segment has neighboring segments. Specifically, because segment Sa has the neighboring segments Sb to Sd, it generates three arm index tables AIa-b, AIa-c, and AIa-d, one to exchange with each of the neighboring segments Sb to Sd. The neighboring segments Sb to Sd, as exchange partners, also generate the arm index tables AIb-a, AIc-a, and AId-a. That is, the arm index tables AIa-b, AIa-c, AIa-d and the arm index tables AIb-a, AIc-a, AId-a are created at the two ends of the arms (connections) that link segment Sa with the neighboring segments Sb to Sd.
[0020]
The arm index tables AIa-b, AIa-c, AIa-d, AIb-a, AIb-x, AIc-a, AIc-x, AId-a, and AId-x are keyword lists and are divided into those called far-side arm index tables (彼岸腕インデックステーブル) and those called near-side arm index tables (此岸腕インデックステーブル). Specifically, the arm index table AIa-b that segment Sa generates for neighboring segment Sb during its index update process is called a far-side arm index table. Conversely, the arm index table AIb-a that neighboring segment Sb generates for segment Sa during its own index update process is called a near-side arm index table. Likewise, the arm index tables AIa-c and AIa-d for the neighboring segments Sc and Sd of segment Sa are far-side arm index tables, and the arm index tables AIc-a and AId-a for segment Sa are near-side arm index tables.
[0021]
The far-side arm index table AIa-b for the neighboring segment Sb is generated as the union of the list of keywords existing in the local index table LIa and the near-side arm index tables AIc-a and AId-a, that is, all near-side arm index tables other than the table AIb-a received from the segment Sb. The far-side arm index table AIa-c is generated as the union of the keyword lists existing in the near-side arm index tables AIb-a and AId-a and the local index table LIa. The far-side arm index table AIa-d is generated as the union of the keyword lists existing in the near-side arm index tables AIb-a and AIc-a and the local index table LIa. In other words, a far-side arm index table is generated by excluding only the keyword list of the neighboring segment that is the exchange partner from the union of the segment's own keyword list (the keyword list of its local index table) and the keyword lists of its neighboring segments (its near-side arm index tables).
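The exclusion rule above can be illustrated with a minimal Python sketch. The function name `build_far_side_tables`, the use of Python sets for keyword lists, and the sample keywords are assumptions made for illustration only; they do not appear in the specification.

```python
def build_far_side_tables(local_keywords, near_side_tables):
    """Return one keyword set per neighboring segment (the exchange partner).

    local_keywords   -- keywords of this segment's local index table
    near_side_tables -- {neighbor name: keyword set received from that neighbor}
    """
    far_side = {}
    for partner in near_side_tables:
        # Start from the segment's own keywords.
        keywords = set(local_keywords)
        # Add every received table except the one that came from the partner.
        for neighbor, received in near_side_tables.items():
            if neighbor != partner:
                keywords |= received
        far_side[partner] = keywords
    return far_side


# Hypothetical data for segment Sa with neighbors Sb, Sc, and Sd.
local_a = {"arm", "index"}
near_side_a = {"Sb": {"bank"}, "Sc": {"cluster"}, "Sd": {"domain", "index"}}
far_side_a = build_far_side_tables(local_a, near_side_a)
```

In this sketch the table built for Sb contains the keywords of Sa, Sc, and Sd but not those that Sb itself supplied, which is what keeps a segment's own keywords from being echoed back to it.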
[0022]
The far-side arm index tables AIa-b, AIa-c, and AIa-d are arm index tables generated on the segment Sa side for each of the neighboring segments Sb to Sd, and, when viewed from the neighboring segments Sb to Sd that are the exchange partners, they serve as near-side arm index tables. On the side of the neighboring segments Sb to Sd, in the same manner as above, the arm index tables AIb-a, AIc-a, AId-a, and so on are generated as the union of the keyword lists existing in the near-side arm index tables held there (AIa-b, AIa-c, AIa-d, and so on) and the local index tables LIb to LId.
[0023]
When the far-side arm index tables are generated in this way, the far-side arm index table that one segment generates for another segment becomes a list of the keywords held by that one segment and by the other segments connected to the other segment via that one segment. For example, the far-side arm index table AIa-b for the neighboring segment Sb in the segment Sa becomes a list of the keywords of the local index tables LIa, LIc, LId, and so on in the segment Sa and in the segments Sc, Sd, and so on that are connected to the segment Sb via the segment Sa. Conversely, the arm index table AIb-a for the neighboring segment Sa in the segment Sb becomes a list of the keywords of the local index table LIb in the segment Sb and of the local index tables in each segment (not shown) connected to the segment Sa via the segment Sb.
[0024]
The global index table GIa in the segment Sa is obtained by adding the contents of the near-side arm index tables AIb-a, AIc-a, and AId-a to the local index table LIa. Similarly, the global index tables GIb to GId of the segments Sb to Sd are obtained by adding the contents of near-side arm index tables such as AIa-b, AIa-c, and AIa-d to the local index tables LIb to LId.
[0025]
In the global index tables GIa to GId, neighboring segment identification codes are registered in addition to the information of the local index tables LIa to LId (index information indicating the location of documents in the segments Sa to Sd). A neighboring segment identification code is registered as information indicating that the neighboring segment holds a document corresponding to the keyword. For example, when a keyword of the near-side arm index table AIb-a received from the neighboring segment Sb is registered in the global index table GIa of the segment Sa, the identification code of the neighboring segment Sb is registered together with the keyword. When a keyword search is performed, a search result list is generated based on the global index tables GIa to GId.
[0026]
Specifically, for example, the specified keyword is searched for in the global index table GIa, and if the corresponding document is located in the segment Sa, the location information of the document is added to the search result list as it is. Further, if the identification code of the neighboring segment Sb is registered in the global index table GIa, the specified keyword is searched for in the same way in the global index table GIb of the neighboring segment Sb, and the location information of the corresponding documents is added to the search result list. In this way, the keyword search process is repeated by tracing from one neighboring segment to the next based on the information in the global index tables. The keyword search process continues until the number of segments passed through reaches the designated upper limit or until the segment indicated by a neighboring segment identification code is the segment where the search was started.
[0027]
FIG. 2 is a schematic configuration diagram of the segment Sa. The segment Sa is configured by the personal computer 11 in this embodiment. The segment Sa may be composed of a computer used for general purposes such as a workstation.
[0028]
Accordingly, the document search system is constructed by linking a plurality of personal computers 11. Since the other segments Sb to Sd in FIG. 1 are configured in the same manner as the segment Sa, their illustration and description are omitted.
[0029]
The personal computer 11 includes an operation input unit 12, a display unit 13, a control unit 14, a storage unit 15, and the like. The operation input unit 12 includes a keyboard, a mouse device, and the like, and is used for requests and instructions from the user such as starting a document search program and inputting keywords. The display unit 13 is composed of, for example, a CRT, LCD, PDP or the like, and is used for displaying a keyword input screen, displaying search results, and the like.
[0030]
The control unit 14 includes a known central processing unit (CPU) that controls the personal computer 11 in an integrated manner, a recording device that stores a document search program, and the like. By executing the document search program, the CPU in the control unit 14 functions as the local index creation means 21, the far-side arm index creation means 22, the global index creation means 23, the keyword search means 24, and the inquiry issuing means 25.
[0031]
The storage unit 15 is configured by a magnetic disk device (hard disk) built into the personal computer 11. The storage unit 15 holds a local file system 31, a segment definition table 32, a document table 33, a local index table 34, a neighboring segment table 35, a global index table 36, a near-side arm index table 37, a far-side arm index table 38, an inquiry history list 39, a search result list 40, and the like. An optical disk device or a magneto-optical disk device may be used as the storage unit 15. A disk device externally attached to the personal computer 11 may of course be used, and the storage unit 15 may be constituted by a plurality of disk devices.
[0032]
The local file system 31 is provided with a plurality of directories as document management areas, and various electronic documents are stored in the respective directories according to the document contents. The segment definition table 32 is a list of path character strings indicating the locations of directories to be searched in the local file system 31.
[0033]
First, a method for creating the local index table 34 will be described.
The local index creation means 21 sequentially extracts path character strings from the segment definition table 32 and identifies the paths of the directories to be searched on the local file system 31. The local index creation means 21 then creates the document table 33 by collecting the documents stored in each directory, starting from these paths. When a lower directory exists under the directory indicated by a path character string in the segment definition table 32, that path character string is used as the starting path, and the documents stored in the lower directory can also be collected by using the file management information in the local file system 31.
[0034]
The document table 33 is a table created in order to avoid the redundancy that arises when paths to the same document are handled as character strings. It is a list composed of paths (document path character strings) indicating the locations of documents and the document codes (serial integer values) corresponding to those documents. More simply, a list of document path character strings may be used, with the numerical value indicating the position on the list serving as the document code. The document path character string uses the path notation of the local file system 31 of the personal computer 11.
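As a rough illustration of the simpler variant just described, the following Python sketch collects documents from the starting directories, including lower directories, and uses the position on the list as the document code. The function names `build_document_table` and `document_codes` are hypothetical, and `os.walk` merely stands in for the file management information of the local file system 31.

```python
import os


def build_document_table(segment_paths):
    """Collect document paths under each starting directory.

    The returned list plays the role of the document table: the index of a
    path in the list is used as its document code.
    """
    table = []
    seen = set()
    for start in segment_paths:
        for dirpath, dirnames, filenames in os.walk(start):
            dirnames.sort()  # visit lower directories in a stable order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if path not in seen:  # register each document only once
                    seen.add(path)
                    table.append(path)
    return table


def document_codes(table):
    """Return the reverse mapping {document path: document code}."""
    return {path: code for code, path in enumerate(table)}
```

Storing the integer code instead of the full path in the index tables is what keeps repeated references to the same document small.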
[0035]
For each document collected based on the segment definition table 32, the local index creation means 21 uses the above-described Japanese morphological analysis system to extract words and determine their parts of speech, and then extracts the proper nouns as keywords. The local index table 34 is then created based on the extracted keywords.
[0036]
As shown in FIG. 3, the local index table 34 is a table whose constituent elements are the character string of each keyword KW (KW1, KW2, ...) and the document codes CD (CD1, CD2, ..., CDx) of the documents that include the keyword. When the local index table 34 is created, the table is searched for an entry for each keyword KW. If an entry exists, the document code CD is added to the keyword KW in the table 34. If no entry exists, an entry for the keyword KW is newly added and the document code CD for the keyword KW is written. The addition of the keyword KW and the document code CD is repeated for all keywords KW extracted from each document. As a result, a local index table 34 covering every document within the range defined in the segment definition table 32 is created. Methods such as a binary tree are known for searching for an arbitrary keyword KW at high speed in the local index table 34.
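The entry-or-append procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the specification extracts proper nouns with a Japanese morphological analysis system, whereas the hypothetical `extract_keywords` below substitutes a crude stand-in that picks capitalized English words, and a plain dictionary replaces whatever table layout an actual implementation would use.

```python
import re
from collections import defaultdict


def extract_keywords(text):
    # Stand-in for morphological analysis (an assumption of this sketch):
    # treat capitalized words as proper nouns.
    return set(re.findall(r"\b[A-Z][a-z]+\b", text))


def build_local_index(documents):
    """Build {keyword: [document codes]} from {document code: text}."""
    index = defaultdict(list)
    for code in sorted(documents):
        for keyword in sorted(extract_keywords(documents[code])):
            # defaultdict creates the entry if the keyword is new;
            # otherwise the code is appended to the existing entry.
            index[keyword].append(code)
    return dict(index)


# Hypothetical documents keyed by document code.
sample_documents = {
    0: "Fujitsu builds chips in Tokyo",
    1: "Tokyo hosts a conference",
}
local_index = build_local_index(sample_documents)
```

A dictionary gives hashed lookup rather than the binary tree mentioned in the text; either serves the purpose of finding a keyword entry quickly.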
[0037]
The neighboring segment table 35 defines the segments treated as neighboring segments, and consists of a list of the network addresses of the personal computers 11 set as neighboring segments and the neighboring segment identification codes associated with those personal computers 11. Here, personal computers 11 that store related documents are set as neighboring segments. The neighboring segment identification code is a code for identifying a neighboring segment. In this embodiment, a certain range of the codes used for the document code CD is reserved and assigned for this purpose.
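One way to picture the reserved code range is the short Python sketch below. The base value `NEIGHBOR_CODE_BASE`, the function names, and the sample addresses are all hypothetical; the specification states only that part of the document code range is reserved for neighboring segment identification codes.

```python
# Hypothetical boundary: codes at or above this value identify neighboring
# segments, and codes below it identify local documents.
NEIGHBOR_CODE_BASE = 1_000_000


def build_neighbor_table(addresses):
    """Map each neighboring segment identification code to a network address."""
    return {NEIGHBOR_CODE_BASE + i: address for i, address in enumerate(addresses)}


def is_neighbor_code(code):
    """Tell a neighboring segment identification code from a document code."""
    return code >= NEIGHBOR_CODE_BASE


# Sample addresses taken from the documentation range 192.0.2.0/24.
neighbor_table = build_neighbor_table(["192.0.2.11", "192.0.2.12"])
```

Sharing one code space lets an index entry hold document codes and neighboring segment codes in the same list while still being distinguishable by value.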
[0038]
Next, a method for creating an arm index table will be described.
The far-side arm index creation means 22 generates a far-side arm index table 38 for each neighboring segment based on the local index table 34, the neighboring segment table 35, and the near-side arm index tables 37.
[0039]
Specifically, the far-side arm index creation means 22 identifies the other personal computers 11 set as neighboring segments based on the neighboring segment table 35, receives a near-side arm index table 37 from each of those personal computers 11, and stores it in the storage unit 15. The far-side arm index creation means 22 also extracts the keywords existing in the local index table 34 and creates a list of those keywords. It then adds to this keyword KW list the contents of the near-side arm index tables 37 other than the one received from the neighboring segment with which the arm index table is to be exchanged. As a result, the far-side arm index table 38 for that neighboring segment is generated. A far-side arm index table 38 is generated for each personal computer 11 set as a neighboring segment in the neighboring segment table 35.
[0040]
The far-side arm index table 38 is an arm index table created for each neighboring segment and, when viewed from the neighboring segment that is the exchange partner, serves as a near-side arm index table 37. That is, when arm index tables are created on the neighboring segment side, the far-side arm index table 38 is transferred to the neighboring segment, where it is used as a near-side arm index table.
[0041]
Next, a method for creating the global index table 36 will be described.
The global index creation means 23 generates the global index table 36 based on the document table 33, the local index table 34, the neighboring segment table 35, and the near-side arm index tables 37. As shown in FIG. 4, the global index table 36 records the neighboring segment identification codes CS (CSb, CSd, CSc, ...) in addition to the contents of the local index table 34. That is, in the global index table 36, the location of a document stored on the local file system 31 is indicated by the document code CD, while the location of a document stored in a neighboring segment is indicated not by a document code CD but by the neighboring segment identification code CS.
[0042]
Specifically, the global index creation means 23 first copies the contents of the local index table 34 to the global index table 36 as they are. The global index creation means 23 then sequentially extracts the keywords KW from each near-side arm index table 37. For each keyword KW, the global index table 36 is searched for an entry for that keyword. If an entry exists, the neighboring segment identification code CS is added to the keyword KW in the table 36. If no entry exists, an entry for the keyword KW is newly added and the neighboring segment identification code CS for the keyword KW is written. The addition of the keyword KW and the neighboring segment identification code CS is repeated for all the keywords KW in each of the near-side arm index tables 37 received from the neighboring segments. The global index table 36 is thereby created.
[0043]
Here, when a plurality of documents including a given keyword KW exist on the local file system 31, a plurality of document codes CD indicating the locations of those documents are registered in the global index table 36 as index information for the keyword KW. When a document including the keyword KW exists in a neighboring segment, the identification code CS of that neighboring segment is registered in the global index table 36. Even if a plurality of documents corresponding to the keyword KW exist in the neighboring segment, the same neighboring segment identification code CS is not registered more than once in the global index table 36 for that keyword KW.
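The construction of the global index table can be sketched in Python as below. The entry layout with separate `"docs"` and `"neighbors"` lists, the function name `build_global_index`, and the use of short strings such as `"Sb"` in place of numeric neighboring segment identification codes are assumptions made for readability, not details taken from the specification.

```python
def build_global_index(local_index, near_side_tables):
    """Merge the local index with the keyword lists received from neighbors.

    local_index      -- {keyword: [document codes]}
    near_side_tables -- {neighbor code: set of keywords received from it}
    """
    # Step 1: copy the local index table as it is.
    global_index = {
        keyword: {"docs": list(codes), "neighbors": []}
        for keyword, codes in local_index.items()
    }
    # Step 2: for each received keyword, record which neighbor holds it.
    for neighbor_code, keywords in near_side_tables.items():
        for keyword in keywords:
            entry = global_index.setdefault(keyword, {"docs": [], "neighbors": []})
            # A neighbor is recorded at most once per keyword, however many
            # matching documents it may hold.
            if neighbor_code not in entry["neighbors"]:
                entry["neighbors"].append(neighbor_code)
    return global_index


# Hypothetical input for one segment with two neighbors.
sample_local = {"arm": [0, 2]}
sample_near_side = {"Sb": {"arm", "bank"}, "Sc": {"bank"}}
sample_global = build_global_index(sample_local, sample_near_side)
```

Because a neighbor contributes a single code per keyword regardless of its document count, the size of the table grows with the number of keywords and neighbors rather than with the number of remote documents.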
[0044]
Next, a search method for the keyword KW will be described.
In the document search system of this embodiment, a personal computer 11 (start segment) at which the keyword search starts is designated, and after the keyword search in that personal computer 11 is performed, the keyword search is carried out by tracing the other personal computers 11 set as neighboring segments. In this document search system, the keyword search range is designated by an integer (the number of search stages) indicating the upper limit of the number of neighboring segments (personal computers 11) passed through.
[0045]
More specifically, the inquiry issuing means 25 causes the display unit 13 to display an input screen that prompts for the inquiry destination personal computer 11, the keyword KW, and the number of search stages. When the user operates the operation input unit 12, input data such as the personal computer 11 to be queried, the search keyword KW, and the number of search stages are passed to the inquiry issuing means 25. The inquiry issuing means 25 automatically generates an inquiry identification code from these input data and issues the input data together with the inquiry identification code to the inquiry destination personal computer 11. The inquiry identification code is an identification code composed of the address of the personal computer 11 serving as the host that issued the inquiry and the time of issue. Accordingly, when an inquiry for the same keyword KW is issued at another time, a different inquiry identification code is generated.
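A minimal Python sketch of such an inquiry follows. The specification says only that the identification code combines the host address and the issue time; the `Inquiry` class, the `issue_inquiry` function, and the `address@ISO-timestamp` string format are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Inquiry:
    inquiry_id: str  # host address plus issue time
    keyword: str
    stages: int      # number of search stages still allowed


def issue_inquiry(host_address, keyword, stages, issued_at=None):
    """Create an inquiry whose identification code is address + issue time."""
    if issued_at is None:
        issued_at = datetime.now(timezone.utc)
    inquiry_id = f"{host_address}@{issued_at.isoformat()}"
    return Inquiry(inquiry_id, keyword, stages)
```

Two inquiries for the same keyword issued from the same host at different times therefore carry different identification codes, while every copy of one inquiry forwarded between segments carries the same code.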
[0046]
The keyword search means 24 starts a keyword search in response to an inquiry (keyword KW, number of search stages, inquiry identification code) from the inquiry issuing means 25. The keyword search means 24 is always on standby for inquiries arriving by communication from other personal computers 11, and also starts a search when it receives an inquiry from the keyword search means 24 or the inquiry issuing means 25 of another personal computer 11.
[0047]
The keyword search means 24 first refers to the inquiry history list 39. The inquiry history list 39 is a list of the identification codes of inquiries processed in the past. When the inquiry is new, its inquiry identification code is added to the inquiry history list 39. The keyword search means 24 then searches the global index table 36 for the designated keyword KW. Here, when a document corresponding to the keyword KW exists in the local file system 31 (when a document code CD is found), the keyword search means 24 refers to the document table 33 and adds the local path of the document and its own network address to the search result list 40.
[0048]
The search result list 40 consists of the network address of the personal computer 11 where each document exists and a path character string indicating the location of the document. After registering in the search result list 40 the path character strings of all corresponding documents existing in the local file system 31, the keyword search means 24 identifies, based on the inquiry identification code, the personal computer (host personal computer) 11 that issued the inquiry and returns the search result list 40 to that personal computer 11.
[0049]
When the neighboring segment identification code CS is found in the global index table 36, the keyword search means 24 issues an inquiry to the corresponding neighboring segment (personal computer 11). At this time, the search stage number is decremented by 1, and the search stage number is sent together with the inquiry identification code and the keyword KW.
[0050]
The keyword search means 24 in the neighboring segment (personal computer 11) searches the specified keyword KW from the global index table 36 in the same manner as described above. Then, after generating the search result list 40 for the corresponding document on the local file system 31, the search result list 40 is returned to the personal computer 11 that issued the inquiry.
[0051]
When the number of search stages included in the inquiry is 0, the keyword search means 24 does not make an inquiry to any neighboring segment (personal computer 11) and ignores the neighboring segment identification codes CS in the global index table 36. Also in this case, the keyword search means 24 looks up the locations of the corresponding documents in the local file system 31 based on the global index table 36, generates the search result list 40, and then returns the list 40 to the personal computer 11 that issued the inquiry.
[0052]
If the same inquiry has already been processed, its inquiry identification code is already registered in the inquiry history list 39. In that case, the keyword search means 24 returns an empty search result list 40 to the host personal computer 11 without performing the search process based on the global index table 36 described above. Depending on how the neighboring segments (personal computers 11) are set in the document search system, the same inquiry may be issued repeatedly to a given segment as it passes through neighboring segments. By recognizing the same inquiry from its inquiry identification code and returning an empty search result list 40, the keyword search means 24 avoids performing the search process redundantly.
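The search procedure of the preceding paragraphs can be simulated in a single process with the Python sketch below. It is an illustration under several assumptions: the `Segment` class and `run_search` function are hypothetical, neighboring segments are reached by direct method calls instead of network communication, and a shared `results` list stands in for each segment returning its search result list 40 to the host personal computer.

```python
class Segment:
    def __init__(self, address, document_table, global_index):
        self.address = address
        self.document_table = document_table  # document code -> path
        self.global_index = global_index      # keyword -> {"docs", "neighbors"}
        self.neighbors = {}                   # neighbor code -> Segment
        self.history = set()                  # inquiry identification codes seen

    def search(self, inquiry_id, keyword, stages, results):
        # A repeated inquiry contributes nothing (an empty result list).
        if inquiry_id in self.history:
            return
        self.history.add(inquiry_id)
        entry = self.global_index.get(keyword, {"docs": [], "neighbors": []})
        # Local hits: report own address and the document path.
        for code in entry["docs"]:
            results.append((self.address, self.document_table[code]))
        # With no stages left, neighboring segment codes are ignored.
        if stages == 0:
            return
        for neighbor_code in entry["neighbors"]:
            self.neighbors[neighbor_code].search(inquiry_id, keyword, stages - 1, results)


def run_search(start, inquiry_id, keyword, stages):
    results = []
    start.search(inquiry_id, keyword, stages, results)
    return results


# Hypothetical chain of segments Sa - Sb - Sc, each holding one matching document.
sa = Segment("192.0.2.10", {0: "/docs/a.txt"}, {"arm": {"docs": [0], "neighbors": ["Sb"]}})
sb = Segment("192.0.2.11", {0: "/docs/b.txt"}, {"arm": {"docs": [0], "neighbors": ["Sa", "Sc"]}})
sc = Segment("192.0.2.12", {0: "/docs/c.txt"}, {"arm": {"docs": [0], "neighbors": ["Sb"]}})
sa.neighbors = {"Sb": sb}
sb.neighbors = {"Sa": sa, "Sc": sc}
sc.neighbors = {"Sb": sb}

one_stage = run_search(sa, "q1", "arm", 1)
two_stages = run_search(sa, "q2", "arm", 2)
```

With one search stage the results cover Sa and Sb only; with two stages Sc is reached as well, and the inquiry that Sb sends back to Sa is recognized from the history and produces no duplicate entries.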
[0053]
When the host personal computer 11 that issued the inquiry receives the search result list 40, the inquiry issuing means 25 causes the display unit 13 to display a screen showing the search result list 40. Further, the inquiry issuing means 25 retrieves the document designated by the user's selection and causes the display unit 13 to display it.
[0054]
As described above, according to the above embodiment, the following effects can be obtained.
(1) The search keyword KW, the start segment (personal computer 11) at which the search is started, and the number of segment search stages are input, and the keyword search is performed by tracing the other personal computers 11 set as neighboring segments within the search range specified by the number of search stages. That is, by designating the start segment (personal computer 11) and the search range, the range of indexes searched for the search keyword KW can be adjusted appropriately. Therefore, irrelevant documents that include the same keyword KW can be excluded from the search result list 40, and the documents corresponding to the keyword KW can be searched accurately.
[0055]
(2) In the global index table 36, the neighboring segment identification code CS indicating that a neighboring segment holds a document corresponding to the keyword KW is registered, and the keyword search is performed by tracing the other personal computers 11 set as neighboring segments based on the neighboring segment identification code CS. Here, even if there are a plurality of documents including the keyword KW in the neighboring segment, only one neighboring segment identification code CS indicating the presence of such documents is registered in the global index table 36. In this way, even if the number of documents stored in the computer system increases, enlargement of the global index table 36 can be suppressed.
[0056]
(3) The inquiry issuing means 25 issues an inquiry identification code based on the input information, and the identification code is registered in the inquiry history list 39 when searching for a keyword related to the inquiry identification code. Then, by referring to the history list 39, it is possible to avoid duplicate keyword searches.
[0057]
(4) Each personal computer 11 set as a segment is provided with the storage unit 15, the local index creation means 21, the far-side arm index creation means 22, the global index creation means 23, the keyword search means 24, the inquiry issuing means 25, and the like. In this case, the update processing of the index tables 34, 36, and 38 and the keyword search processing are distributed among the personal computers 11, so that the load on the network connecting the personal computers 11 can be reduced, which is preferable in practice.
[0058]
(5) In the present embodiment, a document table 33 including a path indicating the location of a document and a document code CD corresponding to the document is created, and the local index table 34 or the like is created using the document code CD in the document table 33. A global index table 36 is created. In this way, it is possible to avoid redundancy when handling paths to the same document as character strings. Therefore, an increase in the storage area required for the index tables 34 and 36 can be suppressed.
[0059]
The above embodiment can be modified as follows.
In the above embodiment, the personal computer 11 is embodied as one segment, but an appropriate structural unit on the network (such as a subnet or a domain) may be embodied as one segment. In this case, the local index creation means 21, the far-side arm index creation means 22, the global index creation means 23, the keyword search means 24, and the like are arranged in units of subnets, and processing such as exchanging arm index tables and making inquiries to neighboring segments is performed by communication between servers. In this way, index creation processing and keyword search processing can be distributed on a subnet or domain basis. Since documents can then be collected without using the network that connects the subnets, the network load caused by collecting documents can be reduced. Furthermore, when a new subnet is added to the network, it is sufficient to add a server for that subnet, which is preferable in practice.
[0060]
Further, a configuration may be adopted in which one personal computer handles a plurality of segments. Furthermore, documents in one personal computer may belong to different segments. The configuration of the segment and the segment processing means can be arbitrarily set depending on the processing efficiency and management convenience.
[0061]
In the above embodiment, each personal computer 11 has the inquiry issuing means 25, but the configuration is not limited to this. That is, the inquiry issuing means 25 need not be provided in every personal computer 11 constituting the document search system; it is sufficient that it is provided in at least one of those personal computers 11. Further, in addition to the personal computers 11 having the local index creation means 21, the far-side arm index creation means 22, the global index creation means 23, and the keyword search means 24, a personal computer having only the inquiry issuing means 25 may be provided in the document search system.
[0062]
In the above embodiment, the document table 33 composed of paths indicating the locations of documents and the document codes CD corresponding to the documents is created, and the local index table 34 and the global index table 36 are created using the document codes CD in the document table 33; however, the present invention is not limited to this. The local index table 34 and the global index table 36 may be created using the paths (document path character strings) indicating the locations of the documents, without using the document codes CD of the document table 33.
[0063]
The various embodiments described above can be summarized as follows.
(Supplementary note 1) A document retrieval method for retrieving documents stored on a computer system,
A segment is set as a document management unit in the computer system,
The segment includes an index table that records a keyword, location information of a corresponding document including the keyword stored in the segment, and neighboring segment holding information indicating that a neighboring segment holds the corresponding document,
In the computer system, based on a search keyword, a start segment at which a search is started, and a number of search stages of segments counted from the start segment, a keyword search based on the neighboring segment holding information of the index table is performed on the segments within the search range specified by the number of search stages, the location information of the documents corresponding to the keyword is extracted from the index table of each segment, and a search result list is generated.
(Supplementary note 2) The computer constituting the computer system
collects documents in segment units and creates a local index table recording the keywords existing in the documents and the location information of the corresponding documents including the keywords,
creates, for the keyword list existing in the local index table, an arm index table by excluding only the keyword list of the segment that is the exchange partner from the union of the keyword list of the segment itself and the keyword lists of the neighboring segments,
exchanges the arm index table with the segment that is the exchange partner and, based on the exchanged arm index table and the local index table, creates a global index table recording the keywords, the location information of the corresponding documents stored in the segment, and the holding information of the exchange partner's segment,
the document search method according to supplementary note 1, wherein the search result list is generated by extracting the location information of the corresponding documents including the search keyword with reference to the global index table.
(Supplementary note 3) The document search method according to supplementary note 2, wherein a server is provided for each segment in the computer system, and the exchange of the arm index tables and the keyword searches in other segments are performed by communication between the servers.
(Supplementary note 4) The document search method according to any one of supplementary notes 1 to 3, wherein a document table composed of paths indicating the locations of documents and document codes corresponding to the documents is created, and the index table is created using the document codes in the document table.
(Supplementary note 5) The document search method according to any one of supplementary notes 1 to 4, wherein an inquiry identification code corresponding to the input information is issued, and the identification code is registered in an inquiry history list when a keyword search related to the inquiry identification code is performed.
(Supplementary note 6) A document search program for searching documents stored on a computer system,
the program causing a computer to function as:
means for setting segments as management units of the documents, collecting documents in units of the segments, and creating an index table recording keywords, location information of the corresponding documents including the keywords stored in the segment, and neighboring segment holding information indicating that a neighboring segment holds a corresponding document;
means for taking as input information a search keyword, a start segment at which the search is started, and a number of search stages of segments counted from the start segment; and
means for performing, on the segments within the search range specified by the number of search stages, a keyword search based on the neighboring segment holding information of the index table, extracting the location information of the documents corresponding to the search keyword from the index table of each segment, and generating a search result list.
(Supplementary note 7) A document retrieval system for retrieving documents stored on a computer system,
A table for setting a segment as a document management unit in the computer system is stored in the storage means,
The segment includes an index table that records a keyword, location information of a corresponding document including the keyword stored in the segment, and neighboring segment holding information indicating that a neighboring segment holds the corresponding document,
Inquiry issuing means for making an inquiry using the search keyword, the start segment for starting the search, and the number of search stages of the segment starting from the start segment;
Keyword search means for referring to the index table and extracting the location information of the corresponding document of the search keyword to generate a search result list;
wherein a keyword search is performed on the segments within the search range specified by the number of search stages by tracing the other segments based on the neighboring segment holding information of the index table.
(Supplementary note 8) The document search system according to supplementary note 7, further comprising:
local index creation means for collecting documents in segment units and creating a local index table recording the keywords existing in the documents and the location information of the corresponding documents including the keywords;
arm index creation means for creating, for the keyword list existing in the local index table, an arm index table by excluding only the keyword list of the segment that is the exchange partner from the union of the keyword list of the segment itself and the keyword lists of the neighboring segments; and
global index creation means for creating, based on the arm index table obtained by exchange between segments and the local index table, a global index table recording the keywords, the location information of the corresponding documents stored in the segment, and the holding information of the exchange partner's segment.
[0064]
[Effects of the Invention]
As described above in detail, according to the present invention, it is possible to provide a document search method, a document search program, and a document search system that can accurately perform keyword search.
[Brief description of the drawings]
FIG. 1 is a schematic configuration diagram illustrating a document search system according to an embodiment.
FIG. 2 is a schematic configuration diagram of a personal computer constituting the document search system.
FIG. 3 is an explanatory diagram of a local index table.
FIG. 4 is an explanatory diagram of a global index table.
[Explanation of symbols]
11 PC as a segment
15 Storage unit as storage means
21 Local index creation means
22 Cluster index creation means as arm index creation means
23 Global index creation means
24 Keyword search means
25 Inquiry issuing means
32 Segment definition table
34 Local index table
35 Neighborhood segment table
36 Global index table
37 Own-side (shigan) arm index table
38 Far-side (higan) index table
39 Inquiry History List
40 Search result list
CS identification code
KW keyword
Sa, Sb, Sc, Sd segments
GIa, GIb, GIc, GId Global index table
LIa, LIb, LIc, LId Local index table
AIa-b, AIa-c, AIa-d, AIb-a, AIb-x, AIc-a, AIc-x, AId-a, AId-x Arm index table

Claims

1. A document search method for searching for a document stored on a computer system, wherein:
a segment is set as a document management unit in the computer system;
the segment includes an index table that records a keyword, location information of a corresponding document including the keyword stored in the segment, and neighboring segment holding information indicating that a neighboring segment holds the corresponding document; and
the computer system, based on a search keyword, a start segment at which the search starts, and a number of search stages counted from the start segment, performs a keyword search on the segments within the search range specified by the number of search stages based on the neighboring segment holding information of the index table, extracts location information of the documents corresponding to the search keyword from the index table of each segment, and generates a search result list.

2. The document search method according to claim 1, wherein a computer constituting the computer system:
collects documents in segment units and creates a local index table in which keywords existing in the documents and location information of the corresponding documents including the keywords are recorded;
creates, for the keywords existing in the local index table, an arm index table by excluding only the keyword list of the segment to be exchanged with from the union of the segment's own keyword list and the keyword lists of its neighboring segments;
exchanges the arm index table with the segment to be exchanged with and, based on the exchanged arm index table and the local index table, creates a global index table that records the keyword, the location information of the corresponding document stored in the segment, and the holding information of the exchange partner segment; and
generates the search result list by referring to the global index table and extracting location information of the corresponding documents including the search keyword.

3. The document search method according to claim 2, wherein a server is provided for each segment in the computer system, and the exchange of the arm index table and the keyword search in other segments are performed by communication between the servers.

4. A document search program for searching for a document stored on a computer system, the program causing a computer to function as:
means for setting a segment as a management unit for documents, collecting documents in segment units, and creating an index table that records a keyword, location information of a corresponding document including the keyword stored in the segment, and neighboring segment holding information indicating that a neighboring segment holds the corresponding document;
means for receiving, as input information, a search keyword, a start segment at which the search starts, and a number of search stages counted from the start segment; and
means for performing, on the segments within the search range specified by the number of search stages, a keyword search based on the neighboring segment holding information of the index table, extracting the location information of the documents corresponding to the search keyword from the index table of each segment, and generating a search result list.

5. A document search system for searching for a document stored on a computer system, wherein:
a table for setting a segment as a document management unit in the computer system is stored in storage means;
the segment includes an index table that records a keyword, location information of a corresponding document including the keyword stored in the segment, and neighboring segment holding information indicating that a neighboring segment holds the corresponding document;
the system comprises inquiry issuing means for making an inquiry using a search keyword, a start segment at which the search starts, and a number of search stages counted from the start segment, and keyword search means for referring to the index table and extracting the location information of the documents corresponding to the search keyword to generate a search result list; and
a keyword search is performed by tracing the other segments, based on the neighboring segment holding information of the index table, for the segments within the search range specified by the number of search stages.
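
The stage-limited search recited in the claims can be illustrated with a minimal Python sketch. It is not the patented implementation. The table contents, the `search` function, the breadth-first traversal order, the visited set standing in for an inquiry history, and the reading that a stage count of 0 covers only the start segment are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical global index tables, one per segment:
# keyword -> (local document locations, neighbors flagged as holding the keyword)
GLOBAL_INDEX: Dict[str, Dict[str, Tuple[List[str], List[str]]]] = {
    "Sa": {"cache": (["//sa/doc1.txt"], ["Sc"]),
           "bus": ([], ["Sb"]),
           "clock": ([], ["Sc"])},
    "Sb": {"bus": (["//sb/doc7.txt"], []),
           "cache": ([], ["Sa"]),
           "clock": ([], ["Sa"])},
    "Sc": {"cache": (["//sc/doc3.txt"], ["Sa"]),
           "clock": (["//sc/doc4.txt"], []),
           "bus": ([], ["Sa"])},
}


def search(keyword: str, start_segment: str, stages: int) -> List[str]:
    """Collect document locations for `keyword`, following neighboring
    segment holding information at most `stages` hops from the start."""
    results: List[str] = []
    visited = {start_segment}            # avoids asking a segment twice
    queue = deque([(start_segment, 0)])  # (segment, hops from start)
    while queue:
        segment, depth = queue.popleft()
        local_docs, holders = GLOBAL_INDEX[segment].get(keyword, ([], []))
        results.extend(local_docs)
        if depth == stages:
            continue                     # search range exhausted on this path
        for neighbor in holders:         # only neighbors flagged for the keyword
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return results


print(search("cache", "Sb", 1))  # ['//sa/doc1.txt']
print(search("cache", "Sb", 2))  # ['//sa/doc1.txt', '//sc/doc3.txt']
```

Because only neighbors flagged for the keyword are queried, a segment whose direction holds no matching document is never contacted, and the stage count bounds how far from the start segment the results can come from.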