JP2005018217A

JP2005018217A - Retrieval server device and retrieval method

Info

Publication number: JP2005018217A
Application number: JP2003179447A
Authority: JP
Inventors: Hiromitsu Sumino (角野 宏光); Eiji Komata (小俣 栄治); Tsuyoshi Kato (加藤 剛志); Norihiro Ishikawa (石川 憲洋)
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2003-06-24
Filing date: 2003-06-24
Publication date: 2005-01-20

Abstract

PROBLEM TO BE SOLVED: To provide a distributed retrieval system that efficiently selects the optimal node storing the retrieval object data, thereby reducing the traffic required for retrieval and the communication load.

SOLUTION: This retrieval server device is provided with: a retrieval object database 361 for storing retrieval object data; a retrieval result history table 362 for storing selection data in which the server identifier of another retrieval server device is associated with evaluation data of that device; a judging part 402 that searches the retrieval object database based on a retrieval request including a retrieval condition, transmits the retrieval result and its own server identifier to the retrieval request origin when the retrieval object is detected, and, when the retrieval object is not detected, transmits the retrieval request and part or all of the selection data associated with the retrieval condition to another retrieval server device chosen based on the selection data; and a history updating part 408 for updating the selection data stored in its own device based on the selection data acquired from another retrieval server device.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、通信ネットワーク上に分散配置され、相互に接続された検索サーバ装置によって検索を行うための検索サーバ装置及び検索方法に関する。
【０００２】
【従来の技術】
従来より、分散配置された複数の検索サーバ装置を、通信回線を介して相互に接続し、各検索サーバ装置が連携してキーワード検索を行う分散型検索システムが知られている（例えば、非特許文献１、非特許文献２参照）。
【０００３】
このような従来の分散型検索システムにおいて、検索を要求された検索サーバ装置に該当するデータが存在しない場合には、当該検索サーバ装置が他の検索サーバ装置を選択し、この選択した検索サーバ装置に対して、検索要求を順次転送する処理を、該当するデータが見つかるまで、或いは他の検索サーバ装置が尽きるまで繰り返す方式がある。このような分散型検索システムについて、図１０を参照して説明する。
【０００４】
図１０に示すように、ノードＡ，Ｂ，Ｃ及びＤは、それぞれ検索サーバ装置であり、それぞれ検索対象となる文書を表す検索対象データを記憶し、検索条件を含む検索要求データを受信した際には、この検索要求データにより指定されたキーワードと検索対象データで表される文書を構成する文字列とを比較することにより検索を行う。
【０００５】
また、上記従来の方式においてノードＡには、過去の検索結果の履歴を表す検索結果履歴データを格納した検索結果履歴テーブルが記憶されている。この検索結果履歴データは、各ノードで行われた検索毎に生成されたデータであり、キーワード、ノード及び評価値という項目を有し、当該検索に用いられた少なくとも一つのキーワードを表すデータと、当該ノードを示すサーバデータと、当該検索の結果に基づいて算出された評価値を示す評価データとから構成されている。この評価値は、ヒットした検索対象データの件数が多ければ大きくなり、少なければ小さくなり、各ノード毎に関連づけられて格納される。
【０００６】
例えば、ノードＡは、２つの検索キーワード「Ｘ」及び「Ｙ」が指定された検索要求データを受信すると、自身が記憶している検索対象データについてキーワード「Ｘ」及び「Ｙ」を用いたキーワード検索を行う（図中処理Ｐ１）。この検索において検索対象データがヒットしなかった場合には、ノードＡは、これらの検索キーワードを表すデータをキーワード項目として有する検索結果履歴データを検索結果履歴テーブルから選択する。この際、該当する検索結果履歴データが複数の場合には、評価値項目のデータで示される評価値が最大の検索結果履歴データが選択される。ここでは、図１０に示すように、ノードＤの評価値が最も大きいため、ノードＤを選択する。
【０００７】
次いで、ノードＡは、こうして選択した検索結果履歴データのノード項目のデータで示されるノードＤに対して、２つの検索キーワード「Ｘ」及び「Ｙ」を含む検索要求データを送信する（図中処理Ｐ２）。この検索要求データを受信したノードＤは、記憶している検索対象データについてキーワード「Ｘ」及び「Ｙ」を用いたキーワード検索を行う。この検索にて検索対象データがヒットすると、ノードＤは、ヒットした検索対象データの件数を示すデータや当該検索対象データを含んだ検索結果データをノードＡに送信し（処理Ｐ３）、ノードＡは、受信した検索結果データに基づいて、検索結果履歴テーブルを更新する（処理Ｐ４）。
【０００８】
また、他の分散型検索システムとしては、図１１に示すように、文字列と、この文字列が意味する内容を示す意味情報とを表すメタデータを用いた検索があり、この検索においては、メタデータに記述された情報に基づき検索結果履歴データを生成し、検索装置の選択に用いる方式がある。
【０００９】
【非特許文献１】
ＮＥＵＲＯＧＲＩＤＰ２ＰＳＥＡＲＣＨ［Ｐｈｉｌｏｓｏｐｈｙ２１−０８・０１ｖ０３］インターネット＜ＵＲＬ：ｈｔｔｐ：／／ｗｗｗ．ｎｅｕｒｏｇｒｉｄ．ｎｅｔ／ｐｈｐ／ｐｈｉ１ｏｐｙ．ｐｈｐ＞
【００１０】
【非特許文献２】
ＮｅｕｒｏＧｒｉｄＰ２Ｐサーチ分散検索インターネット＜ＵＲＬｈｔｔｐ：／／ｗｗｗ．ｊｎｕｔｅｌｌａ．ｏｒｇ／ｊｎｕｄｅｖ／ｊｗｓ＃ｓａｍ．ｌｚｈ＞
【００１１】
【発明が解決しようとする課題】
上述した従来のノード選択方法では、検索結果履歴テーブルの情報を基にして検索要求データの送信先ノードを選択することにより検索に必要なトラフィックを抑え、効率のよい検索を実現することを目的としている。
【００１２】
しかしながら、上述のようなノード選択方法においては、検索要求に該当するキーワードが検索結果履歴テーブルに含まれていない場合に、ノードの選択が行えないという問題がある。また、過去に多数の検索を行い、検索結果履歴テーブルの情報が増えるほど適切なノードを選択する可能性が高くなるという特性があるため、有効なノード選択を可能とする検索結果履歴テーブルが作成されるまでには、多数の検索を行う必要がある。
【００１３】
そこで、本発明は、以上の点に鑑みてなされたもので、分散型検索システムにおいて、検索されるべき検索対象データを保持している最適なノードを効率よく選択することにより、検索に必要なトラフィックを削減し、通信負荷を低減することのできる検索サーバ装置及び検索方法を提供することを目的とする。
【００１４】
【課題を解決するための手段】
上記課題を解決するために、本発明は、通信ネットワーク上に分散配置され、相互に接続される検索サーバ装置であって、検索の対象となる検索対象データを記憶する検索対象データベースと、他の検索サーバ装置を特定するサーバ識別子と、当該他の検索サーバ装置の評価データとを対応付けた選択データを記憶する選択データ記憶部と、検索条件を含む検索要求に基づいて、検索対象データベースを検索し、検索対象が検出された場合には、検索要求元に対して検索結果、自身のサーバ識別子を送信し、検索対象が検出されない場合には、選択データに基づいて、他の検索サーバ装置に対して検索要求、及び検索条件に関連する選択データの一部又は全部を送信する判断部と、他の検索サーバ装置から取得した選択データに基づいて、自機が保持する選択データを更新する更新部とを備えることを特徴とする。
【００１５】
なお、上記発明において、前記選択データの形式は、検索対象が文字列によるもの及び意味情報と文字列から構成されるメタデータの形式によるものとすることができる。
【００１６】
このような本発明によれば、各検索サーバ装置は、検索要求が受信され、自身が検索対象データを保持していない場合に検索要求を他の検索サーバ装置に転送する際、評価データに基づいて検索要求の転送先を選択するため、最適なノードを効率よく選択し、検索に必要なトラフィックを削減することができる。また、本発明では、検索要求を転送する際、自機が保持する選択データの一部又は全部を転送先に送信するため、転送先の検索サーバにおける検索処理の効率を図ることができる。
【００１７】
上記発明においては、更新部は、任意のタイミングにおいて、他の検索サーバ装置が保持する選択データを取得し、自身が保持する選択データと同期させる機能を備えることが好ましい。この場合には、検索サーバ装置間で選択データを交換して同期を図り、情報を共有することができ、より最適な転送先を選択することができる。なお、この発明では、前記選択データの同期は、検索ノードが最初に検索開始した時、検索ノードが必要に応じて任意の時に行うことができる。
【００１８】
上記発明においては、選択データ記憶部は、他の検索サーバから取得した評価値及びサーバ識別子の履歴を履歴テーブルとして格納する機能を有することが好ましい。この場合には、過去に行った検索結果を履歴として蓄積することにより、選択データの充実を図ることができる。
【００１９】
本発明は、通信ネットワーク上に分散配置され、相互に接続された検索サーバ装置によって検索を行う検索方法であって、検索の対象となる検索対象データを検索対象データベースに記憶しておくとともに、他の検索サーバ装置を特定するサーバ識別子と、当該他の検索サーバ装置の評価データとを対応付けた選択データを記憶するステップと、検索条件を含む検索要求に基づいて、検索対象データベースを検索し、検索対象が検出された場合には、検索要求元に対して検索結果、自身のサーバ識別子、及び検索条件に対する一致の程度を示す評価値を送信し、検索対象が検出されない場合には、選択データに基づいて、他の検索サーバ装置に対して検索要求を転送するステップとを有することを特徴とする。
【００２０】
このような本発明によれば、各検索サーバ装置は、検索要求が受信され、自身が検索対象データを保持していない場合に検索要求を他の検索サーバ装置に転送する際、転送経路上の検索サーバ装置の評価値に基づいて、検索要求の転送先を選択するため、最適なノードを効率よく選択し、検索に必要なトラフィックを削減することができる。
【００２１】
【発明の実施の形態】
以下、図面を参照して本発明の実施形態について説明する。
【００２２】
（構成）
図１は、実施形態に係る分散型検索システム１の構成を示すブロック図である。同図に示すように、この分散型検索システム１は、通信ネットワーク２０上に分散配置され相互に接続されたノード３０（３０Ａ〜３０Ｇ）によって検索を行うシステムであり、インターネット等の通信ネットワーク２０には、ユーザ端末１０が接続されている。
【００２３】
ノード３０は、検索サーバ装置であり、後述のメタデータデータベース及び検索結果履歴テーブルを用いて検索処理を行う機能を、一般的なＷＷＷ（ＷｏｒｌｄＷｉｄｅＷｅｂ）サーバ装置で実現するものである。また、ユーザ端末１０は、一般的なコンピュータ装置であり、ＷＷＷブラウザを起動し、ＨＴＴＰに従ったリクエストメッセージを通信ネットワーク２０へ送出するとともに、当該メッセージに対するレスポンスメッセージを受け取る。
【００２４】
図２は、各ノード３０のハードウェア構成を示すブロック図である。同図に示すように、ノード３０は、バス３３を介して接続されたＣＰＵ３１と、ハードディスク装置３６と、通信ネットワーク２０に対して通信を行う通信インターフェース３２とを備えている。ハードディスク装置３６は、分散型検索システム１を構成する全てのノードの通信アドレスや、後述する検索対象データベース３６１と、検索結果履歴テーブル３６２と、検索プログラム３６３とを記憶し、ＣＰＵ３１がハードディスク装置３６に記憶された検索プログラム３６３を読み出して実行する。
【００２５】
検索対象データベース３６１は、ＸＭＬ（ｅＸｔｅｎｓｉｂｌｅＭａｒｋｕｐＬａｎｇｕａｇｅ）を用いて作成された多数の検索対象データを格納している。各検索対象データは少なくとも一つのメタデータを有し、メタデータは、検索のキーワードとなる文字列（例えば”渋谷”）を表すキーワードデータと、この文字列の意味内容を示す意味情報（例えば文字列”地名”）を表す意味データとを有する。
【００２６】
図３は、ＣＰＵ３１が検索プログラム３６３を実行することでノード３０上に構築される機能を示すブロック図である。なお、ここでは、ノード３０Ａから検索要求が転送されたノードＢを例に説明する。
【００２７】
同図に示すように、ノード３０Ｂには、検索プログラム３６３が実行されることによって、検索要求受信部４０１と、判断部４０２と、検索結果送信部４０３と、履歴参照部４０４と、検索要求送信部４０５と、要求内容保持部４０６と、検索結果受信部４０７と、履歴更新部４０８と、履歴テーブル受信部４０９と、履歴テーブル送信部４１０とが構築される。
【００２８】
検索要求受信部４０１は、通信インターフェース３２に接続されたモジュールであり、検索を要求する他の装置（ここでは他のノードＡ）から、通信ネットワーク２０を介して、送信されてきた自ノード３０Ｂ宛の検索要求データを受信し、受信された検索要求データは、要求内容保持部４０６及び判断部４０２に入力される。ここで、検索要求データは、検索条件を表す検索条件データと、要求元のノードＡの通信アドレスを示すアドレスデータとを含んでおり、検索条件データは、少なくとも一つの検索条件から構成されている。
【００２９】
要求内容保持部４０６は、検索要求受信部４０１から入力された検索要求データに応答する検索結果データを返信するまで、当該検索要求データを記憶する記憶装置であり、保持されたデータは、履歴更新部４０８に受け渡される。履歴テーブル受信部４０９は、検索要求データの送信元のノード３０Ａより検索要求データと共に送信される評価テーブル６１Ａを受信し、履歴更新部４０８を経由して検索結果履歴テーブル３６２の情報を更新するモジュールである。
【００３０】
検索結果送信部４０３は、判断部４０２の指示に従い、検索結果データを送信するモジュールである。検索結果データは要求元の装置の通信アドレスと抽出した検索対象データの所在を示すＵＲＬ（ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＬｏｃａｔｏｒ）と抽出した検索対象データの件数と自ノード３０の通信アドレスとから構成されている。
【００３１】
判断部４０２は、検索条件を含む検索要求データに基づいて、検索対象データベース３６１を検索し、検索対象が検出された場合には、ノードＡに対して検索結果、自身のサーバ識別子、及び検索条件に対する一致の程度を示す評価値を送信し、検索対象が検出されない場合には、選択データである履歴テーブルに基づいて、他の検索ノードに対して検索要求を転送するモジュールである。
【００３２】
そして、判断部４０２における検索処理では、キーワードデータと意味データとを区別しつつ文字列を比較し、全ての検索条件と全ての検索対象データとの間で行うことにより実現される。この試行により少なくとも一つの検索対象データが抽出されると、判断部４０２は、検索結果を示す検索結果データを検索要求データの送信元の装置へ返信するよう検索結果送信部４０３に指示する。
【００３３】
また、判断部４０２は、検索要求データで表される検索要求に最適なノードの選択を履歴参照部４０４に要求する。さらに判断部４０２は、履歴参照部４０４によりノードが選択されると、当該ノードへ検索要求データを送信することを検索要求送信部４０５に指示する。ただし、判断部４０２は、検索要求データにおける要求元の装置の通信アドレスがいずれのノード３０の通信アドレスとも異なる場合には、検索要求データ内に含まれる要求元装置の通信アドレスを自ノード３０の通信アドレスに書き換えた後に、上記指示を行う。
【００３４】
判断部４０２は、ノードが選択されなかった場合、要求元の装置の通信アドレスと自ノード３０の通信アドレスとから構成される空の検索結果データを、検索要求データの送信元の装置へ返信するよう検索結果送信部４０３に指示する。また、判断部４０２は履歴参照部４０４より検索ノードの条件に対応した評価テーブル６１Ａ作成に必要な情報を取得して評価テーブル６１Ａを作成し、履歴テーブル送信部４１０に対して評価テーブル６１Ａの送信を指示する。上述の判断部４０２におけるノードの選択の際には、履歴参照部４０４により検索結果履歴テーブル３６２を参照する。
【００３５】
履歴参照部４０４は、判断部４０２から要求されると、検索要求データ中の検索メタデータと合致するメタデータに対応付けられた評価値データにより表される評価値を、ノードデータで表されるノード毎に加算して評価指標を算出し、算出した評価指標が最も大きいノードを最適なノードとして選択する。ただし、検索要求データ中の検索メタデータと合致するメタデータが検索結果履歴テーブル３６２に存在しない場合には、予め定められた優先順位に従ってノードを選択する。また、優先順位が自ノードより高いノードや自ノードの選択は禁止されている。
【００３６】
検索結果受信部４０７は、履歴参照部４０４により選択されたノードから検索要求データに対する検索結果データを受信し、この検索結果データにおける要求元の装置の通信アドレスと自ノード３０の通信アドレスとを比較する。両者が不一致の場合には、検索結果受信部４０７は当該検索結果データを検索要求データの送信元の装置へ返信するよう検索結果送信部４０３に指示し、要求内容保持部４０６は記憶した検索要求データを破棄する。一方、両者が一致した場合には、履歴更新部４０８は、当該検索結果データと検索要求データとに基づいて検索結果履歴データを作成し検索結果履歴テーブル３６２に追加格納するとともに、当該検索結果データを検索要求データの送信元の装置へ返信するよう検索結果送信部４０３に指示し、要求内容保持部４０６は検索要求データを破棄する。
【００３７】
図４は、検索結果履歴テーブル３６２の具体的なデータ構成例を示す説明図である。同図に示すように、検索結果履歴テーブル３６２には、後述の検索結果受信部４０７が他のノードから検索結果データを受信した際に作成される検索結果履歴データが格納されている。検索結果履歴データは、本実施形態では、検索条件データを構成するメタデータと、この検索条件データを用いた検索を当該他のノードで行った結果を書誌的に表す書誌データとが対応付けられたデータである。この書誌データは、検索結果データを作成したノードを識別するためのノードデータと、検索条件の意味内容を示す意味情報と、検索条件の文字列、及び当該検索にてヒットした検索対象データの件数を表す評価値データとから構成されている。
【００３８】
図５は、評価テーブルの具体的なデータ構成例を示す説明図である。同図に示すように、評価データは、検索結果データを作成したノードを識別するためのノードデータと、検索条件の意味内容を示す意味情報と、検索条件の文字列、及び当該検索にてヒットした検索対象データの件数を表す評価値データとから構成されている。
【００３９】
（動作）
次に、本実施形態の動作について説明する。なお、ここでは、検索結果履歴テーブル３６２には、過去の検索結果に基づいて作成された検索結果履歴データが予め格納されているものとする。
【００４０】
先ず、ユーザがユーザ端末１０を操作し、２つの検索条件（ここでは、文字列”新宿”に意味情報”地名”が対応付けられたメタデータと文字列”フレンチ”に意味情報”レストラン”が対応付けられたメタデータ）を入力したものとする。これにより、これらのメタデータで構成される検索条件データを内包した検索要求データが通信ネットワーク２０を介してノード３０Ａに送信される。
【００４１】
図６は、ノード３０Ａにおけるノード選択処理の手順を示すフロー図である。同図に示すように、ノード３０Ａの検索要求受信部４０１により、検索要求データを受信する（ステップＳ５０１）。そして、検索要求受信部４０１は、検索要求データを判断部４０２に入力する。
【００４２】
検索要求データが入力された判断部４０２は、検索条件データを有する検索対象データを検索対象データベース３６１から抽出する検索処理を行う（ステップＳ５０２）。すなわち、判断部４０２は、文字列と意味情報とを区別しつつ文字列を比較する処理を、全ての検索メタデータと全ての検索対象データとの間で行う。ここで、もし検索結果履歴テーブル３６２の情報が十分でなく、他のノードの検索結果履歴テーブル３６２を取得したい場合（ステップＳ５０３：Ｙｅｓ）には、他のノードから検索結果履歴テーブル３６２を取得する（ステップＳ５０４）。この検索結果履歴テーブルの取得は、図７に示すように、他のノード（図７においてはノード３０Ｂ及び３０Ｄ）に対して履歴テーブル要求を送信し、これに応じて送信された検索結果履歴テーブルを取得する。
【００４３】
一方、図６におけるステップＳ５０３において、検索結果履歴テーブル３６２が既に取得されている場合には（ステップＳ５０３：Ｎｏ）、評価テーブル６１Ａを検索条件送信元から受信しているか否かについて判断し（Ｓ５０５）、評価テーブル６１Ａを検索条件送信元から受信している場合には（ステップＳ５０５：Ｙｅｓ）、図８に示すように、この評価テーブル６１Ａに基づき自身の検索結果履歴テーブル３６２を更新する（ステップＳ５０６）。なお、ここでは、ノード３０Ａを前提としているため、評価テーブルは受信されない。
【００４４】
このようにして更新された検索結果履歴テーブル３６２を用いて、検索条件データに対応した最適なノードの選択を行う（Ｓ５０７）。その後、判断部４０２では、図９に示すように、次に検索要求を転送するノード向けの評価テーブル６１Ａを作成し（Ｓ５０８）、選択したノードに対してこの評価テーブル６１Ａと検索要求を共に送信する（Ｓ５０９）。
【００４５】
ここで、履歴参照部４０４が、検索結果履歴テーブル３６２から最適なノードを選択する方法について説明する。履歴参照部４０４は、メタデータ（「意味情報」＝”地名”、「文字列」＝”新宿”）の評価値と、メタデータ（「意味情報」＝”レストラン”、「文字列」＝”フレンチ”）の評価値とをノード毎に加算して評価指標を算出し、算出した評価指標が最も大きいノードを最適なノードとして選択する。
【００４６】
例えば、図４に示すノード３０Ｂにおいて、（「意味情報」＝”地名”、「文字列」＝”新宿”）に対する検索結果履歴の評価値は”６０”である。また、（「意味情報」＝”レストラン”、「文字列」＝”フレンチ”）である履歴情報は格納されていないため、ノード３０Ｂの評価指標は６０＋０＝６０となる。同様に、ノード３０Ｃについては、（「意味情報」＝”地名”、「文字列」＝”新宿”）である履歴データは１件格納されており、その評価値は”２０”である。
【００４７】
また、（「意味情報」＝”レストラン”、「文字列」＝”フレンチ”）である履歴データも１件格納されており、その評価値は”２０”である。従って、ノード３０Ｃの評価指標は２０＋２０＝４０となる。ノード３０Ｄについては、（「意味情報」＝”地名”、「文字列」＝”新宿”）である履歴データの評価値は”３０”であり、（「意味情報」＝”レストラン”、「文字列」＝”フレンチ”）であるメタデータは格納されていないため、ノード３０Ｄの評価指標は３０＋０＝３０である。
【００４８】
以上より、ノード３０Ｂが最も大きな評価指標となるため、履歴参照部４０４は、ノード３０Ｂを最適なノードとして選択し、このノード３０Ｂが最適なノードであるという情報を検索要求送信部４０５に送信する。検索要求送信部４０５は、受信した情報を基に、ノード３０Ｂに検索条件データと要求元のノード３０Ａの通信アドレスを示すデータを有する検索要求データを送信する（図９のステップＳ５０９）。
【００４９】
このように各ノードにおいて適切な転送先を選択し、検索要求及び評価テーブルの転送、検索結果履歴テーブルの更新を順次繰り返すことにより、要求する検索対象を検出し、検索要求元に対して回答することができ、最適なノードを効率よく選択し、検索に必要なトラフィックを削減することができる。特に、本実施形態では、検索要求を転送する際、自機が保持する検索結果履歴テーブルの一部を評価データとして転送先に送信するため、転送先の検索サーバ装置における検索処理の効率を図ることができる。
【００５０】
（変形例）
以上、本発明の実施形態について説明したが、本発明はその主要な特徴から逸脱することなく他の様々な形態で実施することが可能である。上述した実施形態は、本発明の一態様を例示したものに過ぎず、本発明の範囲は、特許請求の範囲に示す通りであって、また、特許請求の範囲の均等範囲に属する変形や変更は、全て本発明の範囲内に含まれる。
【００５１】
なお、変形例として、例えば、上記実施形態においては、検索結果履歴テーブル３６２の更新を検索開始時のみに行うとして説明したが、任意のタイミングで他のノードより検索結果履歴テーブル３６２を取得して更新することが考えられる。
【００５２】
【発明の効果】
以上説明したように、本発明によれば、分散型検索システムにおいて、検索されるべき検索対象データを保持している最適なノードを効率よく選択することにより、検索に必要なトラフィックを削減し、通信負荷を低減することができる。
【図面の簡単な説明】
【図１】実施形態に係る分散型検索システムの構成を示すブロック図である。
【図２】実施形態に係るノードのハードウェア構成を示すブロック図である。
【図３】実施形態においてＣＰＵが検索プログラムを実行することでノード上に構築される機能を示すブロック図である。
【図４】実施形態に係る検索結果履歴テーブルのデータ構造を示す説明図である。
【図５】実施形態に係る検索要求とともに送信される評価テーブルのデータ構造を示す説明図である。
【図６】実施形態に係るノードにおけるノード選択処理の手順を示すフロー図である。
【図７】実施形態において履歴情報取得動作を示す説明図である。
【図８】実施形態において履歴更新処理の動作を示す説明図である。
【図９】実施形態において検索情報及び評価テーブル転送の動作を示す説明図である。
【図１０】従来のノード選択処理と履歴更新処理の動作を示す説明図である。
【図１１】従来のメタデータを用いたノード選択処理と履歴更新処理の動作を示す説明図である。
【符号の説明】
１…分散型検索システム
１０…ユーザ端末
２０…通信ネットワーク
３０（３０Ａ〜３０Ｇ）…ノード
３１…ＣＰＵ
３２…通信インターフェース
３３…バス
３６…ハードディスク装置
６１Ａ…評価テーブル
３６１…検索対象データベース
３６２…検索結果履歴テーブル
３６３…検索プログラム
４０１…検索要求受信部
４０２…判断部
４０３…検索結果送信部
４０４…履歴参照部
４０５…検索要求送信部
４０６…要求内容保持部
４０７…検索結果受信部
４０８…履歴更新部
４０９…履歴テーブル受信部
４１０…履歴テーブル送信部

[0001]
[Technical Field of the Invention]
The present invention relates to a search server device and a search method in which a search is performed by search server devices that are distributed over a communication network and connected to one another.
[0002]
[Prior art]
Distributed search systems are conventionally known in which a plurality of distributed search server devices are connected to one another via communication lines and cooperate to perform a keyword search (see, for example, Non-Patent Document 1 and Non-Patent Document 2).
[0003]
In one such conventional distributed search system, when the search server device that received a search request holds no matching data, that device selects another search server device and forwards the search request to it. This forwarding is repeated until matching data is found or no other search server device remains. Such a distributed search system will be described with reference to FIG. 10.
[0004]
As shown in FIG. 10, nodes A, B, C, and D are search server devices, each of which stores search target data representing documents to be searched. On receiving search request data that includes a search condition, a node performs the search by comparing the keyword specified in the search request data with the character strings making up the documents represented by the search target data.
[0005]
In the conventional method, node A also stores a search result history table containing search result history data, which represents the history of past search results. Search result history data is generated for each search performed at each node and has three items: keyword, node, and evaluation value. It consists of data representing at least one keyword used in the search, server data indicating the node, and evaluation data indicating an evaluation value calculated from the result of the search. The evaluation value is larger when more search target data was hit and smaller when less was hit, and it is stored in association with each node.
[0006]
For example, when node A receives search request data specifying two search keywords “X” and “Y”, it performs a keyword search with “X” and “Y” on the search target data it stores (process P1 in the figure). If no search target data is hit, node A selects from the search result history table the search result history data whose keyword item contains these search keywords. If several records qualify, the record with the largest evaluation value is selected. Here, as shown in FIG. 10, node D has the largest evaluation value, so node D is selected.
[0007]
Next, node A transmits search request data including the two search keywords “X” and “Y” to node D, the node indicated by the node item of the selected search result history data (process P2 in the figure). On receiving the search request data, node D performs a keyword search with “X” and “Y” on the search target data it stores. If search target data is hit, node D transmits to node A search result data containing the number of hits and the hit search target data (process P3), and node A updates its search result history table based on the received search result data (process P4).
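The conventional selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout of a history record, the exact-match rule on the keyword set, and the evaluation values are hypothetical; the text only states that node D has the largest evaluation value for “X” and “Y”.

```python
from typing import Optional

# One history record: (keywords used in the search, node, evaluation value).
# This tuple layout is an assumption made for the sketch.
HistoryRecord = tuple[frozenset, str, int]


def select_node_conventional(history: list[HistoryRecord],
                             keywords: set) -> Optional[str]:
    """Return the node of the highest-scoring record for these keywords."""
    wanted = frozenset(keywords)
    matching = [record for record in history if record[0] == wanted]
    if not matching:
        return None  # no record for these keywords, so no node can be chosen
    return max(matching, key=lambda record: record[2])[1]


# Hypothetical evaluation values; only "node D scores highest" is from the text.
history = [
    (frozenset({"X", "Y"}), "B", 10),
    (frozenset({"X", "Y"}), "C", 25),
    (frozenset({"X", "Y"}), "D", 40),
]
selected = select_node_conventional(history, {"X", "Y"})
```

With this table the function picks node D, and it returns `None` for a keyword set that has never been recorded.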
[0008]
Another distributed search system, shown in FIG. 11, searches using metadata that represents a character string together with semantic information indicating what the character string means. In this method, search result history data is generated from the information described in the metadata and is used to select the search device.
[0009]
[Non-Patent Document 1]
NEUROGRID P2P SEARCH [Philosophy 21-08.01v03], Internet <URL: http://www.neurogrid.net/php/phi1opy.php>
[0010]
[Non-Patent Document 2]
NeuroGrid P2P Search, distributed search, Internet <URL: http://www.jnutella.org/jnudev/jws#sam.lzh>
[0011]
[Problems to be solved by the invention]
The conventional node selection method described above aims to hold down the traffic required for a search and achieve an efficient search by selecting the destination node of the search request data based on the information in the search result history table.
[0012]
However, this node selection method has the problem that no node can be selected when the keyword in the search request is not contained in the search result history table. In addition, the likelihood of selecting an appropriate node rises only as past searches accumulate and the search result history table grows, so a large number of searches must be performed before the table becomes good enough for effective node selection.
[0013]
The present invention has been made in view of the above points. Its object is to provide a search server device and a search method that, in a distributed search system, reduce the traffic required for a search and lower the communication load by efficiently selecting the optimal node holding the search target data being sought.
[0014]
[Means for Solving the Problems]
To solve the above problems, the present invention is a search server device that is distributed over a communication network and connected to other search server devices, comprising: a search target database that stores search target data to be searched; a selection data storage unit that stores selection data in which a server identifier specifying another search server device is associated with evaluation data of that other search server device; a determination unit that searches the search target database based on a search request including a search condition, transmits the search result and its own server identifier to the search request source when the search target is detected, and, when the search target is not detected, transmits to another search server device, chosen based on the selection data, the search request and part or all of the selection data related to the search condition; and an updating unit that updates the selection data held by the device itself based on selection data acquired from another search server device.
[0015]
In the above invention, the selection data may be in a form in which the search target is a character string, or in a metadata form composed of semantic information and a character string.
[0016]
According to the present invention, when a search server device receives a search request, does not hold the search target data itself, and therefore forwards the search request to another search server device, it selects the forwarding destination based on the evaluation data. The optimal node can thus be selected efficiently and the traffic required for the search reduced. Furthermore, because the device sends part or all of its own selection data to the forwarding destination when it forwards the search request, search processing at the destination search server becomes more efficient.
[0017]
In the above-mentioned invention, it is preferable that the updating unit has a function of acquiring selection data held by another search server device at an arbitrary timing and synchronizing it with the selection data held by itself. In this case, the selection data can be exchanged between the search server devices for synchronization, information can be shared, and a more optimal transfer destination can be selected. In the present invention, the selection data can be synchronized when the search node first starts the search, and at any time as required by the search node.
[0018]
In the above invention, the selection data storage unit preferably has a function of storing, as a history table, the history of evaluation values and server identifiers acquired from other search servers. In this case, accumulating past search results as a history enriches the selection data.
[0019]
The present invention is also a search method in which a search is performed by search server devices that are distributed over a communication network and connected to one another, the method comprising: a step of storing search target data to be searched in a search target database and storing selection data in which a server identifier specifying another search server device is associated with evaluation data of that other search server device; and a step of searching the search target database based on a search request including a search condition, transmitting to the search request source, when the search target is detected, the search result, the device's own server identifier, and an evaluation value indicating the degree of match with the search condition, and, when the search target is not detected, forwarding the search request to another search server device based on the selection data.
[0020]
According to the present invention, when a search server device receives a search request, does not hold the search target data itself, and therefore forwards the search request to another search server device, it selects the forwarding destination based on the evaluation values of the search server devices on the forwarding path. The optimal node can thus be selected efficiently and the traffic required for the search reduced.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0022]
(Configuration)
FIG. 1 is a block diagram showing the configuration of a distributed search system 1 according to the embodiment. As shown in the figure, the distributed search system 1 performs searches using nodes 30 (30A to 30G) that are distributed over a communication network 20 and connected to one another, and a user terminal 10 is connected to the communication network 20, which is the Internet or a similar network.
[0023]
The node 30 is a search server device that implements, on a general WWW (World Wide Web) server device, a function of performing search processing using a metadata database and a search result history table, both described later. The user terminal 10 is a general computer device that starts a WWW browser, sends a request message conforming to HTTP to the communication network 20, and receives the response message to that request.
[0024]
FIG. 2 is a block diagram showing the hardware configuration of each node 30. As shown in the figure, the node 30 includes a CPU 31, a hard disk device 36, and a communication interface 32 that communicates with the communication network 20, all connected via a bus 33. The hard disk device 36 stores the communication addresses of all nodes constituting the distributed search system 1, together with a search target database 361, a search result history table 362, and a search program 363, which are described later. The CPU 31 reads the search program 363 from the hard disk device 36 and executes it.
[0025]
The search target database 361 stores a large number of search target data items created using XML (eXtensible Markup Language). Each search target data item has at least one metadata entry, and each metadata entry has keyword data representing a character string that serves as a search keyword (for example, “Shibuya”) and semantic data representing semantic information that indicates the meaning of that character string (for example, the character string “place name”).
[0026]
FIG. 3 is a block diagram showing the functions built on the node 30 when the CPU 31 executes the search program 363. The description here takes as an example the node 30B, to which a search request has been forwarded from the node 30A.
[0027]
As shown in the figure, executing the search program 363 builds the following on the node 30B: a search request receiving unit 401, a determination unit 402, a search result transmission unit 403, a history reference unit 404, a search request transmission unit 405, a request content holding unit 406, a search result receiving unit 407, a history update unit 408, a history table receiving unit 409, and a history table transmission unit 410.
[0028]
The search request receiving unit 401 is a module connected to the communication interface 32. It receives search request data addressed to its own node 30B, sent via the communication network 20 from another device requesting a search (here, the node 30A), and passes the received search request data to the request content holding unit 406 and the determination unit 402. The search request data includes search condition data representing the search conditions and address data indicating the communication address of the requesting node 30A, and the search condition data consists of at least one search condition.
[0029]
The request content holding unit 406 is a storage device that holds the search request data input from the search request receiving unit 401 until the search result data answering that request has been returned; the held data is passed to the history update unit 408. The history table receiving unit 409 is a module that receives the evaluation table 61A sent together with the search request data by the node 30A, the source of the search request data, and updates the information in the search result history table 362 via the history update unit 408.
[0030]
The search result transmission unit 403 is a module that transmits search result data in accordance with an instruction from the determination unit 402. The search result data is composed of the communication address of the requesting device, the URL (Uniform Resource Locator) indicating the location of the extracted search target data, the number of extracted search target data, and the communication address of the own node 30.
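As a rough illustration of the search request data and search result data described above, the following Python sketch models them as dataclasses. The class names, field names, and example addresses are assumptions made for this sketch; the source only lists which pieces of information each message carries. The hit count is derived from the URL list here as a simplification, whereas the description treats it as a separate item.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metadata:
    semantic: str   # semantic information, e.g. "place name"
    text: str       # keyword character string, e.g. "Shinjuku"


@dataclass
class SearchRequest:
    conditions: list          # at least one Metadata search condition
    requester_address: str    # communication address of the requesting device


@dataclass
class SearchResult:
    requester_address: str    # copied from the request
    responder_address: str    # address of the node that produced the result
    urls: list = field(default_factory=list)  # locations of the hit data

    @property
    def hit_count(self) -> int:
        """Number of extracted search target data items."""
        return len(self.urls)


# Hypothetical addresses and URL, used only to show the shapes.
request = SearchRequest(
    conditions=[Metadata("place name", "Shinjuku"),
                Metadata("restaurant", "French")],
    requester_address="node-30A.example",
)
result = SearchResult(request.requester_address, "node-30B.example",
                      urls=["http://node-30B.example/doc/1"])
empty = SearchResult(request.requester_address, "node-30B.example")
```

An empty result of the kind returned when no node can be selected carries only the two addresses, which corresponds to `empty` above.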
[0031]
The determination unit 402 is a module that searches the search target database 361 based on search request data including search conditions. When the search target is detected, it transmits to the node 30A the search result, its own server identifier, and an evaluation value indicating the degree of match with the search conditions. When the search target is not detected, it forwards the search request to another search node based on the history table, which serves as the selection data.
[0032]
The search processing in the determination unit 402 is realized by comparing the character strings while distinguishing the keyword data and the semantic data, and performing it between all the search conditions and all the search target data. When at least one search target data is extracted by this trial, the determination unit 402 instructs the search result transmission unit 403 to return the search result data indicating the search result to the device that has transmitted the search request data.
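The comparison described above, which keeps keyword data and semantic data distinct, might look like the following sketch. It assumes that metadata is held as (semantic information, character string) pairs and that an item is a hit only when it carries every search condition; the source does not say whether all conditions or any one of them must match, and the database contents and URLs are made up.

```python
def search(database: dict, conditions: list) -> list:
    """Return URLs of items whose metadata contains every search condition.

    Metadata entries are (semantic, text) pairs, so "Shinjuku" as a place
    name and "Shinjuku" as a station name are different entries.
    """
    wanted = set(conditions)
    return sorted(url for url, metadata in database.items()
                  if wanted <= set(metadata))


# Hypothetical database contents.
DATABASE = {
    "http://example.com/a": [("place name", "Shinjuku"), ("restaurant", "French")],
    "http://example.com/b": [("station name", "Shinjuku"), ("restaurant", "French")],
    "http://example.com/c": [("place name", "Shibuya")],
}
hits = search(DATABASE, [("place name", "Shinjuku"), ("restaurant", "French")])
```

Only the first item is a hit: the second has the same character strings but a different semantic label for “Shinjuku”.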
[0033]
The determination unit 402 also requests the history reference unit 404 to select the node best suited to the search request represented by the search request data. When the history reference unit 404 has selected a node, the determination unit 402 instructs the search request transmission unit 405 to transmit the search request data to that node. However, if the communication address of the requesting device in the search request data differs from the communication address of every node 30, the determination unit 402 first rewrites that address in the search request data to the communication address of the own node 30 and then issues the instruction.
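The address rewriting rule in this paragraph can be illustrated as below. The dictionary keys and the example addresses are hypothetical; the rule itself (replace the requester address with the own address when the requester is not one of the nodes) is taken from the text.

```python
def prepare_forward(request: dict, own_address: str, node_addresses: set) -> dict:
    """Return the request to forward, rewriting the requester if needed."""
    forwarded = dict(request)  # leave the received request untouched
    if request["requester_address"] not in node_addresses:
        # The request came from outside the node set (e.g. a user terminal),
        # so results should come back to this node.
        forwarded["requester_address"] = own_address
    return forwarded


# Hypothetical addresses.
NODES = {"node-30A.example", "node-30B.example"}
from_user = {"conditions": [("place name", "Shinjuku")],
             "requester_address": "user-terminal.example"}
from_node = {"conditions": [("place name", "Shinjuku")],
             "requester_address": "node-30A.example"}

rewritten = prepare_forward(from_user, "node-30A.example", NODES)
unchanged = prepare_forward(from_node, "node-30B.example", NODES)
```

A request that already names a node as requester is forwarded with that address intact, so the result can later be recognized by the originating node.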
[0034]
If no node is selected, the determination unit 402 instructs the search result transmission unit 403 to return, to the device that sent the search request data, empty search result data consisting only of the communication address of the requesting device and the communication address of the own node 30. The determination unit 402 also obtains from the history reference unit 404 the information needed to create an evaluation table 61A corresponding to the search conditions, creates the evaluation table 61A, and instructs the history table transmission unit 410 to transmit it. When the determination unit 402 selects a node as described above, the history reference unit 404 refers to the search result history table 362.
[0035]
When requested by the determination unit 402, the history reference unit 404 takes the evaluation values, represented by evaluation value data, that are associated with metadata matching the search metadata in the search request data, adds them up for each node represented by the node data to obtain an evaluation index, and selects the node with the largest evaluation index as the optimal node. If the search result history table 362 contains no metadata matching the search metadata in the search request data, a node is selected according to a predetermined priority order. Selecting the own node, or a node with a higher priority than the own node, is prohibited.
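The per-node summation and the priority fallback can be sketched as follows. The record layout, the priority list, and the choice of the highest-priority remaining node in the fallback are assumptions for illustration, as is excluding the own node from the history-based choice. The evaluation values reproduce the index totals given in the worked example of the original description (60 for node 30B, 20 + 20 = 40 for node 30C, 30 for node 30D).

```python
from collections import defaultdict
from typing import Optional


def select_best_node(history, conditions, priority, own_node) -> Optional[str]:
    """Pick the node with the largest summed evaluation value."""
    index = defaultdict(int)
    for node, semantic, text, value in history:
        if node != own_node and (semantic, text) in conditions:
            index[node] += value  # evaluation index = sum per node
    if index:
        return max(index, key=index.get)
    # No matching metadata: fall back to the predetermined priority order,
    # never choosing the own node or a node ranked above it.
    lower = priority[priority.index(own_node) + 1:]
    return lower[0] if lower else None


# Records: (node, semantic information, character string, evaluation value).
HISTORY = [
    ("30B", "place name", "Shinjuku", 60),
    ("30C", "place name", "Shinjuku", 20),
    ("30C", "restaurant", "French", 20),
    ("30D", "place name", "Shinjuku", 30),
]
CONDITIONS = [("place name", "Shinjuku"), ("restaurant", "French")]
PRIORITY = ["30A", "30B", "30C", "30D"]  # hypothetical priority order

best = select_best_node(HISTORY, CONDITIONS, PRIORITY, "30A")
```

For these records node 30B wins with an index of 60. A request whose metadata appears nowhere in the table falls through to the priority list instead of failing, which is the case the conventional method could not handle.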
[0036]
The search result receiving unit 407 receives, from the node selected by the history reference unit 404, the search result data answering the search request data, and compares the communication address of the requesting device in that search result data with the communication address of the own node 30. If the two do not match, the search result receiving unit 407 instructs the search result transmission unit 403 to return the search result data to the device that sent the search request data, and the request content holding unit 406 discards the stored search request data. If the two match, the history update unit 408 creates search result history data from the search result data and the search request data and adds it to the search result history table 362; it also instructs the search result transmission unit 403 to return the search result data to the device that sent the search request data, and the request content holding unit 406 discards the search request data.
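The branch on the requester address might be sketched as below. The dictionary keys are hypothetical, and writing one history record per search condition with the hit count as its evaluation value is an assumption; the source states only that history data is created from the result and the request and appended to the table when the addresses match.

```python
def handle_result(result: dict, pending_request: dict,
                  own_address: str, history: list) -> dict:
    """Record history if this node originated the request, then relay."""
    if result["requester_address"] == own_address:
        for semantic, text in pending_request["conditions"]:
            history.append((result["responder_address"], semantic, text,
                            result["hit_count"]))
    # In both branches the result goes back to the sender of the request;
    # the caller then discards the held request.
    return result


# Hypothetical request and result.
pending = {"conditions": [("place name", "Shinjuku"), ("restaurant", "French")]}
incoming = {"requester_address": "node-30A.example",
            "responder_address": "node-30B.example",
            "hit_count": 3}

history_at_origin = []
handle_result(incoming, pending, "node-30A.example", history_at_origin)

history_at_relay = []
relayed = handle_result(incoming, pending, "node-30C.example", history_at_relay)
```

Only the node whose address matches the requester address records the outcome; an intermediate node simply passes the result along.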
[0037]
FIG. 4 is an explanatory diagram showing a specific example of the data structure of the search result history table 362. As shown in the figure, the search result history table 362 stores the search result history data that is created when the search result receiving unit 407, described later, receives search result data from another node. In this embodiment, search result history data associates the metadata constituting the search condition data with bibliographic data that summarizes the result of running a search with that search condition data at the other node. The bibliographic data consists of node data identifying the node that created the search result data, semantic information indicating the meaning of the search condition, the character string of the search condition, and evaluation value data representing the number of search target data items hit in the search.
[0038]
FIG. 5 is an explanatory diagram illustrating a specific data configuration example of the evaluation table. As shown in the figure, the evaluation data includes node data identifying the node that created the search result data, semantic information indicating the semantic content of the search condition, the character string of the search condition, and evaluation value data representing the number of search target data items hit in the search.
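Since the rows of the search result history table and the evaluation table carry the same four fields, a single record type can represent both. The following is a sketch only; the class name `EvaluationRecord` and its field names are hypothetical labels for the fields listed above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvaluationRecord:
    """One row of the history table or of an evaluation table (assumed layout)."""

    node: str                  # node data: node that created the search result
    semantic_information: str  # semantic content of the search condition
    character_string: str      # character string of the search condition
    evaluation_value: int      # number of search target data items hit

    @property
    def metadata(self) -> Tuple[str, str]:
        # The pair that is matched against the search metadata of a request.
        return (self.semantic_information, self.character_string)


row = EvaluationRecord("30B", "place name", "Shinjuku", 60)
print(row.metadata)
```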
[0039]
(Operation)
Next, the operation of this embodiment will be described. Here, it is assumed that search result history data created based on past search results is stored in the search result history table 362 in advance.
[0040]
First, assume that the user operates the user terminal 10 to input two search conditions (here, metadata in which the character string “Shinjuku” is associated with the semantic information “place name”, and metadata in which the character string “French” is associated with the semantic information “restaurant”). As a result, search request data including the search condition data composed of these metadata is transmitted to the node 30A via the communication network 20.
[0041]
FIG. 6 is a flowchart showing the procedure of the node selection process in the node 30A. As shown in the figure, the search request receiving unit 401 of the node 30A receives the search request data (step S501) and inputs it to the determination unit 402.
[0042]
The determination unit 402, having received the search request data, performs a search process that extracts search target data having the search condition data from the search target database 361 (step S502). That is, the determination unit 402 compares all search metadata against all search target data, distinguishing both the character strings and the semantic information. Here, if the information in the search result history table 362 is insufficient and the search result history table 362 of another node is to be acquired (step S503: Yes), the search result history table 362 is acquired from the other node (step S504). As shown in FIG. 7, the search result history table is acquired by transmitting a history table request to other nodes (nodes 30B and 30D in FIG. 7) and receiving the search result history tables transmitted in response.
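The comparison in step S502 can be sketched as follows. This is an illustration under assumptions: each search target item is taken to carry a list of (semantic information, character string) pairs, and an item is taken to match only when it holds every search metadata pair; neither the item layout nor the all-conditions rule is stated in the embodiment, and the names `search_local` and `items` are hypothetical.

```python
from typing import Iterable, List, Tuple

Metadata = Tuple[str, str]  # (semantic information, character string)


def search_local(database: List[dict], conditions: Iterable[Metadata]) -> List[dict]:
    """Return the items whose metadata contain every search condition.

    Comparing whole pairs means the same character string under a
    different semantic label does not count as a match.
    """
    wanted = set(conditions)
    return [item for item in database if wanted <= set(item["metadata"])]


items = [
    {"name": "item-1", "metadata": [("place name", "Shinjuku"), ("restaurant", "French")]},
    {"name": "item-2", "metadata": [("place name", "Shinjuku"), ("language", "French")]},
]
query = [("place name", "Shinjuku"), ("restaurant", "French")]
print([item["name"] for item in search_local(items, query)])
```

Here `item-2` is not returned: its “French” is labeled “language”, not “restaurant”, which is the distinction between character strings and semantic information described above.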
[0043]
On the other hand, if the search result history table 362 has already been acquired (step S503: No), it is determined whether the evaluation table 61A has been received from the search condition transmission source (step S505). If the evaluation table 61A has been received (step S505: Yes), the search result history table 362 is updated based on the evaluation table 61A, as shown in FIG. 8 (step S506). Here, since the node in question is the node 30A, which received the search request from the user terminal 10, no evaluation table is received.
[0044]
Using the search result history table 362 updated in this way, the optimum node corresponding to the search condition data is selected (step S507). Thereafter, as shown in FIG. 9, the determination unit 402 creates an evaluation table 61A for the node to which the search request is to be transferred next (step S508), and transmits both the evaluation table 61A and the search request to the selected node (step S509).
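Steps S506 and S508 can be sketched together. This is a minimal illustration under assumptions: the table is modeled as a dictionary keyed by (node, metadata), the evaluation table is taken to be the subset of history rows whose metadata appear in the search conditions, and a received row is added only when no local row exists for the same key. The embodiment does not state the merge rule, so the keep-local policy and the function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Metadata = Tuple[str, str]   # (semantic information, character string)
Key = Tuple[str, Metadata]   # (node, metadata)
Table = Dict[Key, int]       # key -> evaluation value


def build_evaluation_table(history: Table, conditions: Iterable[Metadata]) -> Table:
    """Step S508 (assumed): extract the history rows relevant to this request."""
    wanted = set(conditions)
    return {key: value for key, value in history.items() if key[1] in wanted}


def merge_evaluation_table(history: Table, received: Table) -> None:
    """Step S506 (assumed policy): add received rows that are missing locally."""
    for key, value in received.items():
        history.setdefault(key, value)


sender_history = {
    ("30B", ("place name", "Shinjuku")): 60,
    ("30C", ("place name", "Osaka")): 15,
}
table_61a = build_evaluation_table(sender_history, [("place name", "Shinjuku")])
receiver_history = {("30D", ("place name", "Shinjuku")): 30}
merge_evaluation_table(receiver_history, table_61a)
print(sorted(receiver_history))
```

Sending only the rows related to the current search conditions keeps the transferred table small; a different merge policy, such as overwriting local values, would change only `merge_evaluation_table`.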
[0045]
Here, the method by which the history reference unit 404 selects the optimum node from the search result history table 362 will be described. The history reference unit 404 calculates an evaluation index for each node by adding the evaluation value of the metadata (“semantic information” = “place name”, “character string” = “Shinjuku”) and the evaluation value of the metadata (“semantic information” = “restaurant”, “character string” = “French”), and selects the node with the largest evaluation index as the optimum node.
[0046]
For example, for the node 30B shown in FIG. 4, the evaluation value of the search result history for (“semantic information” = “place name”, “character string” = “Shinjuku”) is “60”. Since no history data for (“semantic information” = “restaurant”, “character string” = “French”) is stored, the evaluation index of the node 30B is 60 + 0 = 60. Similarly, for the node 30C, one piece of history data for (“semantic information” = “place name”, “character string” = “Shinjuku”) is stored, and its evaluation value is “20”.
[0047]
One piece of history data for (“semantic information” = “restaurant”, “character string” = “French”) is also stored for the node 30C, and its evaluation value is “20”. Therefore, the evaluation index of the node 30C is 20 + 20 = 40. For the node 30D, the evaluation value of the history data for (“semantic information” = “place name”, “character string” = “Shinjuku”) is “30”, and since no history data for (“semantic information” = “restaurant”, “character string” = “French”) is stored, the evaluation index of the node 30D is 30 + 0 = 30.
[0048]
As described above, since the node 30B has the largest evaluation index, the history reference unit 404 selects the node 30B as the optimum node and transmits information indicating this to the search request transmission unit 405. Based on the received information, the search request transmission unit 405 transmits search request data containing the search condition data and data indicating the communication address of the requesting node 30A to the node 30B (step S509 in FIG. 9).
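The selection just described can be reproduced with a short sketch. The rows below mirror the evaluation values quoted in this example, taking the value for the node 30D as 30 to agree with its stated index of 30 + 0 = 30; the tuple layout and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Metadata = Tuple[str, str]         # (semantic information, character string)
Row = Tuple[str, Metadata, int]    # (node, metadata, evaluation value)


def evaluation_indices(history: List[Row], conditions: Iterable[Metadata]) -> Dict[str, int]:
    """Sum, per node, the evaluation values of rows matching the conditions."""
    wanted = set(conditions)
    totals: Dict[str, int] = defaultdict(int)
    for node, metadata, value in history:
        if metadata in wanted:
            totals[node] += value
    return dict(totals)


def select_optimum_node(history: List[Row], conditions: Iterable[Metadata]) -> Optional[str]:
    """Return the node with the largest evaluation index, or None if no row matches."""
    totals = evaluation_indices(history, conditions)
    return max(totals, key=totals.get) if totals else None


EXAMPLE_HISTORY: List[Row] = [
    ("30B", ("place name", "Shinjuku"), 60),
    ("30C", ("place name", "Shinjuku"), 20),
    ("30C", ("restaurant", "French"), 20),
    ("30D", ("place name", "Shinjuku"), 30),
]
QUERY = [("place name", "Shinjuku"), ("restaurant", "French")]

print(evaluation_indices(EXAMPLE_HISTORY, QUERY))   # indices 60, 40 and 30
print(select_optimum_node(EXAMPLE_HISTORY, QUERY))  # 30B
```

A missing history row simply contributes nothing to the sum, which corresponds to the "+ 0" terms in the text. When two nodes tie, this sketch returns whichever appears first in the table; the embodiment does not say how ties are resolved.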
[0049]
In this way, each node selects an appropriate transfer destination, and the transfer of the search request and evaluation table and the update of the search result history table are repeated in sequence until the requested search target is detected and returned to the search request source. This makes it possible to select the optimum node efficiently and to reduce the traffic required for the search. In particular, in this embodiment, when a search request is transferred, a part of the search result history table held by the own device is transmitted to the transfer destination as evaluation data, so that the search processing efficiency of the search server device at the transfer destination can be improved.
[0050]
(Modification)
Although an embodiment of the present invention has been described above, the present invention can be implemented in various other forms without departing from its main features. The above-described embodiment is merely an example of one aspect of the present invention; the scope of the present invention is as set out in the claims, and all modifications and changes within the equivalent scope of the claims are included within the scope of the present invention.
[0051]
As a modification, for example, whereas in the above-described embodiment the search result history table 362 is updated only when a search is started, the search result history table 362 may instead be acquired from another node and updated at an arbitrary timing.
[0052]
[Effect of the invention]
As described above, according to the present invention, the optimum node holding the search target data to be searched is selected efficiently in a distributed search system, so that the traffic required for the search and the communication load can be reduced.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a distributed search system according to an embodiment.
FIG. 2 is a block diagram showing a hardware configuration of a node according to the embodiment.
FIG. 3 is a block diagram illustrating functions that are constructed on a node when a CPU executes a search program in the embodiment.
FIG. 4 is an explanatory diagram illustrating a data structure of a search result history table according to the embodiment.
FIG. 5 is an explanatory diagram showing a data structure of an evaluation table transmitted together with a search request according to the embodiment.
FIG. 6 is a flowchart showing a procedure of node selection processing in the node according to the embodiment.
FIG. 7 is an explanatory diagram illustrating a history information acquisition operation in the embodiment.
FIG. 8 is an explanatory diagram illustrating an operation of history update processing in the embodiment.
FIG. 9 is an explanatory diagram showing operations of search information and evaluation table transfer in the embodiment.
FIG. 10 is an explanatory diagram showing operations of a conventional node selection process and history update process.
FIG. 11 is an explanatory diagram showing operations of a node selection process and a history update process using conventional metadata.
[Explanation of symbols]
1 ... Distributed search system
10 ... User terminal
20 ... Communication network
30 (30A-30H) ... node
31 ... CPU
32 ... Communication interface
33 ... Bus
36 ... Hard disk device
61A ... Evaluation table
361 ... Search target database
362 ... Search result history table
363 ... Search program
401 ... Search request receiving unit
402 ... Determination unit
403 ... Search result transmission unit
404 ... History reference unit
405 ... Search request transmission unit
406 ... request content holding unit
407 ... Search result receiving unit
408 ... History update unit
409 ... History table receiving unit
410 ... History table transmission unit

Claims

A search server device that is one of a plurality of search server devices distributed on a communication network and connected to each other, the search server device comprising:
a search target database that stores search target data to be searched;
a selection data storage unit that stores selection data in which a server identifier that identifies another search server device and an evaluation value of the other search server device are associated with each other;
a determination unit that searches the search target database based on a search request including a search condition, transmits the search result and its own server identifier to the search request source when the search target is detected, and, when the search target is not detected, transfers a part or all of the selection data related to the search request and the search condition to another search server device based on the selection data; and
an update unit that updates the selection data held by the own device based on selection data acquired from another search server device.

2. The search server device according to claim 1, wherein the update unit has a function of acquiring the selection data held by another search server device at an arbitrary timing and synchronizing it with the selection data held by the own device.

The search server device according to claim 1, wherein the selection data storage unit has a function of storing an evaluation value acquired from another search server and a history of server identifiers as a search result history table.

A search method in which a search is performed by search server devices distributed on a communication network and connected to each other, the method comprising:
a step of storing search target data to be searched in a search target database, and storing selection data in which a server identifier that identifies another search server device is associated with an evaluation value of the other search server device; and
a step of searching the search target database based on a search request including a search condition, transmitting the search result and the own server identifier to the search request source when the search target is detected, and, when the search target is not detected, transmitting a part or all of the selection data related to the search request and the search condition to another search server device based on the selection data.