JP5008748B2 - Search method, integrated search server, and computer program

Info

Publication number: JP5008748B2
Application number: JP2010111314A
Authority: JP
Inventors: 陽介石井
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2010-05-13
Filing date: 2010-05-13
Publication date: 2012-08-22
Anticipated expiration: 2030-05-13
Also published as: US20110282868A1; JP2011238179A

Description

The present invention relates to a search method, an integrated search server, and a computer program.

In a full-text search service, a search server analyzes file data stored in a computer system and creates a search index in advance. The search server uses the search index to provide a search service to users. A user sends the search server a search query for the file to be retrieved and accesses the target file based on the search result. Because the number of files stored in computer systems increases year by year, full-text search has become an important service for users.

When there are a plurality of search servers, however, the user must issue a search query to each search server individually and obtain the search results from each search server individually. This is inconvenient for the user.

In recent years, therefore, integrated search services have been provided in which a single search request is issued to a plurality of independent search servers and the search results from those servers are obtained in an integrated manner.

For example, OpenSearch, a specification for integrated search, has been published, and integrated search services using that specification are provided. In an integrated search service, each search server is operated independently. At the same time, each search server can accept search requests based on a unified standard interface such as OpenSearch. This enables integrated search in which a plurality of search servers are loosely coupled. In loosely coupled integrated search, the search algorithm used by each search server, the timing at which each search index is updated, and so on differ from server to server.

In contrast, there is also a form in which a plurality of search servers are operated as a unit to provide a tightly coupled integrated search service. In a tightly coupled integrated search service, every search server uses the same search algorithm, and the search indexes are updated uniformly within the system. An integrated search service in which a plurality of search servers are tightly coupled can also be regarded as a single search server.

Furthermore, a search server having a function of excluding duplicate content from search results is also known. Specifically, the search server detects duplicate entries based on hash values generated from the entries of the search result and deletes the duplicate entries from the search result (Patent Document 1).

Patent Document 1: U.S. Patent No. 7,366,718

The technique described in the above document can only delete duplicate entries within each search server. It has the problem that detecting duplicate entries in an integrated search result, obtained by integrating the search results from a plurality of search servers, is practically difficult.

This is because, in an integrated search service in which a plurality of search servers are loosely coupled, the hash algorithm used to detect duplicate entries may differ from one search server to another. If the search servers use different hash algorithms, it is extremely difficult to detect duplicate entries based on hash values. Therefore, the technique described in the above document cannot detect duplicate entries contained in an integrated search result in a system in which a plurality of search servers are loosely coupled.
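The mismatch can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that one server hashes entries with SHA-1 and another with SHA-256; the document does not name the algorithms that real search servers use.

```python
import hashlib

# The same file content as seen by two independently operated search servers.
content = b"quarterly report, final version"

# Assumption for illustration: server A uses SHA-1 and server B uses SHA-256.
digest_a = hashlib.sha1(content).hexdigest()
digest_b = hashlib.sha256(content).hexdigest()

# Identical content, but the digests cannot be matched across the two servers.
same_across_algorithms = digest_a == digest_b

# If both servers used one shared algorithm, identical content would match.
same_with_shared_algorithm = hashlib.sha256(content).hexdigest() == digest_b
```

Here `same_across_algorithms` is false even though the content is identical, while `same_with_shared_algorithm` is true, which is why a commonly usable algorithm has to be agreed on.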

The above problem arises because it is difficult to unify, across the search servers, the hash algorithm used for detecting duplicate entries. Many hash algorithms already exist, and new ones will continue to appear and be implemented in search servers. In addition, the requirements placed on the hash algorithm differ from one search server to another.

For these reasons, it is practically difficult to unify the hash algorithm among a plurality of search servers that are operated independently. Therefore, the prior art described in the above document cannot be applied to an integrated search service in which a plurality of search servers are loosely coupled.

Accordingly, an object of the present invention is to provide a search method, an integrated search server, and a computer program that can detect and eliminate duplicate data from the search results in a system in which a plurality of search servers are loosely coupled. Further objects of the present invention will become clear from the description of the embodiments below.

To solve the above problem, a search method according to a first aspect of the present invention is a method of searching using a computer system that includes a plurality of search servers, the computer system being configured by loosely coupling a plurality of search servers that each operate independently. When an integrated search server included in the plurality of search servers receives an integrated search request for causing a plurality of predetermined search servers included in the plurality of search servers to each perform a search, the integrated search server determines duplicate detection information, for detecting duplicates, that the predetermined search servers can use in common, and issues a search request corresponding to the integrated search request to each predetermined search server. Each predetermined search server searches the data group it is in charge of based on the search request, includes in its search result a duplicate detection value, for detecting duplicates, created using the determined duplicate detection information, and sends the search result to the integrated search server. The integrated search server detects duplicate data, based on the duplicate detection values, in the search results received from the predetermined search servers, creates an integrated search result by removing the detected duplicate data from the search results, and provides the integrated search result to the issuer of the integrated search request.
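The flow of the first aspect can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the `SearchServer` class, the `integrated_search` function, the in-memory `files` dictionary, the substring match standing in for an index lookup, and the rule of picking the alphabetically first common algorithm are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SearchServer:
    """Hypothetical stand-in for one independently operated search server."""
    name: str
    algorithms: set   # hash algorithm names this server can use
    files: dict       # path -> content (bytes); stands in for the search index

    def search(self, query: str, algorithm: str) -> list:
        # Return matching entries, each carrying a hash computed with the
        # algorithm chosen by the integrated search server.
        results = []
        for path, content in self.files.items():
            if query.encode() in content:
                digest = hashlib.new(algorithm, content).hexdigest()
                results.append({"server": self.name, "path": path, "hash": digest})
        return results


def integrated_search(query: str, servers: list):
    # 1. Determine an algorithm that every target server can use.
    common = set.intersection(*(s.algorithms for s in servers))
    if not common:
        raise ValueError("no hash algorithm is shared by all servers")
    algorithm = sorted(common)[0]  # assumption: any deterministic choice works

    # 2. Query each server and 3. drop entries whose hash was already seen.
    merged, seen = [], set()
    for server in servers:
        for entry in server.search(query, algorithm):
            if entry["hash"] not in seen:
                seen.add(entry["hash"])
                merged.append(entry)
    return algorithm, merged


servers = [
    SearchServer("s1", {"sha1", "sha256"}, {"/a/report.txt": b"budget report"}),
    SearchServer("s2", {"sha256", "sha512"},
                 {"/b/copy.txt": b"budget report", "/b/other.txt": b"travel report"}),
]
algorithm, results = integrated_search("report", servers)
```

In this example the only algorithm common to both servers is SHA-256, and `/b/copy.txt` is dropped from the merged result because its content duplicates `/a/report.txt`.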

In a second aspect, in the first aspect, each predetermined search server stores in advance, for the data group it is in charge of, a duplicate detection value for each of a plurality of pieces of duplicate detection information. Of the stored duplicate detection values, it includes in its search result the one corresponding to the duplicate detection information determined by the integrated search server, and sends the search result to the integrated search server.

In a third aspect, in the second aspect, each predetermined search server creates and stores a duplicate detection value for each of the plurality of pieces of duplicate detection information when it updates the search index used to search the data group it is in charge of.
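The second and third aspects can be pictured with a short Python sketch in which hash values are computed for every supported algorithm when the index is updated and merely looked up at search time. The names `SUPPORTED_ALGORITHMS`, `update_index`, and `stored_hash`, and the dictionary layout of the index, are assumptions made for illustration.

```python
import hashlib

# Assumption: the algorithms this hypothetical server supports.
SUPPORTED_ALGORITHMS = ("sha1", "sha256")


def update_index(index: dict, path: str, content: bytes) -> None:
    # At index-update time, compute and store one hash value per algorithm.
    index[path] = {
        "content": content,
        "hashes": {
            name: hashlib.new(name, content).hexdigest()
            for name in SUPPORTED_ALGORITHMS
        },
    }


def stored_hash(index: dict, path: str, algorithm: str) -> str:
    # At search time, return the precomputed value for the chosen algorithm.
    return index[path]["hashes"][algorithm]


index = {}
update_index(index, "/docs/plan.txt", b"hello")
value = stored_hash(index, "/docs/plan.txt", "sha256")
```

Precomputing trades storage for response time: no hashing is needed while answering a search request, whichever supported algorithm the integrated search server selects.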

In a fourth aspect, in the first aspect, the integrated search server acquires from each predetermined search server, and holds, information about the duplicate detection information usable by that server. When it receives an integrated search request, it determines, based on the held information, the duplicate detection information that the predetermined search servers can use in common.

In a fifth aspect, in the first aspect, the integrated search server acquires from each predetermined search server, when the computer system is constructed, information about the duplicate detection information usable by that server, and holds that information. When it receives an integrated search request, it determines, based on the held information, the duplicate detection information that the predetermined search servers can use in common.
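Holding the per-server information and deciding from it, as in the fourth and fifth aspects, might look like the following Python sketch. The `AlgorithmRegistry` class and the idea of a caller-supplied preference order are assumptions for illustration; the text only states that a commonly usable algorithm is determined from the held information.

```python
class AlgorithmRegistry:
    """Hypothetical holder for the algorithm lists reported by each server."""

    def __init__(self):
        self._by_server = {}

    def register(self, server_id, algorithms):
        # Called once per server, for example when the system is constructed.
        self._by_server[server_id] = tuple(algorithms)

    def common_algorithm(self, server_ids, preference):
        # Return the first preferred algorithm that every listed server supports.
        for candidate in preference:
            if all(candidate in self._by_server[s] for s in server_ids):
                return candidate
        return None  # no commonly usable algorithm


registry = AlgorithmRegistry()
registry.register(1100, ["md5", "sha1", "sha256"])
registry.register(1200, ["sha1", "sha256"])
registry.register(1300, ["sha256", "md5"])

chosen = registry.common_algorithm([1100, 1200, 1300], ["sha512", "sha256", "sha1"])
```

Because the lists are held by the integrated search server, the decision needs no round trip to the other servers when an integrated search request arrives.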

In a sixth aspect, in the first aspect, each predetermined search server, on receiving a search request from the integrated search server, creates a duplicate detection value using the determined duplicate detection information, includes that value in its search result, and sends the search result to the integrated search server.

In a seventh aspect, in the first aspect, the duplicate detection information is a hash algorithm, and the duplicate detection value is a hash value.

The present invention can also be understood as an integrated search server for searching using a computer system configured by loosely coupling a plurality of search servers that each operate independently, or as a computer program for causing a computer to function as the integrated search server. Combinations of the above aspects other than those described are also included in the scope of the present invention. The computer program can be distributed via a communication medium or a recording medium.

FIG. 1 is a diagram showing the overall configuration of a computer system.
FIG. 2 is a diagram showing the hardware configuration of a search server.
FIG. 3 is a diagram showing the configuration of the computer programs stored in the search server.
FIG. 4 is a diagram showing the configuration of the tables stored in the search server.
FIG. 5 is a diagram showing the hardware configuration of a file server.
FIG. 6 is a diagram showing the hardware configuration of a client machine.
FIG. 7 is a diagram schematically showing a series of integrated search processes.
FIG. 8 shows a table for managing the files registered in the search index.
FIG. 9 shows a table for managing the search index.
FIG. 10 shows a table for managing the search servers.
FIG. 11 shows a table for temporarily storing integrated search results.
FIG. 12 shows a configuration example of the integrated search request parameters.
FIG. 13 shows a configuration example of the response parameters of an integrated search result.
FIG. 14 shows a configuration example of the hash algorithm inquiry request parameters.
FIG. 15 shows a configuration example of the response parameters of a hash algorithm inquiry.
FIG. 16 shows a configuration example of the search request parameters.
FIG. 17 shows a configuration example of the response parameters of a search result.
FIG. 18 is a flowchart showing integrated search request processing.
FIG. 19 is a flowchart showing integrated search processing.
FIG. 20 is a flowchart continuing from FIG. 19.
FIG. 21 is a flowchart showing processing for responding to a hash algorithm inquiry.
FIG. 22 is a flowchart showing processing for searching and returning a search result.
FIG. 23 is a flowchart showing processing for updating the search index.
FIG. 24 is a flowchart continuing from FIG. 23.
FIG. 25 relates to a second embodiment and shows a configuration example of a table for managing the search servers.
FIG. 26 is a flowchart showing part of the integrated search processing.
FIG. 27 relates to a third embodiment and shows a configuration example of the computer programs of the search server.
FIG. 28 is a flowchart showing processing for negotiating a hash algorithm in advance.
FIG. 29 relates to a fourth embodiment and is a flowchart showing search response processing.
FIG. 30 relates to a fifth embodiment and is a flowchart showing part of the integrated search processing.
FIG. 31 relates to a sixth embodiment and shows a table for temporarily storing integrated search results.
FIG. 32 is a flowchart showing part of the integrated search processing.
FIG. 33 relates to a seventh embodiment and is a flowchart showing processing for notifying a search server of duplicate entries.

1100: integrated search server / search server; 1200, 1300: search servers; 2100, 2200, 2300: file servers; 3100, 3200, 3300: client machines; 1124: search control program; 1125: integrated search control program; 5100: integrated search control unit; 5110, 5210, 5310: search control units; 5120, 5220, 5320: search indexes

Embodiments of the present invention are described below with reference to the drawings. This embodiment describes a processing method by which a search server detects and eliminates duplicate content from an integrated search result. As described in detail below, in this embodiment the hash algorithm to be used by each search server that performs the search is agreed on in advance, and each search server includes hash values calculated with the agreed hash algorithm in its search result and sends it to the integrated search server. The integrated search server uses the hash values to detect and remove duplicate entries.

FIG. 1 is an explanatory diagram illustrating the system configuration of this embodiment. A plurality of search servers 1100, 1200, and 1300, a plurality of file servers 2100, 2200, and 2300, and a plurality of client machines 3100, 3200, and 3300 are connected via a communication network 100. A server 7000 for distributing computer programs may also be connected to the communication network 100.

In this system, the search index for the data stored in each file server is created by the corresponding search server. Each search server uses its search index to provide client machines with a search service for the files on the file server. In addition, a search server provides client machines with an integrated search service that presents the search results from a plurality of search servers together.

The specific services are as follows. First, a client machine can register a file (data file) in a file server. The file server stores the registered file in an external storage device connected to that file server. A search server acquires the files stored in the file server by crawling and creates a search index. The search server stores the search index in an external storage device connected to that search server.

A client machine can send a search request specifying a search query to a search server. The search server uses its search index to extract the files that match the conditions of the search query and provides the search result to the client machine.

A client machine can also send an integrated search request specifying a search query to a search server. The search server uses its own search index to extract the files that match the conditions of the search query. In addition, the search server sends search requests to the other search servers available for integrated search and provides the search results returned from the search servers to the client machine as an integrated search result.

Based on the integrated search result, the client machine can select the file to access. The client machine can access a file stored in a file server by using the file path name for file access contained in the integrated search result.

FIG. 1 shows the three types of devices (search server, file server, and client machine) as separate devices. The configuration is not limited to that shown in FIG. 1; for example, any two or all three of these device types may be implemented as a single computer.

The program distribution server 7000 is a device that distributes programs, such as hash algorithms, to the search servers. The program distribution server may be integrated with, for example, a file server or a search server and implemented within a single computer.

The connection provided by the communication network 100 may be an Internet connection or an intranet connection over a local area network.

FIG. 2 is an explanatory diagram illustrating the hardware configuration of the search server 1100. In this embodiment, of the three search servers 1100, 1200, and 1300, the search server 1100 serves as the reception point for the integrated search service. That is, the search server 1100 is the "integrated search server" that provides the integrated search service to client machines, and it is also one of the "predetermined search servers" that perform a search in response to a search request.

The search server 1100 includes, for example, a processor 1110, a memory 1120, an external storage device interface (hereinafter, I/F) 1130, a network I/F 1140, and a bus 1150 that connects these components 1110, 1120, 1130, and 1140.

The processor 1110 executes computer programs (hereinafter, programs). The memory 1120 stores programs 1121 to 1125 and tables 4100 to 4400, which are described later. The external storage device I/F 1130 is a communication circuit for accessing an external storage device 1160. The network I/F 1140 is a communication circuit for accessing other devices (file servers, client machines, and so on) via the communication network 100.

FIG. 3 shows the programs stored in the memory 1120. The memory 1120 stores, for example, an external storage device I/F program 1121, a network I/F program 1122, a data management program 1123, a search control program 1124, and an integrated search control program 1125.

The external storage device I/F program 1121 controls the external storage device I/F 1130. The network I/F program 1122 controls the network I/F 1140. The data management program 1123 provides the file system or database used to manage stored data in the search server 1100. The search control program 1124 provides the search service in the search server 1100. The integrated search control program 1125 provides the integrated search service in the search server 1100.

FIG. 4 shows the tables (management data) stored in the memory 1120. The memory 1120 stores, for example, a search index registration file management table 4100, a search index management table 4200, a search server management table 4300, and an integrated search result temporary storage table 4400.

The search index registration file management table 4100 is used by the search control program 1124 and manages the files registered in the search index. The search index management table 4200 manages the search index. The search server management table 4300 is used by the integrated search control program 1125 and manages the search servers included in the integrated search system. The integrated search result temporary storage table 4400 is used by the integrated search control program 1125 and temporarily stores the results of an integrated search.

Returning to FIG. 3, the search control program 1124 contains a search index management subprogram 1171, a search reception subprogram 1172, a hash algorithm response subprogram 1173, and a deduplication subprogram 1174.

The search index management subprogram 1171 performs the processing needed to manage search index data. Specifically, the search index management subprogram 1171 crawls the file server 2100, which stores the file data that the search server 1100 searches, and generates, updates, or deletes search index data as necessary. The search index management subprogram 1171 manages the search index data itself using the data management program 1123.

The search reception subprogram 1172 accepts a search request specifying a search query from a client machine. It searches for files that match the search conditions and returns the search result to the client machine. In this embodiment, the search reception subprogram 1172 performs the search using the search index data created separately by the search index management subprogram 1171.

The hash algorithm response subprogram 1173 is a program that, when another search server requests hash algorithm negotiation, receives the request, performs the necessary processing, and responds. The hash algorithm response subprogram 1173 returns to the inquirer a list of the hash algorithms usable by the search server on which it is installed. As described in detail later, duplicates in the integrated search result can be detected by instructing each search server to use a hash algorithm that all of the search servers can use in common.

In the following description, the search server on which the program or table under discussion is installed may be referred to as the "own search server".

The deduplication subprogram 1174 detects duplicate data in the search index data managed by the search index management subprogram 1171 of its own search server and deletes the duplicate data as necessary. That is, the deduplication subprogram 1174 eliminates duplicate data stored within a single search server.

A hash algorithm, described later, is used to detect duplicate data. Based on the hash values calculated by the hash algorithm, the deduplication subprogram 1174 determines whether any given piece of data in the search index data is identical to another piece of data.
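Hash-based duplicate detection within a single server can be sketched in Python as below. This is an illustration only: the `find_duplicates` function, the path-to-content dictionary, and the default choice of SHA-256 are assumptions, and entries with equal hash values are simply treated as identical, as the description does.

```python
import hashlib
from collections import defaultdict


def find_duplicates(index_entries: dict, algorithm: str = "sha256") -> list:
    # Group paths by the hash of their content; a group with more than one
    # path is treated as a set of duplicates.
    groups = defaultdict(list)
    for path, content in index_entries.items():
        groups[hashlib.new(algorithm, content).hexdigest()].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


entries = {"/a": b"x", "/b": b"y", "/c": b"x"}
duplicates = find_duplicates(entries)
```

In this example `/a` and `/c` hold the same content and are reported as one duplicate group; which member of a group to delete is left to the caller.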

The integrated search control program 1125 contains an integrated search reception subprogram 1175, a hash algorithm negotiation subprogram 1176, and an integrated search result deduplication subprogram 1177.

When the integrated search reception subprogram 1175 accepts an integrated search request specifying a search query from a client machine, it searches for files that match the search conditions, also using the other search servers available for integrated search. The integrated search reception subprogram 1175 gathers the search results from the search servers and sends them to the client machine as an integrated search result. It uses the search server management table 4300 to select the search servers available for integrated search.

When the integrated search reception subprogram 1175 receives an integrated search request, the hash algorithm negotiation subprogram 1176 performs the processing needed to negotiate and agree, with the group of search servers available for integrated search, on the hash algorithm used to eliminate duplicate content from the integrated search result. The specific processing is described later.

The integrated search result deduplication subprogram 1177 detects duplicate data in the search result data acquired from the group of search servers available for integrated search and deletes the duplicate data as necessary. To detect duplicate data, it uses the hash algorithm agreed on with the other search servers. Using that hash algorithm, it determines whether any given piece of data in the search result data is identical to another piece of data.

The search index registration file management table 4100, the search index management table 4200, the search server management table 4300, and the integrated search result temporary storage table 4400 are described later.

他の検索サーバ1200、1300は、統合検索に関する構成（統合検索制御プログラム1125、検索サーバ管理テーブル4300、統合検索結果一時保管テーブル4400）を備えない点を除いて、検索サーバ1100と同一構成であるため、説明を割愛する。 Other search servers 1200 and 1300 have the same configuration as search server 1100 except that they do not have a configuration related to integrated search (integrated search control program 1125, search server management table 4300, and integrated search result temporary storage table 4400). Therefore, the explanation is omitted.

図５は、ファイルサーバ2100のハードウェア構成を例示する説明図である。ファイルサーバ2100は、例えば、プログラムを実行するプロセッサ2110と、プログラム及びデータを一時的に格納するメモリ2120と、外部記憶装置2160にアクセスするための外部記憶装置I/F2130と、ネットワーク100を介して他装置（検索サーバ等）と通信するためのネットワークI/F2140と、それらを接続するバス2150とを備える。 FIG. 5 is an explanatory diagram illustrating the hardware configuration of the file server 2100. The file server 2100 includes, for example, a processor 2110 that executes programs, a memory 2120 that temporarily stores programs and data, an external storage device I/F 2130 for accessing the external storage device 2160, a network I/F 2140 for communicating with other devices (such as search servers) via the network 100, and a bus 2150 that connects them.

メモリ2120には、例えば、外部記憶装置I/Fプログラム2121と、ネットワークI/Fプログラム2122と、ファイル共有サービスプログラム2123と、ファイル管理プログラム2124とが格納される。 The memory 2120 stores, for example, an external storage device I/F program 2121, a network I/F program 2122, a file sharing service program 2123, and a file management program 2124.

外部記憶装置I/Fプログラム2121は、外部記憶装置I/F2130を制御する。ネットワークI/Fプログラム2122は、ネットワークI/F2140を制御する。ファイル共有サービスプログラム2123は、ファイルサーバ2100から提供されるファイル共有サービスを管理する。ファイル管理プログラム2124は、ファイルサーバ2100に記憶されたファイルを管理する。 The external storage device I/F program 2121 controls the external storage device I/F 2130. The network I/F program 2122 controls the network I/F 2140. The file sharing service program 2123 manages the file sharing service provided by the file server 2100. The file management program 2124 manages the files stored in the file server 2100.

ファイル共有サービスプログラム2123は、ファイル管理プログラム2124を利用して、共有ファイルを管理する。検索サーバまたはクライアントマシン等は、ファイル共有サービスプログラム2123を利用することで、ファイルサーバ2100に保管されている共有ファイルにアクセスすることが可能になる。 The file sharing service program 2123 uses the file management program 2124 to manage shared files. The search server or client machine can access the shared file stored in the file server 2100 by using the file sharing service program 2123.

なお、他のファイルサーバ2200、2300は、ここで説明したファイルサーバ2100と同一構成のため、説明を割愛する。 Since the other file servers 2200 and 2300 have the same configuration as the file server 2100 described here, the description thereof is omitted.

図６は、クライアントマシン3100のハードウェア構成を例示する説明図である。クライアントマシン3100は、例えば、プログラムを実行するプロセッサ3110と、プログラム及びデータを一時的に格納するメモリ3120と、外部記憶装置3160にアクセスするための外部記憶装置I/F3130と、ネットワークで接続された他装置にアクセスするためのネットワークI/F3140と、それらを接続するバス3150とを備える。 FIG. 6 is an explanatory diagram illustrating the hardware configuration of the client machine 3100. The client machine 3100 includes, for example, a processor 3110 that executes programs, a memory 3120 that temporarily stores programs and data, an external storage device I/F 3130 for accessing the external storage device 3160, a network I/F 3140 for accessing other devices connected via the network, and a bus 3150 that connects them.

メモリ3120には、例えば、外部記憶装置I/Fプログラム3121と、ネットワークI/Fプログラム3122と、ファイル管理プログラム3123と、検索サービスクライアントプログラム3124と、ファイル共有サービスクライアントプログラム3125とが格納される。 The memory 3120 stores, for example, an external storage device I/F program 3121, a network I/F program 3122, a file management program 3123, a search service client program 3124, and a file sharing service client program 3125.

外部記憶装置I/Fプログラム3121は、外部記憶装置I/F3130を制御する。ネットワークI/Fプログラム3122は、ネットワークI/F3140を制御する。ファイル管理プログラム3123は、クライアントマシン3100に保管されたファイルを管理するためのファイルシステムを提供する。検索サービスクライアントプログラム3124は、検索サーバ1100が提供する検索サービス並びに統合検索サービスを利用するためのプログラムである。ファイル共有サービスクライアントプログラム3125は、ファイルサーバ2100が提供するファイル共有サービスを利用するためプログラムである。 The external storage device I/F program 3121 controls the external storage device I/F 3130. The network I/F program 3122 controls the network I/F 3140. The file management program 3123 provides a file system for managing files stored in the client machine 3100. The search service client program 3124 is a program for using the search service and the integrated search service provided by the search server 1100. The file sharing service client program 3125 is a program for using the file sharing service provided by the file server 2100.

検索サービスクライアントプログラム3124は、検索サービス及び統合検索サービスがHTTPプロトコルを利用する場合、HTTPクライアントプログラム（例えば、Webブラウザなど）を利用する。 The search service client program 3124 uses an HTTP client program (for example, a web browser) when the search service and the integrated search service use the HTTP protocol.

ファイル共有サービスクライアントプログラム3125は、ファイル共有サービスがNFSプロトコルを利用する場合は、NFSクライアントプログラムを使用する。ファイル共有サービスがCIFSプロトコルを利用する場合、ファイル共有サービスクライアントプログラム3125は、CIFSクライアントプログラムを利用する。あるいは、ファイル共有サービスがHTTPプロトコルを利用する場合、ファイル共有サービスクライアントプログラム3125は、HTTPクライアントプログラム（Webブラウザ等）を利用する。 The file sharing service client program 3125 uses an NFS client program when the file sharing service uses the NFS protocol. When the file sharing service uses the CIFS protocol, the file sharing service client program 3125 uses the CIFS client program. Alternatively, when the file sharing service uses the HTTP protocol, the file sharing service client program 3125 uses an HTTP client program (such as a Web browser).

なお、他のクライアントマシン3200、3300は、クライアントマシン3100と同一構成のため、説明を割愛する。 Since the other client machines 3200 and 3300 have the same configuration as the client machine 3100, description thereof is omitted.

図７は、クライアントマシン3100から検索サーバ1100に統合検索要求を発行した場合のシステム全体の動作を模式的に示す。図７では、統合検索要求の発行、各検索サーバでの検索、各検索サーバの検索結果の取得、統合検索結果の提供等の一連の処理が９つのステップで説明されている。以下、ステップを「Ｓ」と略記する場合がある。 FIG. 7 schematically shows the operation of the entire system when an integrated search request is issued from the client machine 3100 to the search server 1100. In FIG. 7, a series of processes such as issuing an integrated search request, searching at each search server, obtaining a search result from each search server, and providing an integrated search result are described in nine steps. Hereinafter, the step may be abbreviated as “S”.

なお、以下では、統合検索処理を実行する「統合検索サーバ」としての検索サーバ1100と、統合検索要求に従って検索する「所定の検索サーバ」としての検索サーバ1100とに同一の符号1100を付す。例えば、「統合検索要求を受けた検索サーバ1100は、各検索サーバ1100、1200、1300に検索を要求する。」等の文章において、「統合検索要求を受けた検索サーバ1100」とは、統合検索要求を受信して統合検索処理を実行する統合検索サーバであり、主として、統合検索制御プログラム1125に相当する。「各検索サーバ1100、1200、1300」における「検索サーバ1100」は、指定された検索を行って結果を返す検索サーバであり、主として、検索制御プログラム1124に相当する。 In the following description, the same reference numeral 1100 denotes both the search server 1100 acting as the "integrated search server" that executes the integrated search processing and the search server 1100 acting as a "predetermined search server" that searches in response to the integrated search request. For example, in a sentence such as "The search server 1100 that has received an integrated search request requests a search from each of the search servers 1100, 1200, and 1300," the phrase "the search server 1100 that has received an integrated search request" refers to the integrated search server that receives the integrated search request and executes the integrated search processing, and mainly corresponds to the integrated search control program 1125. The "search server 1100" in "each of the search servers 1100, 1200, and 1300" is a search server that performs the specified search and returns the result, and mainly corresponds to the search control program 1124.

始めに、Ｓ１として、クライアントマシン3100は、統合検索サービスを提供する検索サーバ1100に対して、統合検索要求を送る。統合検索要求では、検索キーワード及び検索条件が指定される。 First, as S1, the client machine 3100 sends an integrated search request to the search server 1100 that provides the integrated search service. In the integrated search request, a search keyword and a search condition are specified.

統合検索に使用される検索キーワード及び検索条件は、従来の一般的な検索エンジンで受付可能な検索キーワード及び検索条件と同様に指定できる。例えば、検索キーワードとして、複数の文字列を指定してもよい。検索条件として、データ作成日またはデータ最終更新日を任意の範囲で指定してもよいし、データ作成者を指定してもよい。 Search keywords and search conditions used for the integrated search can be specified in the same manner as search keywords and search conditions that can be accepted by a conventional general search engine. For example, a plurality of character strings may be specified as search keywords. As a search condition, a data creation date or a data last update date may be specified in an arbitrary range, or a data creator may be specified.

Ｓ２として、統合検索要求を受け付けた検索サーバ1100内の統合検索制御部5100は、統合検索利用可能な検索サーバ1100、1200、1300に対して、ハッシュアルゴリズム（利用可能なハッシュ関数などの識別情報に相当）の折衝を行う。統合検索制御部5100は、主として統合検索制御プログラム1125により実現される。 In S2, the integrated search control unit 5100 in the search server 1100 that has accepted the integrated search request negotiates a hash algorithm (corresponding to identification information such as an available hash function) with the search servers 1100, 1200, and 1300 that are available for integrated search. The integrated search control unit 5100 is mainly realized by the integrated search control program 1125.

統合検索要求を受けた検索サーバ1100は、自検索サーバ1100にて利用可能なハッシュアルゴリズムを指定して、そのハッシュアルゴリズムを他の各検索サーバ1100、1200、1300が利用可能であるかを、他の各検索サーバ1200、1300に問い合わせる。 The search server 1100 that has received the integrated search request specifies a hash algorithm available on its own server and queries the other search servers 1200 and 1300 to determine whether each of the search servers 1100, 1200, and 1300 can use that hash algorithm.

Ｓ３として、問合せをうけた各検索サーバ1100、1200、1300内の検索制御部5110、5210、5310は、指定されたハッシュアルゴリズムをサポートしているかどうかの情報と、指定されたハッシュアルゴリズム以外に利用可能なハッシュアルゴリズムの情報とを、問い合わせ元である統合検索制御部5100に応答する。検索制御部5110、5210、5310は、ハッシュアルゴリズム応答サブプログラム1173により実現される。 In S3, the search control units 5110, 5210, and 5310 in the queried search servers 1100, 1200, and 1300 each return two pieces of information to the integrated search control unit 5100 that issued the query: whether the server supports the specified hash algorithm, and which hash algorithms other than the specified one it can use. The search control units 5110, 5210, and 5310 are realized by the hash algorithm response subprogram 1173.

統合検索制御部5100は、各検索制御部5110、5210、5310からの応答結果に基づいて、統合検索に利用可能なハッシュアルゴリズムを決定する。以下の説明では、統合検索のために利用可能なハッシュアルゴリズムを、共通ハッシュアルゴリズムと呼ぶ場合がある。なお、一回の問合せで共通ハッシュアルゴリズムを決定できない場合、所定の回数だけ問い合わせと応答を繰り返し実行する構成としてもよい。 The integrated search control unit 5100 determines a hash algorithm that can be used for the integrated search based on the response results from the search control units 5110, 5210, and 5310. In the following description, a hash algorithm that can be used for integrated search may be referred to as a common hash algorithm. If the common hash algorithm cannot be determined by a single query, the query and response may be repeatedly executed a predetermined number of times.
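The negotiation in S2 and S3 can be illustrated with a minimal sketch. The response shape (`supports_proposed`, `other_algorithms`), the function name, and the alphabetical tie-break are assumptions made for illustration only; the description does not specify a message format or a selection rule, and the retry loop mentioned above is left to the caller.

```python
from typing import Optional


def negotiate_common_hash(proposed: str, responses: list) -> Optional[str]:
    """Pick a hash algorithm that every queried search server can use.

    Each response is assumed (hypothetically) to look like:
        {"supports_proposed": bool, "other_algorithms": ["sha1", ...]}
    """
    # If every server supports the proposed algorithm, adopt it as-is.
    if all(r["supports_proposed"] for r in responses):
        return proposed

    # Otherwise intersect the full set of algorithms each server reported.
    common = None
    for r in responses:
        supported = set(r["other_algorithms"])
        if r["supports_proposed"]:
            supported.add(proposed)
        common = supported if common is None else common & supported

    if not common:
        return None  # no common algorithm; the caller may retry or proceed without one
    return sorted(common)[0]  # deterministic tie-break (an assumption)
```

Returning `None` corresponds to the case in which no common hash algorithm can be determined.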

Ｓ４として、統合検索制御部5100は、統合検索利用可能な検索サーバ1100、1200、1300に同一の検索要求を送る。その検索要求には、統合検索要求に含まれている検索キーワード及び検索条件と共に、前述の処理にて決定された共通ハッシュアルゴリズムに関する情報も含まれる。 In S4, the integrated search control unit 5100 sends the same search request to the search servers 1100, 1200, and 1300 that can use the integrated search. The search request includes information related to the common hash algorithm determined in the above-described process, together with the search keyword and search condition included in the integrated search request.

Ｓ５として、検索制御部5110、5210、5310は、自検索サーバ1100、1200、1300内にて管理する検索インデックス5120、5220、5320を利用して、それぞれ検索処理を実行する。検索処理では、統合検索制御部5100から指定された検索キーワードと検索条件を利用する。 As S5, the search control units 5110, 5210, and 5310 execute search processes using the search indexes 5120, 5220, and 5320 managed in the search servers 1100, 1200, and 1300, respectively. In the search process, the search keyword and the search condition specified from the integrated search control unit 5100 are used.

Ｓ６として、検索制御部5110、5210、5310は、それぞれの検索結果について、重複排除処理を行う。具体的には、検索制御部5110、5210、5310はそれぞれ、検索結果に含まれるエントリの中に、同一ファイルを示すエントリが複数個登録されているか否かを調べる。 In S6, the search control units 5110, 5210, and 5310 perform deduplication processing on each search result. Specifically, each of search control units 5110, 5210, and 5310 checks whether or not a plurality of entries indicating the same file are registered in the entries included in the search results.

もしも、同一ファイルを示すエントリが複数個登録されている場合、検索制御部5110、5210、5310は、所定の重複削除条件に沿って、任意の一エントリのみを残して、他エントリを非表示にするか、または削除する。 If a plurality of entries indicating the same file are registered, the search control units 5110, 5210, and 5310 keep only one arbitrary entry and hide or delete the other entries, in accordance with a predetermined deduplication condition.

同一ファイルであるか否かを判断するために、ハッシュアルゴリズムが使用される。具体的には、ハッシュ関数などが利用される。検索制御部5110、5210、5310は、各ファイルデータについて、または、同一であるか否かを判断すべき複数のファイルデータについて、ハッシュ関数を利用してハッシュ値を生成する。ハッシュ値が一致する場合、それらのファイルは同一であると判定できる。 A hash algorithm is used to determine whether the files are the same. Specifically, a hash function or the like is used. The search control units 5110, 5210, and 5310 generate hash values by using a hash function for each file data or for a plurality of file data to be determined whether they are the same. If the hash values match, it can be determined that the files are identical.
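As a rough illustration of the per-server deduplication in S6, the sketch below hashes each file's data with Python's standard `hashlib` and keeps the first entry for each hash value. The `(path, data)` entry shape, the function names, and the "keep the first entry" rule are assumptions; the description only says that one arbitrary entry is kept according to a predetermined condition. Note that matching hash values indicate identical files only up to the (normally negligible) chance of a hash collision.

```python
import hashlib


def file_hash(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of the file data under the named hash function."""
    return hashlib.new(algorithm, data).hexdigest()


def dedup_entries(entries, algorithm: str = "sha1"):
    """Drop entries whose file data hashes to an already-seen value.

    `entries` is a list of (path, data) tuples in rank order; the first
    entry for each hash value is kept (an assumed deduplication condition).
    Returns a list of (path, hash_value) tuples.
    """
    seen = set()
    kept = []
    for path, data in entries:
        digest = file_hash(data, algorithm)
        if digest in seen:
            continue  # same content as an earlier entry: treat as duplicate
        seen.add(digest)
        kept.append((path, digest))
    return kept
```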

Ｓ７として、検索制御部5110、5210、5310は、各検索サーバ1100、1200、1300内での重複エントリが取り除かれた検索結果を、検索要求元である検索サーバ1100の統合検索制御部5100に応答する。 In S7, the search control units 5110, 5210, and 5310 return the search results, from which the duplicate entries within each of the search servers 1100, 1200, and 1300 have been removed, to the integrated search control unit 5100 of the search server 1100 that issued the search request.

さらに、検索制御部5110、5210、5310は、検索結果に加えて、Ｓ４にて指定された共通ハッシュアルゴリズムを利用して生成された情報も、統合検索制御部5100に提供する。具体的には、共通ハッシュアルゴリズムに該当するハッシュ関数を利用して生成されたハッシュ値を統合検索制御部5100に通知する。 Further, the search control units 5110, 5210, and 5310 also provide the integrated search control unit 5100 with information generated by using the common hash algorithm specified in S4 in addition to the search results. Specifically, the integrated search control unit 5100 is notified of a hash value generated using a hash function corresponding to the common hash algorithm.

Ｓ８として、統合検索制御部5100は、各検索サーバから取得した検索結果をもとにして統合検索結果を生成し、さらに、統合検索結果の中から重複したエントリを排除する処理を行う。以下、統合検索結果に含まれる複数エントリのうち重複エントリを取り除く処理を、統合検索結果重複排除処理と呼ぶことがある。 In step S8, the integrated search control unit 5100 generates an integrated search result based on the search result acquired from each search server, and further performs processing for eliminating duplicate entries from the integrated search result. Hereinafter, the process of removing duplicate entries from a plurality of entries included in the integrated search result may be referred to as an integrated search result deduplication process.

統合検索結果重複排除処理の具体的内容は、前述した各検索制御部5110、5210、5310における重複排除処理の内容とほぼ同じである。具体的には、統合検索結果に含まれる各エントリに、同一ファイルデータを示すエントリが複数存在するか否かを調べる。もし、統合検索結果の中に同一ファイルを示すエントリが複数個登録されている場合、所定の重複削除条件に従って、任意の一エントリのみを残し、他エントリを非表示にするかまたは削除する。 The specific content of the integrated search result deduplication processing is almost the same as the content of the deduplication processing in each of the search control units 5110, 5210, and 5310 described above. Specifically, it is checked whether or not there are a plurality of entries indicating the same file data in each entry included in the integrated search result. If a plurality of entries indicating the same file are registered in the integrated search result, only one arbitrary entry is left and other entries are hidden or deleted according to a predetermined duplication deletion condition.

複数のファイルデータが同一であるか否かを判断するために、ハッシュアルゴリズムを使う。具体的には、各検索サーバから提供された、共通ハッシュアルゴリズムにより算出されたハッシュ値を利用する。複数のファイルデータのハッシュ値（検索サーバ内で作成されたハッシュ値）が一致する場合、それらのファイルデータは同一であると判断することができる。 A hash algorithm is used to determine whether or not a plurality of file data are the same. Specifically, the hash value calculated by the common hash algorithm provided from each search server is used. When hash values (hash values created in the search server) of a plurality of file data match, it can be determined that the file data are the same.
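The integrated deduplication in S8 differs from S6 mainly in that the hash values arrive precomputed from each search server. A minimal sketch follows, assuming each merged entry is a dict with hypothetical keys `server_id`, `path`, and `hash_value`. Treating a missing hash value (`None`) as "never a duplicate" is also an assumption made here so that entries without a common hash are not discarded.

```python
def dedup_integrated(results):
    """Remove cross-server duplicates using hash values reported by the servers.

    `results` is a list of dicts with keys "server_id", "path", and
    "hash_value" (a hex string, or None when no common hash is available).
    The first entry seen for each hash value is kept.
    """
    seen = set()
    merged = []
    for entry in results:
        h = entry["hash_value"]
        if h is not None:
            if h in seen:
                continue  # same file already reported by another server
            seen.add(h)
        merged.append(entry)
    return merged
```

Because only the hash values are compared, the integrated search server does not need to fetch the file data itself.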

最後に、Ｓ９として、統合検索制御部5100は、重複エントリの除かれた統合検索結果を、クライアントマシン3100に応答する。以上の処理によって、クライアントマシン3100は、統合検索結果を取得することができる。 Finally, in S9, the integrated search control unit 5100 responds to the client machine 3100 with the integrated search result from which duplicate entries are removed. Through the above processing, the client machine 3100 can acquire the integrated search result.

図８は、検索インデックス登録ファイル管理テーブル4100の構成例を示す。検索インデックス登録ファイル管理テーブル4100は、検索インデックスの作成対象となっているファイルサーバから検索サーバが取得したファイルに関する情報を管理する。具体的には、検索インデックス登録ファイル管理テーブル4100は、ファイルID4110と、取得元ファイルパス名4120と、対象ファイルのメタデータ4130と、キャッシュ格納先4140と、対象ファイルのハッシュアルゴリズム（及びハッシュ値）4150とを、対応付けて管理する。 FIG. 8 shows a configuration example of the search index registration file management table 4100. The search index registration file management table 4100 manages information about the files that the search server has acquired from the file servers targeted for search index creation. Specifically, the search index registration file management table 4100 manages, in association with one another, a file ID 4110, an acquisition source file path name 4120, target file metadata 4130, a cache storage destination 4140, and the hash algorithm (and hash value) 4150 of the target file.

ファイルID4110は、ファイルサーバから取得されたファイルを一意に識別するための識別子である。ファイルID4110は、検索サーバ1100により付与される連番でもよいし、または、ファイルサーバ2100により付与される連番でもよい。 The file ID 4110 is an identifier for uniquely identifying a file acquired from the file server. The file ID 4110 may be a serial number given by the search server 1100 or a serial number given by the file server 2100.

取得元ファイルパス名4120は、対象ファイルのファイルサーバにおける格納先を示すファイルパス名である。検索サーバは、ファイルサーバに、取得元ファイルパス名4120を指定してファイル取得要求を発行する。これにより、検索サーバは、ファイルサーバから所望のファイルを取得することができる。 The acquisition source file path name 4120 is a file path name indicating the storage destination of the target file in the file server. The search server issues a file acquisition request specifying the acquisition source file path name 4120 to the file server. Thereby, the search server can acquire a desired file from the file server.

対象ファイルのメタデータ4130は、対象ファイルに関連付けられているメタデータの集合である。メタデータ4130としては、例えば、ファイルサーバにて管理されている、ファイル所有者、ファイル作成日時、ファイルサイズ、ファイルへのアクセス権情報等が該当する。さらに、検索サーバにて管理されている最新ファイル取得日時等の情報も、メタデータ4130に含むことができる。 The target file metadata 4130 is a set of metadata associated with the target file. The metadata 4130 corresponds to, for example, a file owner, file creation date / time, file size, file access right information, etc. managed by the file server. Furthermore, information such as the latest file acquisition date and time managed by the search server can also be included in the metadata 4130.

キャッシュ格納先（格納場所）4140は、対象ファイルのキャッシュデータを検索サーバ内で保管する場合の格納場所を示す情報である。具体的には、検索サーバにて、キャッシュデータをファイル形式で管理する場合、ファイルの格納パス名をキャッシュ格納先の欄4140に登録する。 The cache storage location (storage location) 4140 is information indicating a storage location when the cache data of the target file is stored in the search server. Specifically, when the cache data is managed in the file format in the search server, the storage path name of the file is registered in the cache storage destination column 4140.

対象ファイルのハッシュアルゴリズム及びハッシュ値の欄4150は、対象ファイルデータの重複を検出するために利用する情報を格納する。欄4150には、ハッシュアルゴリズムを登録する欄4151、4153と、ハッシュ値を登録する欄4152、4154とが含まれる。 The target file hash algorithm and hash value column 4150 stores information used for detecting duplication of the target file data. The column 4150 includes columns 4151 and 4153 for registering hash algorithms, and columns 4152 and 4154 for registering hash values.

ハッシュアルゴリズム欄4151、4153には、重複検出のために利用するハッシュ関数の識別情報を登録する。ハッシュアルゴリズム欄4151、4153には、例えば、MD5またはSHA-1等のハッシュ関数を識別するための情報が登録される。ハッシュ値欄4152、4154には、ハッシュアルゴリズム欄4151、4153に登録されたハッシュ関数を用いて作成されたハッシュ値が登録される。 In the hash algorithm fields 4151 and 4153, identification information of hash functions used for duplicate detection is registered. In the hash algorithm fields 4151 and 4153, information for identifying a hash function such as MD5 or SHA-1 is registered. In the hash value columns 4152 and 4154, hash values created using the hash functions registered in the hash algorithm columns 4151 and 4153 are registered.

ハッシュアルゴリズム及びハッシュ値の欄4150には、ハッシュアルゴリズムとハッシュ値との組を複数個登録できるように構成する。図８では、各ファイルに対してそれぞれ二つの組を登録している例を示している。３つ以上の組を登録してもよい。さらに、全ファイルについて同じ個数の組を登録する構成でもよいし、ハッシュアルゴリズムとハッシュ値の組の登録可能数を、各ファイルで異なるような構成でもよい。 The hash algorithm / hash value field 4150 is configured so that a plurality of pairs of hash algorithms and hash values can be registered. FIG. 8 shows an example in which two sets are registered for each file. Three or more sets may be registered. Further, the same number of sets may be registered for all the files, or the number of sets that can be registered for the set of hash algorithm and hash value may be different for each file.
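One way to picture a row of the table 4100 is a record that carries any number of algorithm/hash pairs. The class and field names below are illustrative assumptions, not names from the description. A dict keyed by algorithm name is used so that the value for a negotiated common algorithm can be looked up directly.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IndexedFile:
    """One row of a table like 4100 (field names are hypothetical)."""
    file_id: int                      # corresponds to file ID 4110
    source_path: str                  # acquisition source file path name 4120
    metadata: Dict[str, str]          # target file metadata 4130
    cache_path: Optional[str] = None  # cache storage destination 4140
    # Algorithm name -> hash value; any number of pairs may be registered (4150).
    hashes: Dict[str, str] = field(default_factory=dict)

    def hash_for(self, algorithm: str) -> Optional[str]:
        """Return the stored hash for `algorithm`, or None if not registered."""
        return self.hashes.get(algorithm)
```

Storing several pairs per file lets a server answer with a precomputed value for whichever common algorithm is chosen.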

図９は、検索インデックス管理テーブル4200の構成例を示す。検索インデックス管理テーブル4200は、検索サーバにより作成された検索インデックスの情報を管理する。具体的には、検索インデックス管理テーブル4200は、キーワード4210と、位置情報4220とを対応付けて管理する。 FIG. 9 shows a configuration example of the search index management table 4200. The search index management table 4200 manages search index information created by the search server. Specifically, the search index management table 4200 manages the keyword 4210 and the position information 4220 in association with each other.

キーワード4210には、対象ファイルをインデクシング処理して得られた文字列が格納される。位置情報4220には、キーワード4210の文字列を含むファイルの情報が登録される。位置情報4220は、ファイルID4221、4224と、該当位置オフセット4222、4225と、重み付け係数4223、4226とを含む。 The keyword 4210 stores a character string obtained by indexing the target file. In the position information 4220, file information including the character string of the keyword 4210 is registered. The position information 4220 includes file IDs 4221 and 4224, corresponding position offsets 4222 and 4225, and weighting coefficients 4223 and 4226.

ファイルID4221、4224は、キーワードの文字列が出現するファイルを識別するための情報を登録する。検索インデックス登録ファイル管理テーブル4100のファイルID4110の欄に登録されているファイルIDが、ファイルID4221、4224に登録される。 File IDs 4221 and 4224 register information for identifying a file in which a character string of a keyword appears. The file ID registered in the file ID 4110 column of the search index registration file management table 4100 is registered in the file IDs 4221 and 4224.

該当位置オフセット4222、4225は、ファイルの中で、キーワードの文字列が出現するオフセット情報を登録する。この欄4222、4225では、一つのファイルで複数箇所出現する場合は、複数個のオフセット情報を登録する。 Corresponding position offsets 4222 and 4225 register offset information in which the character string of the keyword appears in the file. In these fields 4222 and 4225, when a plurality of locations appear in one file, a plurality of offset information is registered.

重み付け係数4223、4226は、ファイル中にキーワードの文字列が出現することについての重要度を登録する。重要度の値は、検索サーバが適宜設定できる。重要度の値は、大きければ大きいほど、重要であることを意味する。重要度の値は、検索結果の絞り込み、及び、検索結果の整列にも利用できる。 The weighting coefficients 4223 and 4226 register the importance of the appearance of the keyword character string in the file. The importance value can be appropriately set by the search server. The greater the importance value, the more important it is. The importance value can also be used to narrow down search results and to arrange search results.

位置情報4220では、一つのキーワード4210について複数個登録可能である。これにより、キーワード文字列に該当するファイルが複数存在する場合にも対応できる。なお、位置情報4220の欄において、該当するエントリの値が無効であることを意味するnull値を登録することもできる。図中では、null値を「−」として示す。null値は、例えば、登録数が他のエントリより少ないために項目が空いてしまうエントリに、用いられる。 Multiple sets of position information 4220 can be registered for a single keyword 4210. This makes it possible to handle the case where a plurality of files contain the keyword character string. In the position information 4220 column, a null value, meaning that the value of the corresponding entry is invalid, can also be registered. In the figure, the null value is shown as “−”. The null value is used, for example, for entries whose fields are left empty because they have fewer registrations than other entries.
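The keyword-to-position mapping of the table 4200 is essentially an inverted index. The sketch below builds one under two simplifying assumptions that do not come from the description: keywords are obtained by whitespace splitting (Japanese text would need a real tokenizer), and the weighting coefficient is simply the number of occurrences in the file.

```python
from collections import defaultdict


def build_index(files):
    """Build keyword -> [(file_id, offsets, weight), ...] from {file_id: text}.

    `offsets` lists every character offset at which the keyword occurs in
    the file; `weight` is the occurrence count (an assumed stand-in for
    the weighting coefficient).
    """
    index = defaultdict(list)
    for file_id, text in files.items():
        positions = defaultdict(list)
        cursor = 0
        for word in text.split():
            start = text.index(word, cursor)  # offset of this occurrence
            positions[word].append(start)
            cursor = start + len(word)
        for word, offsets in positions.items():
            index[word].append((file_id, offsets, len(offsets)))
    return dict(index)
```

A variable-length list per keyword plays the role of the repeated position-information columns, so no null padding is needed in this representation.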

図１０は、検索サーバ管理テーブル4300の構成例を示す。検索サーバ管理テーブル4300は、検索サーバが統合検索を行う場合に、検索要求の送信先となる検索サーバの一覧情報を管理する。具体的には、検索サーバ管理テーブル4300は、検索サーバID4310と、検索サーバ名4320と、IPアドレス4330と、重み付け係数4340とを対応付けて管理する。 FIG. 10 shows a configuration example of the search server management table 4300. The search server management table 4300 manages list information of search servers that are transmission destinations of search requests when the search server performs an integrated search. Specifically, the search server management table 4300 manages a search server ID 4310, a search server name 4320, an IP address 4330, and a weighting coefficient 4340 in association with each other.

検索サーバID4310には、統合検索に利用可能な検索サーバを識別するための識別番号が格納される。検索サーバID4310は、統合検索を行う検索サーバ1100により付与される連番でもよいし、または、統合検索サービスを提供するシステム内で付与される連番でもよい。 The search server ID 4310 stores an identification number for identifying a search server that can be used for the integrated search. The search server ID 4310 may be a serial number given by the search server 1100 that performs the integrated search, or may be a serial number given in the system that provides the integrated search service.

検索サーバ名4320には、検索サーバの名前を格納する。具体的には、検索サーバのホスト名でもよいし、任意の文字列からなる名称でもよい。IPアドレス4330には、検索サーバに付与されているIPアドレスを格納する。なお、DNSを利用してIPアドレスを決定するシステム構成の場合、IPアドレス4330欄には、DNSへの問合せに利用するホスト名を格納してもよい。 The search server name 4320 stores the name of the search server. Specifically, it may be the host name of the search server or a name consisting of an arbitrary character string. The IP address 4330 stores the IP address assigned to the search server. In the case of a system configuration in which an IP address is determined using DNS, the IP address 4330 column may store a host name used for inquiries to DNS.

重み付け係数4340には、検索サーバから得られる検索結果についての、重要性の度合いを示す値を格納する。重み付け係数の値が大きければ大きいほど、検索結果の重要性を高くできる。 In the weighting coefficient 4340, a value indicating the degree of importance of the search result obtained from the search server is stored. The greater the value of the weighting coefficient, the higher the importance of the search result.

重み付け係数4340の値を検索サーバ毎に変えることにより、統合検索結果の中で、特定の検索サーバから得た検索結果を優先させることができる。即ち、重み付け係数が大きく設定された検索サーバからの検索結果を、統合検索結果内において上位に表示させることができる。重み付け係数が小さく設定された検索サーバからの検索結果は、統合検索結果のランキングにおいて下位に表示される。なお、全ての検索サーバから得る検索結果を平等に扱いたい場合は、重み付け係数4340の値を全て同じ値に設定すればよい。 By changing the value of the weighting coefficient 4340 for each search server, the search results obtained from a specific search server can be prioritized among the integrated search results. That is, a search result from a search server having a large weighting coefficient can be displayed at the top in the integrated search result. Search results from the search server set with a small weighting coefficient are displayed in the lower rank in the ranking of the integrated search results. If the search results obtained from all the search servers are to be treated equally, all the values of the weighting coefficient 4340 may be set to the same value.

図１１は、統合検索結果一時保管テーブル4400の構成例を示す。統合検索結果一時保管テーブル4400は、各検索サーバ1100、1200、1300からの検索結果をマージして統合検索結果を生成する処理において、データを一時的に保管するために利用される。 FIG. 11 shows a configuration example of the integrated search result temporary storage table 4400. The integrated search result temporary storage table 4400 is used to temporarily store data in the process of generating the integrated search result by merging the search results from the search servers 1100, 1200, and 1300.

具体的に、統合検索結果一時保管テーブル4400は、検索サーバID4410、ランク4420、ファイルID4430、スコア値4440、ファイルパス名4450、ハッシュアルゴリズム4460、ハッシュ値4470、及び検索キーワード含有文字列4480を対応付けて管理する。 Specifically, the integrated search result temporary storage table 4400 manages, in association with one another, a search server ID 4410, a rank 4420, a file ID 4430, a score value 4440, a file path name 4450, a hash algorithm 4460, a hash value 4470, and a search keyword-containing character string 4480.

検索サーバID4410には、検索結果を取得した検索サーバを識別するための情報が格納される。検索サーバID4410には、検索サーバ管理テーブル4300の検索サーバID4310の欄に登録されている検索サーバIDと同じ情報が登録される。 The search server ID 4410 stores information for identifying the search server that acquired the search result. In the search server ID 4410, the same information as the search server ID registered in the search server ID 4310 column of the search server management table 4300 is registered.

ランク4420には、検索サーバから送られてきたエントリのランクの情報をそのまま格納する。ランクとは、各検索サーバが提供する検索結果の中で、検索キーワード及び検索条件に該当する度合いが高い順に整列させ、その整列順に順位付けをした値である。 The rank 4420 stores the rank information of the entry sent from the search server as it is. The rank is a value obtained by sorting the search results provided by each search server in the descending order of the degree corresponding to the search keyword and the search condition, and ranking in the sort order.

ファイルID4430には、検索サーバから送られてきたエントリに対応するファイルのファイルIDがそのまま格納される。具体的には、ファイルID4430には、検索インデックス登録ファイル管理テーブル4100のファイルID4110の欄に登録されているファイルIDと同じ情報が登録される。 The file ID 4430 stores the file ID of the file corresponding to the entry sent from the search server as it is. Specifically, the same information as the file ID registered in the file ID 4110 column of the search index registration file management table 4100 is registered in the file ID 4430.

スコア値4440には、検索サーバから送られてきたエントリのスコア値の情報がそのまま格納される。スコア値とは、各検索サーバが提供する検索結果の中で、検索キーワード及び検索条件に該当する度合いを数値化したものである。スコア値に、検索サーバ管理テーブル4300における重み付け係数4340を乗じて、統合スコア値を算出する。検索サーバ1100は、統合スコア値を統合検索結果における統合ランクを決定するために利用する。 The score value 4440 stores the score value information of the entry sent from the search server as it is. The score value is a numerical value of the degree corresponding to the search keyword and the search condition in the search results provided by each search server. The integrated score value is calculated by multiplying the score value by the weighting coefficient 4340 in the search server management table 4300. The search server 1100 uses the integrated score value to determine the integrated rank in the integrated search result.
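The integrated score described above is the per-server score multiplied by that server's weighting coefficient. A minimal sketch of the calculation and the resulting integrated rank follows; the dict keys (`server_id`, `score`, `integrated_score`, `integrated_rank`) and the function name are hypothetical.

```python
def rank_integrated(entries, server_weights):
    """Compute integrated scores and ranks for merged search results.

    `entries` is a list of dicts with "server_id" and "score";
    `server_weights` maps server_id -> weighting coefficient.
    Returns new dicts sorted by integrated score, highest first.
    """
    scored = [
        dict(e, integrated_score=e["score"] * server_weights[e["server_id"]])
        for e in entries
    ]
    scored.sort(key=lambda e: e["integrated_score"], reverse=True)
    for rank, e in enumerate(scored, start=1):
        e["integrated_rank"] = rank
    return scored
```

For example, with weights `{1: 1.0, 2: 2.0}`, an entry from server 2 with score 0.5 (integrated score 1.0) ranks above an entry from server 1 with score 0.9.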

ファイルパス名4450には、検索サーバから送られてきたエントリに対応するファイルのファイルパス名をそのまま格納する。具体的には、ファイルパス名4450には、検索インデックス登録ファイル管理テーブル4100の取得元ファイルパス名4120の欄に登録されているファイルパス名と同じ情報が登録される。 The file path name 4450 stores the file path name of the file corresponding to the entry sent from the search server as it is. Specifically, in the file path name 4450, the same information as the file path name registered in the column of the acquisition source file path name 4120 of the search index registration file management table 4100 is registered.

なお、ネットワーク100を介して対象ファイルにアクセスできるようにするために、ファイルパス名4450の欄には、ファイルパス名に加えて、対象ファイルを格納するファイルサーバの識別情報を格納してもよい。 In addition, in order to make the target file accessible via the network 100, the file path name 4450 column may store the identification information of the file server that stores the target file in addition to the file path name. .

ハッシュアルゴリズム4460には、検索サーバで利用可能なハッシュアルゴリズムを識別するための情報が格納される。ハッシュ値4470には、ハッシュアルゴリズムにより算出されるハッシュ値を格納する。 The hash algorithm 4460 stores information for identifying a hash algorithm that can be used by the search server. The hash value 4470 stores a hash value calculated by a hash algorithm.

なお、統合検索に使用されるハッシュアルゴリズム（共通ハッシュアルゴリズム）を決定するための折衝処理において、共通ハッシュアルゴリズムを選定できなかった場合は、ハッシュアルゴリズム4460及びハッシュ値4470の欄に、無効値であることを意味するnull値が格納される。 If no common hash algorithm could be selected in the negotiation process for determining the hash algorithm used for the integrated search (the common hash algorithm), a null value, meaning an invalid value, is stored in the hash algorithm 4460 and hash value 4470 columns.

検索キーワード含有文字列4480には、検索サーバから送られてきた検索キーワードを含有する文字列がそのまま格納される。検索キーワード含有文字列とは、各検索サーバからの検索結果の中に含まれる各ファイルの中から、検索キーワードを含む文字列を抜き出したものの集合である。 In the search keyword-containing character string 4480, the character string containing the search keyword sent from the search server is stored as it is. The search keyword-containing character string is a set of character strings including the search keyword extracted from each file included in the search result from each search server.

検索キーワードを含む文字列の情報を検索結果に含めることで、ユーザは、指定した検索キーワードが含まれる部分の文章または文字列を、検索結果の一部として利用することができる。これにより、ユーザは、検索結果に挙げられた対象ファイルに実際にアクセスすることなく、検索キーワードを含む前後の文脈を把握することができる。従って、検索キーワード含有文字列4480により、検索サービスの利便性を高めることができる。 By including the information on the character string including the search keyword in the search result, the user can use the sentence or the character string of the part including the specified search keyword as a part of the search result. Thus, the user can grasp the context before and after the search keyword without actually accessing the target file listed in the search result. Therefore, the convenience of the search service can be enhanced by the search keyword-containing character string 4480.

一つのファイルの中に、検索キーワードを含む箇所が複数個存在する場合は、検索キーワード含有文字列も欄4480に複数登録される。検索サーバは、検索インデックス管理テーブル4200に登録されている情報を利用して、検索キーワード含有文字列を生成する。なお、検索キーワード含有文字列4480の欄において、検索サーバから提供される検索キーワード含有文字列の数が他のエントリよりも少ないために空欄となる箇所には、無効値を意味するnull値を格納する。 When one file contains the search keyword at a plurality of locations, a plurality of search keyword-containing character strings are registered in the column 4480. The search server generates the search keyword-containing character strings using the information registered in the search index management table 4200. In the search keyword-containing character string 4480 column, any position left blank because the search server supplied fewer search keyword-containing character strings than for other entries stores a null value, meaning an invalid value.
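The null padding described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `pad_snippets` and the use of Python `None` for the null value are assumptions made for the example.

```python
from typing import List, Optional


def pad_snippets(entries: List[List[str]]) -> List[List[Optional[str]]]:
    """Pad each entry's keyword-containing strings (column 4480) to equal width.

    Positions with no string from the search server are filled with None,
    standing in for the null (invalid) value.
    """
    width = max((len(snippets) for snippets in entries), default=0)
    return [snippets + [None] * (width - len(snippets)) for snippets in entries]


# Two result entries: the first file matched twice, the second once.
padded = pad_snippets([["...annual budget plan...", "...budget review..."],
                       ["...budget draft..."]])
```

Here the second entry ends with `None` so that every row of the table has the same number of snippet positions.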

図１２は、クライアントマシンから検索サーバ1100に統合検索要求を行う際に指定する、統合検索要求パラメータ6100の構成例を示す。本パラメータは、図７で説明したＳ１で利用される。具体的に、統合検索要求パラメータ6100は、要求先マシン識別情報6110、要求元マシン識別情報6120、処理種別6130、検索キーワード6140、検索オプション6150、統合検索オプション6160を含む。 FIG. 12 shows a configuration example of the integrated search request parameter 6100 specified when a client machine makes an integrated search request to the search server 1100. This parameter is used in S1 described with reference to FIG. 7. Specifically, the integrated search request parameter 6100 includes request destination machine identification information 6110, request source machine identification information 6120, processing type 6130, search keyword 6140, search option 6150, and integrated search option 6160.

要求先マシン識別情報6110には、統合検索要求の送信先となる検索サーバを識別するための情報が格納される。要求先マシン識別情報6110には、ネットワーク100を介して検索サーバにアクセスするために、検索サーバのホスト名またはIPアドレス等のアクセス情報が格納される。 The request destination machine identification information 6110 stores information for identifying the search server that is the transmission destination of the integrated search request. The request destination machine identification information 6110 stores access information such as the host name or IP address of the search server in order to access the search server via the network 100.

要求元マシン識別情報6120には、統合検索を要求するクライアントマシンを識別するための情報が格納される。要求元マシン識別情報6120には、ネットワーク100を介してクライアントマシンにアクセスするために、クライアントマシンのホスト名、または、クライアントマシンのIPアドレス等のアクセス情報が格納される。 The request source machine identification information 6120 stores information for identifying a client machine that requests an integrated search. The request source machine identification information 6120 stores access information such as the host name of the client machine or the IP address of the client machine in order to access the client machine via the network 100.

処理種別6130は、処理の内容を識別するための情報を格納する。統合検索要求を発行する場合、処理種別6130には、統合検索要求処理を示す情報が格納される。検索キーワード6140は、統合検索要求に使用される検索キーワードを格納する。 The process type 6130 stores information for identifying the content of the process. When issuing an integrated search request, the process type 6130 stores information indicating the integrated search request process. The search keyword 6140 stores a search keyword used for the integrated search request.

検索オプション6150は、各検索サーバに検索を要求する際に指定するオプションに関する情報を格納する。検索オプション6150としては、例えば、ファイル作成日時、ファイル更新日時、ファイル作成者等に関する条件を指定できる。 The search option 6150 stores information on options specified when requesting a search from each search server. As the search option 6150, for example, conditions relating to file creation date / time, file update date / time, file creator and the like can be designated.

統合検索オプション6160は、統合検索処理を行う検索サーバ1100に指定するオプションに関する情報を格納する。統合検索オプション6160としては、例えば、クライアントマシンに提供する統合検索結果の件数、統合検索結果の先頭エントリにおけるオフセット値に関する条件等がある。オフセット値を設定することにより、例えば、先頭エントリをランク１位から始めるのか、またはランク１００位から始めるのか等を設定できる。 The integrated search option 6160 stores information related to options specified for the search server 1100 that performs the integrated search processing. As the integrated search option 6160, for example, there are the number of integrated search results provided to the client machine, conditions regarding the offset value in the first entry of the integrated search results, and the like. By setting the offset value, for example, it is possible to set whether the head entry starts from rank 1 or rank 100.
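As a rough illustration of the fields listed for the integrated search request parameter 6100, the structure might be modeled as below. The class name, field names, default values, host names, and option keys are all hypothetical; the patent only names the fields and gives examples of the options.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IntegratedSearchRequest:
    destination: str    # 6110: host name or IP address of the search server
    source: str         # 6120: host name or IP address of the client machine
    process_type: str   # 6130: identifies the kind of processing
    keyword: str        # 6140: search keyword
    # 6150: per-server search options (creation time, update time, creator, ...)
    search_options: Dict[str, str] = field(default_factory=dict)
    # 6160: integrated search options (assumed representation)
    max_results: int = 10   # number of integrated results to return
    offset_rank: int = 1    # rank of the first entry to return


# Ask for 20 results starting at rank 101.
request = IntegratedSearchRequest(
    destination="search1100.example",
    source="client.example",
    process_type="integrated_search_request",
    keyword="budget",
    search_options={"creator": "alice"},
    max_results=20,
    offset_rank=101,
)
```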

図１３は、検索サーバ1100がクライアントマシンに統合検索結果を応答する際に指定する、統合検索結果応答パラメータ6200の構成例を示す。本パラメータ6200は、図７で説明したＳ９にて利用される。具体的に、統合検索結果応答パラメータ6200は、応答先マシン識別情報6210、応答元マシン識別情報6220、処理種別6230、処理結果識別情報6240、総件数6250、応答件数6260、先頭ランク6270、検索結果6280、追加応答要求に必要な情報6290を備えている。 FIG. 13 shows a configuration example of the integrated search result response parameter 6200 specified when the search server 1100 returns an integrated search result to the client machine. This parameter 6200 is used in S9 described with reference to FIG. 7. Specifically, the integrated search result response parameter 6200 includes response destination machine identification information 6210, response source machine identification information 6220, processing type 6230, processing result identification information 6240, total number 6250, response number 6260, head rank 6270, search result 6280, and information 6290 necessary for an additional response request.

応答先マシン識別情報6210は、統合検索結果の送信先となるクライアントマシンを識別する情報を格納する。例えば、ネットワーク100を介してクライアントマシンにアクセスするために、クライアントマシンのホスト名またはIPアドレス等のアクセス情報が格納される。 The response destination machine identification information 6210 stores information for identifying the client machine that is the transmission destination of the integrated search result. For example, in order to access a client machine via the network 100, access information such as a host name or an IP address of the client machine is stored.

応答元マシン識別情報6220には、統合検索要求を行った検索サーバ1100を識別するための情報が格納される。前記同様に、例えば、検索サーバ1100のホスト名、IPアドレス等が格納される。 The response source machine identification information 6220 stores information for identifying the search server 1100 that handled the integrated search request. As above, for example, the host name or IP address of the search server 1100 is stored.

処理種別6230は、処理の内容を識別するための情報を格納する。統合検索の結果を送信する場合、処理種別6230には、統合検索結果の応答処理を示す情報が格納される。処理結果識別情報6240は、統合検索の処理結果を識別する情報を格納する。具体的には、処理が成功したのか、あるいは失敗したのかといった情報が格納される。 The process type 6230 stores information for identifying the content of the process. When the result of the integrated search is transmitted, the process type 6230 stores information indicating the response process of the integrated search result. The processing result identification information 6240 stores information for identifying the processing result of the integrated search. Specifically, information indicating whether the processing succeeded or failed is stored.

総件数6250は、指定された条件に合致したファイルデータの総件数を格納する。応答件数6260は、指定された条件に合致したファイルデータの中で、統合検索結果の応答に含まれる件数を格納する。総件数6250の値が応答件数6260の上限値以下の場合、総件数6250と応答件数6260とは一致する。しかし、総件数6250が前記上限値よりも多い場合、一度の応答件数6260の上限値より多い分は、統合検索結果の応答に含めない。 The total number 6250 stores the total number of file data that matches the specified condition. The response number 6260 stores the number of responses included in the response of the integrated search result among the file data matching the specified condition. When the value of the total number of cases 6250 is less than or equal to the upper limit of the number of responses 6260, the total number 6250 and the number of responses 6260 match. However, when the total number of cases 6250 is larger than the upper limit value, the amount exceeding the upper limit value of the number of response cases 6260 is not included in the response of the integrated search result.

先頭ランク6270には、統合検索結果の応答に含まれる先頭エントリのランク値が格納される。もし、ランク１位のエントリが先頭であれば、先頭ランク6270には１が格納され、ランク100位のエントリが先頭であれば、先頭ランク6270には100が格納される。 The head rank 6270 stores the rank value of the head entry included in the response of the integrated search result. If the entry ranked first is the first, 1 is stored in the first rank 6270, and if the entry ranked 100 is the first, 100 is stored in the first rank 6270.

検索結果6280には、統合検索処理によって取得された統合検索結果が格納される。検索結果6280には、応答件数6260で規定された数だけ、検索結果エントリ6281、6282が格納される。検索結果エントリ6281、6282には、統合用検索結果一時保管テーブル4400の各欄4410−4480に格納されている情報と同じ情報が格納される。 The search result 6280 stores the integrated search result acquired by the integrated search process. The search result 6280 stores as many search result entries 6281 and 6282 as the number specified by the response number 6260. The search result entries 6281 and 6282 store the same information as that stored in the columns 4410 to 4480 of the integration search result temporary storage table 4400.

追加応答要求に必要な情報6290は、総件数6250の値よりも応答件数6260の値の方が小さい場合に利用する。追加応答要求に必要な情報6290の欄には、統合検索結果の応答に含まれていない他の検索結果に関する情報を取得するためのリンク情報が格納される。 The information 6290 necessary for the additional response request is used when the value of the response number 6260 is smaller than the value of the total number 6250. In the column of information 6290 necessary for the additional response request, link information for acquiring information related to other search results not included in the response of the integrated search result is stored.
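The relationship between the total number 6250, response number 6260, head rank 6270, and the additional-response information 6290 can be sketched as a simple paging function. This is an illustration under assumptions: the dictionary keys are invented for the example, and the "link information" of 6290 is represented here merely as the rank at which the next page would start, which the patent does not specify.

```python
from typing import Any, Dict, List


def build_page(ranked_entries: List[Any], head_rank: int, limit: int) -> Dict[str, Any]:
    """Return one page of a ranked result list.

    head_rank is 1-based (rank 1 is the best entry); limit is the upper
    bound on the number of entries in a single response.
    """
    total = len(ranked_entries)
    start = head_rank - 1
    page = ranked_entries[start:start + limit]
    next_rank = head_rank + len(page)
    return {
        "total_count": total,          # 6250
        "response_count": len(page),   # 6260
        "head_rank": head_rank,        # 6270
        "results": page,               # 6280
        # 6290 (assumed form): where a follow-up request should start, if any
        "more": {"head_rank": next_rank} if next_rank <= total else None,
    }


first_page = build_page(["a", "b", "c", "d", "e"], head_rank=1, limit=2)
```

When the total does not exceed the limit, `response_count` equals `total_count` and `more` is `None`, matching the case where no additional response request is needed.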

図１４は、統合検索要求を受信した検索サーバ1100から、統合検索を利用可能な各検索サーバ1100、1200、1300に、ハッシュアルゴリズムの折衝を行う際の問合せに指定するための、ハッシュアルゴリズム問合せ要求パラメータ6300の構成例を示す。 FIG. 14 shows a configuration example of the hash algorithm query request parameter 6300, which the search server 1100 that received the integrated search request specifies in its query to each of the search servers 1100, 1200, and 1300 available for the integrated search when negotiating the hash algorithm.

本パラメータ6300は、図７で説明したＳ２で利用される。具体的に、ハッシュアルゴリズム問合せ要求パラメータ6300は、問合せ先マシン識別情報6310、問合せ元マシン識別情報6320、処理種別6330、利用可能なハッシュアルゴリズム候補識別情報6340、問合せオプション6350を含む。 This parameter 6300 is used in S2 described with reference to FIG. Specifically, the hash algorithm query request parameter 6300 includes query destination machine identification information 6310, query source machine identification information 6320, processing type 6330, available hash algorithm candidate identification information 6340, and query option 6350.

問合せ先マシン識別情報6310は、検索要求の送信先となる検索サーバを識別するための情報を格納する。即ち、問合せ先マシン識別情報6310は、統合検索を開始する前に、利用するハッシュアルゴリズムについて交渉する必要がある各検索サーバを識別するための情報が格納される。例えば、ネットワーク100を介して検索サーバにアクセスするために、検索サーバのホスト名、IPアドレス等のアクセス情報が格納される。 The inquiry destination machine identification information 6310 stores information for identifying a search server that is a transmission destination of a search request. In other words, the inquiry-destination machine identification information 6310 stores information for identifying each search server that needs to negotiate the hash algorithm to be used before starting the integrated search. For example, in order to access the search server via the network 100, access information such as the host name and IP address of the search server is stored.

問合せ元マシン識別情報6320は、統合検索処理を行う検索サーバ1100を識別するための情報を格納する。ネットワーク100を介してマシンにアクセスするために、検索サーバ1100のホスト名またはIPアドレス等のアクセス情報が、問合せ元マシン識別情報6320に格納される。 The inquiry source machine identification information 6320 stores information for identifying the search server 1100 that performs the integrated search process. In order to access a machine via the network 100, access information such as the host name or IP address of the search server 1100 is stored in the inquiry source machine identification information 6320.

処理種別6330は、処理の内容を識別するための情報を格納する。ハッシュアルゴリズムの問い合わせを行う場合、処理種別6330には、ハッシュアルゴリズム問合せ要求処理を示す情報が格納される。 The process type 6330 stores information for identifying the content of the process. When making a hash algorithm inquiry, the process type 6330 stores information indicating a hash algorithm inquiry request process.

利用可能なハッシュアルゴリズム候補識別情報6340には、問合せ元である検索サーバ1100において利用可能なハッシュアルゴリズムの識別情報一覧が格納される。各検索サーバにおいて、ハッシュアルゴリズム候補識別情報6340に格納された複数のハッシュアルゴリズムのうち、共通するハッシュアルゴリズムを利用できる場合、そのハッシュアルゴリズムを利用して、統合検索結果に含まれる重複を検出できる。 The available hash algorithm candidate identification information 6340 stores a list of identification information of the hash algorithms available in the search server 1100 that is the query source. If every search server can use a common hash algorithm among those stored in the hash algorithm candidate identification information 6340, duplicates contained in the integrated search result can be detected using that hash algorithm.

問合せオプション6350は、ハッシュアルゴリズム問合せ要求処理で指定可能なオプション情報を格納する。具体的には、利用可能なハッシュアルゴリズムの候補を選択する場合の条件として、ハッシュ値のサイズが所定サイズ以上でなければならない場合、ハッシュ値サイズの下限値をオプションとして指定することができる。 The query option 6350 stores option information that can be specified in the hash algorithm query request processing. Specifically, as a condition for selecting an available hash algorithm candidate, if the size of the hash value must be greater than or equal to a predetermined size, a lower limit value of the hash value size can be specified as an option.
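One way to apply the lower bound on hash value size mentioned above is to filter candidate algorithms by digest length. The sketch below uses Python's standard `hashlib` and assumes, purely for illustration, that algorithm identifiers are `hashlib` names and that the size is measured in bytes; the patent does not define either.

```python
import hashlib
from typing import List


def candidates_with_min_size(names: List[str], min_bytes: int) -> List[str]:
    """Keep only the algorithms whose digest is at least min_bytes long."""
    return [name for name in names
            if hashlib.new(name).digest_size >= min_bytes]


# SHA-1 produces 20 bytes, SHA-256 32 bytes, SHA-512 64 bytes.
usable = candidates_with_min_size(["sha1", "sha256", "sha512"], min_bytes=32)
```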

図１５は、各検索サーバ1100、1200、1300が、ハッシュアルゴリズムの問合せ要求元である検索サーバ1100に応答する場合に使用される、ハッシュアルゴリズム問合せ応答パラメータ6400の構成例を示す。 FIG. 15 shows a configuration example of the hash algorithm query response parameter 6400 used when each search server 1100, 1200, 1300 responds to the search server 1100 that is the hash algorithm query request source.

本パラメータ6400は、図７で説明したＳ３にて利用される。ハッシュアルゴリズム問合せ応答パラメータ6400は、応答先マシン識別情報6410、応答元マシン識別情報6420、処理種別6430、処理結果識別情報6440、相互利用可能なハッシュアルゴリズム識別情報6450、利用可能なハッシュアルゴリズム候補識別情報6460を含む。 This parameter 6400 is used in S3 described with reference to FIG. 7. The hash algorithm query response parameter 6400 includes response destination machine identification information 6410, response source machine identification information 6420, processing type 6430, processing result identification information 6440, mutually usable hash algorithm identification information 6450, and available hash algorithm candidate identification information 6460.

応答先マシン識別情報6410は、ハッシュアルゴリズムに関する問合せを応答すべき検索サーバ1100を識別する情報を格納する。前記同様に、検索サーバ1100のホスト名またはIPアドレス等のアクセス情報が格納される。 The response destination machine identification information 6410 stores information for identifying the search server 1100 to which the response to the hash algorithm inquiry should be sent. As above, access information such as the host name or IP address of the search server 1100 is stored.

応答元マシン識別情報6420は、ハッシュアルゴリズムについての問合せを受けた各検索サーバを識別する情報を格納する。前記同様に、各検索サーバのホスト名またはIPアドレス等のアクセス情報が格納される。 The response source machine identification information 6420 stores information for identifying each search server that has received an inquiry about the hash algorithm. Similarly to the above, access information such as the host name or IP address of each search server is stored.

処理種別6430は、処理の内容を識別するための情報を格納する。処理種別6430には、ハッシュアルゴリズムの問合せに対する応答であることを示す情報が格納される。処理結果識別情報6440には、ハッシュアルゴリズムの問合せについての処理結果を示す情報が格納される。具体的には、処理結果識別情報6440には、問合わせ処理が成功したのか、あるいは失敗したのかという情報が格納される。 The process type 6430 stores information for identifying the contents of the process. The processing type 6430 stores information indicating that it is a response to the hash algorithm query. The processing result identification information 6440 stores information indicating the processing result for the hash algorithm query. Specifically, the processing result identification information 6440 stores information indicating whether the inquiry processing has succeeded or failed.

相互利用可能なハッシュアルゴリズム識別情報6450には、利用可能なハッシュアルゴリズム候補識別情報6340に含まれる複数のハッシュアルゴリズムのうち、問合せを受けた検索サーバにおいても利用可能なハッシュアルゴリズムを特定する情報が格納される。 The mutually usable hash algorithm identification information 6450 stores information that identifies, among the hash algorithms included in the available hash algorithm candidate identification information 6340, those that the queried search server can also use.

相互利用可能なハッシュアルゴリズム識別情報6450に格納されたハッシュアルゴリズムは、問合せ元検索サーバと問合せ先検索サーバの両方で、利用可能であるため、統合結果の重複検出に利用し得る候補の一つとなる。各検索サーバから返信された、相互利用可能なハッシュアルゴリズムのうち、全ての検索サーバに共通するハッシュアルゴリズムを、統合検索結果から重複を排除するためのハッシュアルゴリズムとして選択できる。 Because a hash algorithm stored in the mutually usable hash algorithm identification information 6450 is available on both the query source search server and the query destination search server, it is a candidate for detecting duplicates in the integrated result. Among the mutually usable hash algorithms returned by the search servers, one that is common to all search servers can be selected as the hash algorithm for eliminating duplicates from the integrated search result.

利用可能なハッシュアルゴリズム候補識別情報6460には、ハッシュアルゴリズムの問合せを受けた検索サーバにおいて、他に利用可能なハッシュアルゴリズムがある場合、そのハッシュアルゴリズムを識別するための情報が格納される。統合検索に参加する検索サーバが、統合検索を指揮する検索サーバ1100で利用可能なハッシュアルゴリズム（図１４の欄6340に登録されるハッシュアルゴリズム）以外のハッシュアルゴリズムを利用可能な場合に、そのハッシュアルゴリズムが欄6460に登録される。 The available hash algorithm candidate identification information 6460 stores information for identifying any other hash algorithm that is available on the search server that received the hash algorithm query. If a search server participating in the integrated search can use a hash algorithm other than those available on the search server 1100 that directs the integrated search (the hash algorithms registered in the column 6340 in FIG. 14), that hash algorithm is registered in the column 6460.

なお、この利用可能なハッシュアルゴリズム候補識別情報6460には、相互利用可能なハッシュアルゴリズム識別情報6450に格納されたハッシュアルゴリズムの識別情報は格納されない。 Note that the available hash algorithm candidate identification information 6460 does not store the identification information of the hash algorithms already stored in the mutually usable hash algorithm identification information 6450.
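The split between the mutually usable algorithms (6450) and the other available candidates (6460) amounts to an intersection and a difference. A minimal sketch follows; the function name and dictionary keys are hypothetical, and algorithm identifiers are assumed to be plain strings.

```python
from typing import Dict, List


def answer_algorithm_query(requested: List[str], local: List[str]) -> Dict[str, List[str]]:
    """Build the algorithm fields of a hash algorithm query response.

    requested: candidates sent by the querying server (field 6340)
    local:     algorithms available on the queried server
    """
    # 6450: algorithms usable on both sides, in the requester's order
    mutual = [name for name in requested if name in local]
    # 6460: algorithms only the queried server has; never overlaps with 6450
    other_candidates = [name for name in local if name not in requested]
    return {"mutual": mutual, "other_candidates": other_candidates}


response = answer_algorithm_query(["sha1", "sha256"], ["sha256", "sha512"])
```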

図１６は、統合検索要求を受けた検索サーバ1100が検索サーバ1100、1200、1300に検索要求を発行する際に指定する、検索要求パラメータ6500の構成例を示す。本パラメータ6500は、図７で説明したＳ４で利用される。検索要求パラメータ6500は、要求先マシン識別情報6510、要求元マシン識別情報6520、処理種別6530、検索キーワード6540、検索オプション6550を含む。 FIG. 16 shows a configuration example of a search request parameter 6500 that is specified when the search server 1100 that has received the integrated search request issues a search request to the search servers 1100, 1200, and 1300. This parameter 6500 is used in S4 described with reference to FIG. Search request parameter 6500 includes request destination machine identification information 6510, request source machine identification information 6520, process type 6530, search keyword 6540, and search option 6550.

要求先マシン識別情報6510は、検索要求の送信先となる検索サーバを識別する情報（ホスト名またはIPアドレス）を格納する。要求元マシン識別情報6520は、検索要求を発行する検索サーバ1100を識別する情報（ホスト名またはIPアドレス）を格納する。 The request destination machine identification information 6510 stores information (host name or IP address) for identifying the search server that is the transmission destination of the search request. The request source machine identification information 6520 stores information (host name or IP address) for identifying the search server 1100 that issues the search request.

処理種別6530は、処理の内容を識別するための情報を格納する。ここでは、処理種別6530に、検索要求処理を示す情報が格納される。検索キーワード6540は、検索に使用される検索キーワードを格納する。検索オプション6550は、検索に関して指定されたオプション情報を格納する。例えば、オプション情報として、ファイル作成日時、ファイル更新日時、ファイル作成者等の条件を指定可能である。 The process type 6530 stores information for identifying the content of the process. Here, information indicating search request processing is stored in the processing type 6530. Search keyword 6540 stores a search keyword used for the search. The search option 6550 stores option information designated for the search. For example, conditions such as file creation date / time, file update date / time, and file creator can be designated as option information.

さらに、検索オプション6550は、利用ハッシュアルゴリズム識別情報6551を含む。利用ハッシュアルゴリズム識別情報6551には、ハッシュアルゴリズム問合せ処理により、関係する各検索サーバ間で統一したハッシュアルゴリズムが決定された場合、その決定されたハッシュアルゴリズム（共通ハッシュアルゴリズム）の識別情報を格納する。 Further, the search option 6550 includes used hash algorithm identification information 6551. When the hash algorithm query processing determines a hash algorithm unified across the relevant search servers, the used hash algorithm identification information 6551 stores the identification information of that determined hash algorithm (the common hash algorithm).
利用ハッシュアルゴリズム識別情報6551で指定されたハッシュアルゴリズムを利用して、各検索サーバはハッシュ値を作成し、応答する。また、統合検索要求を受けた検索サーバ1100は、共通のハッシュアルゴリズムで作成されたハッシュ値に基づいて、統合検索結果から重複したエントリを検出し排除する。 Using the hash algorithm specified by the used hash algorithm identification information 6551, each search server creates a hash value and returns it in its response. The search server 1100 that received the integrated search request then detects and eliminates duplicate entries from the integrated search result based on the hash values created with the common hash algorithm.
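Computing a hash value with the algorithm named in the request can be sketched with Python's `hashlib`. This assumes, for illustration only, that the hash is taken over the raw file content and that the identifier in 6551 is a `hashlib` algorithm name; the function name is hypothetical.

```python
import hashlib


def file_hash(content: bytes, algorithm: str) -> str:
    """Return the hex digest of content using the named hash algorithm."""
    return hashlib.new(algorithm, content).hexdigest()


# Identical content on two servers yields the same value under the common
# algorithm, which is what lets the integrating server spot duplicates.
on_server_a = file_hash(b"quarterly report", "sha256")
on_server_b = file_hash(b"quarterly report", "sha256")
different = file_hash(b"quarterly report v2", "sha256")
```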

図１７は、検索サーバ1100、1200、1300が、統合検索を行う検索サーバ1100に、検索結果を応答する際に指定する検索結果応答パラメータ6600の構成例を示す。本パラメータ6600は、図７で説明したＳ７にて利用される。検索結果応答パラメータ6600は、応答先マシン識別情報6610、応答元マシン識別情報6620、処理種別6630、処理結果識別情報6640、総件数6650、応答件数6660、先頭ランク6670、検索結果6680、追加応答要求に必要な情報6690を含む。 FIG. 17 shows a configuration example of the search result response parameter 6600 specified when the search servers 1100, 1200, and 1300 return search results to the search server 1100 that performs the integrated search. This parameter 6600 is used in S7 described with reference to FIG. 7. The search result response parameter 6600 includes response destination machine identification information 6610, response source machine identification information 6620, process type 6630, process result identification information 6640, total number 6650, response number 6660, head rank 6670, search result 6680, and information 6690 necessary for an additional response request.

応答先マシン識別情報6610には、検索結果の送信先となる検索サーバを識別する情報（ホスト名またはIPアドレス）を格納する。応答元マシン識別情報6620は、検索要求を受けた検索サーバを識別する情報（ホスト名またはIPアドレス）を格納する。 The response destination machine identification information 6610 stores information (host name or IP address) for identifying the search server that is the transmission destination of the search result. The response source machine identification information 6620 stores information (host name or IP address) for identifying the search server that has received the search request.

処理種別6630は、処理の内容を識別するための情報を格納する。ここでは、処理種別6630に、検索結果応答処理を示す情報が格納される。 The process type 6630 stores information for identifying the content of the process. Here, information indicating the search result response process is stored in the process type 6630.

処理結果識別情報6640は、検索の処理結果を識別する情報を格納する。具体的には、処理結果識別情報6640には、検索が成功したのか、あるいは失敗したのかを示す情報が格納される。 The processing result identification information 6640 stores information for identifying a search processing result. Specifically, the processing result identification information 6640 stores information indicating whether the search has succeeded or failed.

総件数6650は、指定された条件に合致したファイル及びデータの総件数を格納する。応答件数6660は、指定された条件に合致したファイル及びデータの中で、検索結果応答に含まれている件数を格納する。前記同様に、総件数6650が応答件数6660の上限値以下の場合は、総件数6650と応答件数6660は一致する。総件数6650が応答件数6660の上限値よりも多い場合は、応答件数6660の上限値より多い分は、検索結果応答に含まれない。 The total number 6650 stores the total number of files and data that meet the specified condition. The number of responses 6660 stores the number of cases included in the search result response among files and data that match the specified conditions. Similarly, when the total number 6650 is equal to or less than the upper limit of the response number 6660, the total number 6650 matches the response number 6660. When the total number 6650 is larger than the upper limit value of the response number 6660, the amount exceeding the upper limit value of the response number 6660 is not included in the search result response.

先頭ランク6670は、検索結果応答の中に含まれるエントリに対して、先頭エントリのランク値を格納する。前記同様に、もしも、ランク１位のエントリが先頭の場合、先頭ランク6670には１が格納される。ランク100位のエントリが先頭の場合、先頭ランク6670には100が格納される。 The head rank 6670 stores the rank value of the head entry for the entry included in the search result response. Similarly to the above, if the entry ranked first is the first, 1 is stored in the first rank 6670. When the entry of rank 100 is the first, 100 is stored in the first rank 6670.

検索結果6680には、検索処理により取得された検索結果が格納される。検索結果6680には、応答件数6660の数だけ、検索結果エントリ6681、6684が格納される。検索結果エントリ6681、6684には、統合用検索結果一時保管テーブル4400の各欄4410−4480に格納されている情報と同じ情報が格納される。 The search result 6680 stores the search result acquired by the search process. The search result 6680 stores as many search result entries 6681 and 6684 as the response number 6660. The search result entries 6681 and 6684 store the same information as that stored in the columns 4410 to 4480 of the integration search result temporary storage table 4400.

さらに、検索結果エントリ6681、6684には、利用ハッシュアルゴリズム識別情報6682、6685と、ハッシュ値6683、6686とが格納される。利用ハッシュアルゴリズム識別情報6682、6685には、検索要求パラメータ6500の利用ハッシュアルゴリズム識別情報6551にて指定された情報をそのまま格納する。 Furthermore, the search result entries 6681 and 6684 store used hash algorithm identification information 6682 and 6685 and hash values 6683 and 6686, respectively. The used hash algorithm identification information 6682 and 6685 stores, unchanged, the information specified in the used hash algorithm identification information 6551 of the search request parameter 6500.

ハッシュ値6683、6686には、利用ハッシュアルゴリズム識別情報6682、6685で特定されるハッシュアルゴリズム（共通ハッシュアルゴリズム）を利用して作成されたハッシュ値が格納される。統合検索要求を受けた検索サーバ1100は、そのハッシュ値を利用して、統合検索結果の中から重複エントリを検出して排除する。 The hash values 6683 and 6686 store hash values created using the hash algorithm (the common hash algorithm) specified by the used hash algorithm identification information 6682 and 6685. The search server 1100 that received the integrated search request uses those hash values to detect and eliminate duplicate entries from the integrated search result.

追加応答要求に必要な情報6690は、総件数6650の値よりも応答件数6660の値が小さい場合に使用する。この場合、追加応答要求に必要な情報6690の欄には、検索結果応答に含まれていないファイルまたはデータの検索結果に関する情報を取得するためのリンク情報が格納される。 Information 6690 necessary for the additional response request is used when the value of the response number 6660 is smaller than the value of the total number 6650. In this case, link information for acquiring information related to the search result of the file or data not included in the search result response is stored in the column of information 6690 necessary for the additional response request.

以上、本実施例による検索システムの構成、管理情報の構成、処理パラメータの構成について詳細に説明した。以降では、本実施例による処理動作を説明する。以下のフローチャートでは、理解のためにループ等を割愛する。従って、図示される各フローチャートは、各処理の概要を示しており、実際のコンピュータプログラムとは相違する。いわゆる当業者であれば、図示されたフローチャートからステップを削除または変更したり、新たなステップをフローチャートに加えたりすることができる。そのような改変されたフローチャートも本発明の範囲に含まれる。 The configuration of the search system, the management information, and the processing parameters according to this embodiment have been described in detail above. The processing operations of this embodiment are described next. The following flowcharts omit loops and the like for ease of understanding. Each flowchart therefore shows an outline of its process and differs from an actual computer program. A person skilled in the art can delete or change steps in the illustrated flowcharts or add new steps to them. Such modified flowcharts are also included in the scope of the present invention.

図１８のフローチャートは、いずれかのクライアントマシンで実行される統合検索要求処理を示している。始めに、クライアントマシンは、統合検索サービスを提供する「統合検索サーバ」としての検索サーバ1100に、検索キーワードを指定して統合検索処理を要求する（S101）。統合検索を要求する場合は、統合検索要求パラメータ6100を指定する。クライアントマシンは、統合検索処理を行う検索サーバ1100から統合検索の結果を受信した後、その統合検索結果をユーザに提供し（S102）、本処理を終了する。なお、統合検索結果の応答を検索サーバ1100から取得する場合には、統合検索結果応答パラメータ6200が使用される。 The flowchart in FIG. 18 shows the integrated search request process executed on any client machine. First, the client machine requests an integrated search process by specifying a search keyword to the search server 1100 as an “integrated search server” that provides the integrated search service (S101). When requesting an integrated search, an integrated search request parameter 6100 is designated. After receiving the result of the integrated search from the search server 1100 that performs the integrated search process, the client machine provides the integrated search result to the user (S102), and ends this process. Note that when a response of the integrated search result is acquired from the search server 1100, the integrated search result response parameter 6200 is used.

図１９及び図２０は、検索サーバ1100で実行される統合検索処理のフローチャートを示している。始めに、検索サーバ1100は、クライアントマシンから受信した統合検索要求パラメータ6100の処理種別6130に基づいて、統合検索要求が指定されているか否かを判定する（S201）。もし、統合検索要求が指定されていない場合（S201：NO）、エラー終了する（S202）。 FIGS. 19 and 20 show a flowchart of the integrated search process executed by the search server 1100. First, the search server 1100 determines whether an integrated search request is specified, based on the processing type 6130 of the integrated search request parameter 6100 received from the client machine (S201). If an integrated search request is not specified (S201: NO), the process ends with an error (S202).

統合検索要求が指定されている場合（S201：YES）、検索サーバ1100は、検索サーバ1100にて利用可能なハッシュアルゴリズムを特定する（S203）。具体的には、検索サーバ1100にて管理されている検索インデックス登録ファイル管理テーブル4100内のハッシュアルゴリズム4151、4153を調べることで、検索サーバ1100で利用可能なハッシュアルゴリズムを特定できる。 When the integrated search request is specified (S201: YES), the search server 1100 specifies a hash algorithm that can be used by the search server 1100 (S203). Specifically, by examining the hash algorithms 4151 and 4153 in the search index registration file management table 4100 managed by the search server 1100, a hash algorithm that can be used by the search server 1100 can be specified.

検索サーバ1100は、検索サーバ管理テーブル4300に登録されている各検索サーバに、各検索サーバで利用可能なハッシュアルゴリズムを問い合わせる（S204）。検索サーバ1100は、その問い合わせに際して、ハッシュアルゴリズム問合せ要求パラメータ6300を指定する。 The search server 1100 inquires each search server registered in the search server management table 4300 about a hash algorithm that can be used by each search server (S204). The search server 1100 designates the hash algorithm query request parameter 6300 at the time of the query.

検索サーバ1100は、各検索サーバから、ハッシュアルゴリズム問合せ応答パラメータ6400に含まれる情報を取得する。検索サーバ1100は、統一されたハッシュアルゴリズムを利用可能か否かを判断する（S205）。検索サーバ1100は、各検索サーバからの応答に基づいて、統合検索に参加する全検索サーバの中で、統一して利用可能なハッシュアルゴリズムが存在するか否かを判定する。 The search server 1100 acquires information included in the hash algorithm query response parameter 6400 from each search server. The search server 1100 determines whether or not a unified hash algorithm can be used (S205). Based on the response from each search server, the search server 1100 determines whether there is a hash algorithm that can be used in a unified manner among all search servers participating in the integrated search.

もし、統一したハッシュアルゴリズムを利用可能な場合（S205：YES）、検索サーバ1100は、統合検索に参加する各検索サーバに、使用すべきハッシュアルゴリズムを指定して検索を要求する（S206）。検索を要求する場合、検索サーバ1100は、検索要求パラメータ6500を指定する。検索サーバ1100は、各検索サーバから検索結果応答パラメータ6600に含まれる情報をそれぞれ取得する。 If a unified hash algorithm is available (S205: YES), the search server 1100 requests a search from each search server participating in the integrated search, designating the hash algorithm to be used (S206). When requesting a search, the search server 1100 specifies the search request parameter 6500. The search server 1100 acquires the information included in the search result response parameter 6600 from each search server.

統合検索に参加する各検索サーバ間で統一したハッシュアルゴリズムを利用できない場合（S205：NO）、検索サーバ1100は、各検索サーバに対して、ハッシュアルゴリズムを指定せずに検索を要求する（S207）。検索サーバ1100は、検索を要求する場合、検索要求パラメータ6500を指定する。検索サーバ1100は、各検索サーバから検索結果応答パラメータ6600に含まれる情報をそれぞれ取得する。 If no unified hash algorithm is available across the search servers participating in the integrated search (S205: NO), the search server 1100 requests a search from each search server without designating a hash algorithm (S207). When requesting a search, the search server 1100 specifies the search request parameter 6500. The search server 1100 acquires the information included in the search result response parameter 6600 from each search server.
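The decision in S205 and the two request variants in S206 and S207 can be sketched as below. The patent does not say how to choose when several algorithms are common to all servers; picking the first one in the requesting server's own candidate order is an assumption of this example, as are the function names and dictionary keys.

```python
from typing import Dict, List, Optional


def choose_common_algorithm(preferred: List[str],
                            mutual_lists: List[List[str]]) -> Optional[str]:
    """S205: return an algorithm every server reported as mutually usable,
    or None if there is no such algorithm."""
    for name in preferred:
        if all(name in mutual for mutual in mutual_lists):
            return name
    return None


def build_search_options(base: Dict[str, str],
                         algorithm: Optional[str]) -> Dict[str, str]:
    """S206/S207: add the used hash algorithm (6551) only when one was agreed."""
    options = dict(base)
    if algorithm is not None:
        options["hash_algorithm"] = algorithm
    return options


# Mutually usable lists (6450) returned by three servers.
agreed = choose_common_algorithm(
    ["sha1", "sha256", "sha512"],
    [["sha1", "sha256"], ["sha256", "sha512"], ["sha256"]],
)
options = build_search_options({"creator": "alice"}, agreed)
```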

図２０に移る。検索結果を取得した後、検索サーバ1100は、取得した各検索結果を統合検索結果一時保管テーブル4400に格納する（S208）。検索サーバ1100は、ハッシュ値を利用して統合検索結果から重複エントリを排除可能か否かを判定する（S209）。 Turning to FIG. 20. After acquiring the search results, the search server 1100 stores each acquired search result in the integrated search result temporary storage table 4400 (S208). The search server 1100 determines whether duplicate entries can be eliminated from the integrated search result using the hash values (S209).

統合検索結果から重複エントリを排除できない場合（S209：NO）、S210をスキップしてS211に移る。統合検索結果から重複エントリを排除できる場合（S209：YES）、検索サーバ1100は、統一されたハッシュアルゴリズムにより算出されるハッシュ値を利用して、統合検索結果から重複エントリを検出し、排除する（S210）。 If duplicate entries cannot be eliminated from the integrated search result (S209: NO), S210 is skipped and the process proceeds to S211. If duplicate entries can be eliminated from the integrated search result (S209: YES), the search server 1100 uses the hash values calculated with the unified hash algorithm to detect and eliminate duplicate entries from the integrated search result (S210).
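Steps S209 and S210 can be sketched as a hash-based filter. Two points are assumptions of this example rather than statements from the patent: deduplication is treated as impossible whenever any entry carries a null hash value (represented by `None`), and when duplicates exist the first entry encountered is the one kept.

```python
from typing import Any, Dict, List


def remove_duplicates(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop entries whose hash value (4470) repeats an earlier entry's."""
    # S209: without a hash value on every entry, skip S210 entirely.
    if any(entry.get("hash_value") is None for entry in entries):
        return list(entries)

    # S210: keep the first entry seen for each distinct hash value.
    seen = set()
    unique = []
    for entry in entries:
        if entry["hash_value"] not in seen:
            seen.add(entry["hash_value"])
            unique.append(entry)
    return unique


merged = remove_duplicates([
    {"server": "1100", "path": "/a/report.doc", "hash_value": "ab12"},
    {"server": "1200", "path": "/b/copy.doc", "hash_value": "ab12"},
    {"server": "1300", "path": "/c/other.doc", "hash_value": "cd34"},
])
```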

The search server 1100 uses the information registered in the integrated search result temporary storage table 4400 to sort the search results according to their score values and the like, and extracts the entries to be provided to the requester of the integrated search as the integrated search result (S211).

Specifically, the search server 1100 calculates an integrated score value from the score value 4440 registered in the integrated search result temporary storage table 4400 and the weighting coefficient 4340 registered in the search server management table 4300. The search server 1100 then sorts the integrated search result entries by that integrated score value.
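As a rough illustration of S211, the sketch below combines each entry's score with its server's weighting coefficient and sorts the result. The description does not give the formula, so multiplying the two values, sorting in descending order, and defaulting to a weight of 1.0 are all assumptions; the field names are hypothetical.

```python
from typing import Dict, List


def rank_entries(entries: List[dict], weights: Dict[int, float]) -> List[dict]:
    """Attach an integrated score to each entry and sort by it (sketch of S211)."""
    scored = []
    for entry in entries:
        # Assumption: integrated score = per-server score * weighting coefficient.
        weight = weights.get(entry["server_id"], 1.0)
        scored.append({**entry, "integrated_score": entry["score"] * weight})
    # Assumption: a higher integrated score ranks first.
    return sorted(scored, key=lambda e: e["integrated_score"], reverse=True)
```

Under these assumptions, a low weighting coefficient lets an operator demote results from a server whose raw scores tend to run high.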

Finally, the search server 1100 returns the integrated search result to the client machine that requested the integrated search (S212). The search server 1100 does so by specifying the integrated search result response parameter 6200.

FIG. 21 is a flowchart of the process of responding to a hash algorithm query, which is executed by each search server participating in the integrated search. This process is carried out by each of the search servers 1100, 1200, and 1300 acting as a "predetermined search server". For convenience, the search server 1200 is used as the example below.

First, the search server 1200 determines, based on the processing type 6330 specified in the hash algorithm query request parameter 6300, whether a "hash algorithm query request" is specified (S301). If no hash algorithm query request is specified (S301: NO), the process ends with an error (S302).

If a hash algorithm query request is specified (S301: YES), the search server 1200 identifies the hash algorithms available on the search server 1200 (S303). The "own device" in S303 is, in this case, the search server 1200. The search server 1200 identifies the hash algorithms available to it by examining the hash algorithms 4151 and 4153 in the search index registration file management table 4100 that it manages.

The search server 1200 determines whether any of the hash algorithms available on the querying search server 1100 is also available on the search server 1200 (S304). The search server 1200 compares the hash algorithms specified in the available hash algorithm candidate identification information 6340 of the hash algorithm query request parameter 6300 with the hash algorithms available on the search server 1200 (S303), and checks whether the two sets have a hash algorithm in common.

If a common hash algorithm exists (S304: YES), the search server 1200 registers the identification information of that hash algorithm in the mutually usable hash algorithm identification information 6450 of the hash algorithm query response parameter 6400 (S305).

If the hash algorithms available on the querying search server 1100 and those available on the queried search server 1200 have no hash algorithm in common (S304: NO), S305 is skipped and the process proceeds to S306.

The search server 1200 determines whether any hash algorithm is available on the search server 1200 other than the mutually usable hash algorithm found in S304 (S306). That is, the search server 1200 checks whether, among the hash algorithms identified in S303 as available on the search server 1200, there is any that was not registered in S305.

If another hash algorithm exists (S306: YES), the search server 1200 registers the identification information of that hash algorithm in the available hash algorithm candidate identification information 6460 of the hash algorithm query response parameter 6400 (S307). If no other hash algorithm exists (S306: NO), S307 is skipped and the process proceeds to S308.

The search server 1200 returns the result of the hash algorithm query to the querying search server 1100 (S308). The search server 1200 returns the result by specifying the hash algorithm query response parameter 6400.
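The response process of FIG. 21 amounts to splitting the responder's own algorithms into those shared with the querying server and the rest. The following Python sketch shows one way to express S301 to S308; the request and response dictionaries and their key names are hypothetical stand-ins for the parameters 6300 and 6400.

```python
from typing import Dict, List


def answer_hash_algorithm_query(request: dict, own_algorithms: List[str]) -> Dict[str, List[str]]:
    """Build the query response from the request and the local algorithm list."""
    # S301/S302: reject anything that is not a hash algorithm query.
    if request.get("processing_type") != "hash_algorithm_query":
        raise ValueError("not a hash algorithm query request")

    candidates = request.get("candidate_algorithms", [])

    # S304/S305: algorithms usable by both the querying server and this server.
    mutual = [name for name in own_algorithms if name in candidates]
    # S306/S307: remaining algorithms that only this server offers.
    others = [name for name in own_algorithms if name not in mutual]

    # S308: both lists go back to the querying server.
    return {"mutually_usable": mutual, "other_available": others}
```

Returning the non-shared algorithms as well gives the querying server information it can use later, for example if its own set of algorithms changes.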

FIG. 22 is a flowchart of the search response process executed by each search server. Like the process described with reference to FIG. 21, this process is carried out by each of the search servers 1100, 1200, and 1300. For convenience, the search server 1200 is used as the example here.

First, the search server 1200 examines the processing type 6530 specified in the search request parameter 6500 and determines whether a "search request" is specified (S401). If it is not specified (S401: NO), the process ends with an error (S402). If it is specified (S401: YES), the search server 1200 executes the search with the specified search keyword and acquires the search results (S403). The search server 1200 performs the search using the search keyword 6540 and the search option 6550 in the search request parameter 6500.

The search server 1200 checks whether the usable hash algorithm identification information 6551 is specified in the search option 6550 of the search request parameter 6500 (S404). If it is not specified (S404: NO), S405 is skipped and the process proceeds to S406.

If it is specified (S404: YES), the search server 1200 adds to each entry of the acquired search results the hash value of the file contained in that entry and the identification information of the hash algorithm used to generate that hash value (S405). The search server 1200 obtains the hash value and the hash algorithm identification information from the information stored in the hash algorithm 4150 field of the file registered in the search index registration file management table 4100.

The search server 1200 returns the search results to the requesting search server 1100 (S406). The search server 1200 returns the search results by specifying the search result response parameter 6600.
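A minimal sketch of S404 to S406 follows. It assumes the search itself (S403) has already produced a list of `(path, score)` hits and that the file management table is a nested dictionary keyed by path and algorithm name; both shapes, and all names, are illustrative only.

```python
from typing import Dict, List, Optional, Tuple


def build_search_response(
    hits: List[Tuple[str, float]],
    file_table: Dict[str, Dict[str, str]],
    requested_algorithm: Optional[str] = None,
) -> List[dict]:
    """Attach stored hash values to search hits when an algorithm is requested."""
    response = []
    for path, score in hits:
        entry = {"path": path, "score": score}
        if requested_algorithm is not None:  # S404: YES
            stored = file_table.get(path, {}).get(requested_algorithm)
            if stored is not None:
                # S405: add the hash value and the algorithm that produced it.
                entry["hash_algorithm"] = requested_algorithm
                entry["hash_value"] = stored
        response.append(entry)
    return response  # S406: returned to the requesting server
```

When the request names no algorithm, the entries go back unchanged, which matches the S404: NO branch.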

FIG. 23 is a flowchart of the search index update process. This process is carried out by each of the search servers 1100, 1200, and 1300. For convenience, the search server 1200 is used as the example below.

First, the search server 1200 identifies the hash algorithms available on the search server 1200 (S501). The "own device" in S501 is, in this case, the search server 1200. The search server 1200 identifies the hash algorithms available to it by examining the hash algorithms 4151 and 4153 in the search index registration file management table 4100 that it manages.

The search server 1200 identifies the file server subject to the search index update and the root directory to be updated (S502). Next, the search server 1200 determines whether all files subject to the search index update have been crawled and indexed (S503). If crawling and indexing have finished for all files (S503: YES), the process ends.

If crawling and indexing have not finished for all files (S503: NO), the search server 1200 accesses the file server storing the files to be crawled and acquires one arbitrary file stored within the update range of the search index (S504).

The search server 1200 determines whether the information on the file acquired in S504 needs to be newly registered in the search index, or whether the information on that file already in the search index needs to be updated (S505).

Specifically, the search server 1200 checks whether the acquired file has been updated since the previous search index update process, or whether it is a file newly stored after the previous search index update process. If neither new registration nor update is needed (S505: NO), the process returns to S503. If new registration or update is needed (S505: YES), the process proceeds to S506 shown in FIG. 24. The search server 1200 determines whether the information on the file acquired in S504 (the target file) is to be newly registered in the search index or whether information already registered in the search index is to be updated (S506).

If it determines that a new registration is to be made, the search server 1200 creates a new entry for the target file in the search index registration file management table 4100 and registers the information on the target file (S507).

If it determines that an update is to be made, the search server 1200 identifies the entry of the target file stored in the search index registration file management table 4100 and updates the necessary information (S508). The search server 1200 analyzes the target file and registers the search index information in the search index management table 4200 (S509). The search server 1200 then checks whether an available hash algorithm exists (S510). Based on the result of S501, the search server 1200 determines whether one or more hash algorithms are available on the search server 1200. If no hash algorithm is available (S510: NO), the process returns to S503.

If an available hash algorithm exists (S510: YES), the search server 1200 generates a hash value from the data of the target file with every available hash algorithm and registers each generated hash value in the search index registration file management table 4100 (S511).

If several hash algorithms are available, the search server 1200 generates a hash value for each of them and registers all of the hash values in the search index registration file management table 4100.
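Steps S510 and S511 can be sketched with Python's standard `hashlib` module. The table layout (a dictionary keyed by file path, then by algorithm name) and the function name are assumptions made for illustration; the algorithm names are the ones `hashlib.new` accepts, such as `sha1` and `sha256`.

```python
import hashlib
from typing import Dict, List


def register_hashes(
    file_table: Dict[str, Dict[str, str]],
    path: str,
    data: bytes,
    algorithms: List[str],
) -> None:
    """Compute one hash per available algorithm and store it for the file."""
    if not algorithms:
        # S510: NO. Nothing to compute, so the caller moves to the next file.
        return
    entry = file_table.setdefault(path, {})
    for name in algorithms:
        # S511: one hash value per available algorithm.
        entry[name] = hashlib.new(name, data).hexdigest()
```

Storing a value for every available algorithm at indexing time means that whichever algorithm is later agreed on for an integrated search, the matching hash value can be read from the table without rereading the file.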

In this embodiment configured as described above, when an integrated search is performed in a system that loosely couples a plurality of search servers differing in search algorithm and/or search index update timing, a hash algorithm to be used in common by the search servers is determined.

Accordingly, in this embodiment, duplicate entries can be detected and eliminated from the integrated search result obtained by merging the search results that the search servers return for the same search condition. The user can thus obtain an integrated search result spanning a plurality of search servers without redundancy. With duplicate entries removed from the integrated result, the user can find the desired file relatively easily, which improves usability.

In this embodiment, among the loosely coupled search servers 1100, 1200, and 1300, the search server 1100 that accepted the integrated search request negotiates with each of the search servers 1100, 1200, and 1300 to agree on a hash algorithm to be used in common among them, while the actual search is performed by each of the search servers 1100, 1200, and 1300. Furthermore, each of the search servers 1100, 1200, and 1300 creates hash values with the agreed hash algorithm, and the search server 1100 that accepted the integrated search request uses the hash values to detect duplicate entries in the integrated search result and remove them. This embodiment thus separates the creation of hash values from the detection and elimination of duplicates using those hash values, which allows the roles to be divided among the loosely coupled search servers.

A second embodiment will be described with reference to FIGS. 25 and 26. Each of the following embodiments, including this one, corresponds to a modification of the first embodiment. The description of each of the following embodiments therefore focuses on the differences from the first embodiment.

In the first embodiment described above, every time an integrated search is performed, the search server 1100 that received the integrated search request negotiates with the other search servers 1100, 1200, and 1300 over the hash algorithm used to eliminate duplicate entries from the integrated search result.

However, the hash algorithm used in common among the search servers 1100, 1200, and 1300 does not change very often. Normally, once it has been determined, the same hash algorithm can be expected to remain in use for a relatively long period.

In this embodiment, therefore, the hash algorithm information acquired the first time is kept as a cache in the search server 1100 that accepts integrated search requests. When an integrated search request is issued thereafter, the common hash algorithm is determined from the cached hash algorithm information and the integrated search is executed. This embodiment therefore does not need to negotiate the hash algorithm among the search servers every time an integrated search request is accepted, and the overhead at the start of an integrated search can be reduced.

To cache in the search server 1100 the one or more hash algorithms acquired from each of the search servers 1100, 1200, and 1300, the configuration of the search server management table 4300 held by the search server 1100 must be changed, and part of the integrated search process executed by the search server 1100 must also be changed.

FIG. 25 shows a configuration example of the search server management table 4300 used in this embodiment. In addition to the columns 4310, 4320, 4330, and 4340 described with reference to FIG. 10, the search server management table 4300 has a column for managing the used hash algorithm identification information 4350.

The used hash algorithm identification information 4350 stores information identifying the hash algorithms available on each of the search servers 1100, 1200, and 1300 participating in the integrated search. Several pieces of hash algorithm identification information can be stored for one search server. For example, in FIG. 25, the entry whose search server ID 4310 is 1 stores SHA-1 and SHA-2 as the used hash algorithm identification information 4350. During the integrated search process, the identification information of the available hash algorithms is registered in the table 4300 based on the available hash algorithm information acquired from each search server.

FIG. 26 shows the changes to the integrated search process executed by the search server 1100. This process differs from the integrated search process shown in FIG. 19 in the following respects. The first difference is that, after S203, the search server 1100 determines whether available hash algorithm information exists in the search server management table 4300 (S213). The search server 1100 checks whether hash algorithm identification information is registered in the used hash algorithm identification information 4350 entries of the search server management table 4300.

If available hash algorithm information is stored in the search server management table 4300 (S213: YES), the search server 1100 omits the hash algorithm negotiation process S204 and proceeds to S205. If it is not stored (S213: NO), the search server 1100 proceeds to S204 to perform the hash algorithm negotiation process as in the first embodiment.

The second difference is that, after S204, the search server 1100 registers the hash algorithm identification information acquired in S204 in the search server management table 4300 (S214). Specifically, the search server 1100 stores the hash algorithm identification information acquired from each of the other search servers 1100, 1200, and 1300 in the used hash algorithm identification information 4350 column. If several pieces of hash algorithm identification information are acquired for one search server, all of them are registered in the search server management table 4300. After S214, the process proceeds to S205.
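The cache check added in FIG. 26 can be sketched as follows. The table is modelled as a dictionary keyed by search server ID with a hypothetical `hash_algorithms` field standing in for column 4350, and `negotiate` is a hypothetical callable representing the S204 exchange. Treating the cache as usable only when every registered server has a non-empty list is an assumption; the description only says that S213 checks whether the information is registered.

```python
from typing import Callable, Dict, List


def available_algorithms(
    server_table: Dict[int, dict],
    negotiate: Callable[[], Dict[int, List[str]]],
) -> Dict[int, List[str]]:
    """Return per-server algorithm lists, negotiating only when not cached."""
    # S213: is algorithm information already registered for every server?
    cached = bool(server_table) and all(
        row.get("hash_algorithms") for row in server_table.values()
    )
    if not cached:
        fetched = negotiate()  # S204: ask each search server
        for server_id, algorithms in fetched.items():
            # S214: keep everything that was returned, not just one algorithm.
            server_table.setdefault(server_id, {})["hash_algorithms"] = list(algorithms)
    return {sid: row.get("hash_algorithms", []) for sid, row in server_table.items()}
```

On the second and later calls the `negotiate` callable is not invoked, which corresponds to skipping S204 and going straight to S205.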

This embodiment, configured in this way, provides the same effects as the first embodiment. In addition, this embodiment retains the hash algorithm identification information acquired during the first integrated search, uses it to determine the common hash algorithm, and performs the integrated search. There is therefore no need to acquire hash algorithm information from the search servers 1100, 1200, and 1300 every time an integrated search request is received, and the overhead of the integrated search can be reduced.

A third embodiment will be described with reference to FIGS. 27 and 28. In the first embodiment described above, the search server 1100 that received the integrated search request negotiates the hash algorithm with the other search servers 1100, 1200, and 1300 every time an integrated search is performed. As noted in the second embodiment, however, the hash algorithm does not change frequently. In this embodiment, therefore, as described below, when the system providing the integrated search service is built, the search server 1100 acquires the hash algorithms available on each of the search servers 1100, 1200, and 1300 and registers them in the search server 1100 in advance.

The search server management table 4300 requires the same change as described with reference to FIG. 25 for the second embodiment. Because the change is the same as in FIG. 25, its description is omitted. In the integrated search control program 1125 of the search server 1100, a process for registering hash algorithms in advance is newly added.

FIG. 27 shows the configuration of the computer programs held by the search server 1100. In FIG. 27, in addition to the configuration shown in FIG. 3, a hash algorithm pre-negotiation subprogram 1178 is newly added to the integrated search control program 1125.

The hash algorithm pre-negotiation subprogram 1178 performs the processing that, when the system providing the integrated search service is built, investigates in advance the hash algorithms used by each of the search servers 1100, 1200, and 1300 and stores the results in the search server 1100.

FIG. 28 is a flowchart of the hash algorithm pre-negotiation process executed by the search server 1100. This process is carried out when the search servers 1100, 1200, and 1300 are being configured during construction of the system that provides the integrated search service.

First, the search server 1100 identifies the hash algorithms available on the search server 1100 (S601). They can be identified by examining the hash algorithms 4151, 4153, and so on in the search index registration file management table 4100 managed by the search server 1100.

The search server 1100 queries all of the search servers 1100, 1200, and 1300 registered in the search server management table 4300 about their hash algorithms (S602). The hash algorithm query request parameter 6300 is used for this query.

The search server 1100 acquires the information contained in the hash algorithm query response parameter 6400 from each of the queried search servers 1100, 1200, and 1300. The search server 1100 registers each piece of hash algorithm identification information in the search server management table 4300 (S603).

Because this embodiment is configured as described above, it provides the same effects as the first embodiment. In addition, in this embodiment, information on the hash algorithms is collected from each search server and retained when the system providing the integrated search service is built. There is therefore no need to negotiate the hash algorithm every time an integrated search request is received, and the overhead of the integrated search can be reduced.

A fourth embodiment will be described with reference to FIG. 29. In the first embodiment described above, each of the search servers 1100, 1200, and 1300 creates the hash values of the search target file data during the search index update process. In this embodiment, by contrast, the hash values are created during the search process, as described below. This embodiment can therefore reduce the overhead of the search index update process and can also eliminate the need for an area to store the hash values.

In this embodiment, when a search server receives a search request, it finds the file data matching the search condition and creates the hash values of that file data on demand.

FIG. 29 is a flowchart of the search response process in a search server. This process includes S401 to S406 shown in FIG. 22, with S407 to S409 newly added between S404 and S405. As with FIG. 22, the search server 1200 is the subject of the description below.

If the search request specifies a hash algorithm (S404: YES), the search server 1200 determines whether a hash value has already been created with the specified hash algorithm (S407). The search server 1200 checks whether the hash value of the target file is registered in the hash algorithm 4150 column of the target file in the search index registration file management table 4100.

If the hash value has already been created (S407: YES), the process proceeds to S405. If the hash value has not been created (S407: NO), the search server 1200 acquires the target file data (S408). The search server 1200 may use the file data acquired by the crawling performed to update the search index, or it may acquire the target file data from the file server again.

After acquiring the target file data, the search server 1200 creates a hash value from it using the specified hash algorithm (S409). Once the hash value has been created, the search server 1200 proceeds to S405.
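The added steps S407 to S409 amount to a lookup with an on-demand fallback. The sketch below uses Python's `hashlib`; the table layout and the `read_file` callable, which stands in for either the crawl cache or a fresh fetch from the file server, are hypothetical.

```python
import hashlib
from typing import Callable, Dict


def hash_for_entry(
    file_table: Dict[str, Dict[str, str]],
    path: str,
    algorithm: str,
    read_file: Callable[[str], bytes],
) -> str:
    """Return the stored hash if present, otherwise compute it on demand."""
    stored = file_table.get(path, {}).get(algorithm)
    if stored is not None:  # S407: YES
        return stored
    data = read_file(path)  # S408: obtain the target file data
    # S409: compute with the algorithm named in the search request.
    # The result is not written back, since this embodiment avoids storing hashes.
    return hashlib.new(algorithm, data).hexdigest()
```

The file is read only when no stored value exists, so a server that already holds hash values for some files pays the on-demand cost only for the others.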

This embodiment, configured in this way, provides the same effects as the first embodiment. In addition, in this embodiment, the hash value of each file matching the search condition is generated on the spot when the search request is received. There is therefore no need to create hash values during the search index update process or to store them.

図３０を参照して第５実施例を説明する。上述した第４実施例は、各検索サーバ1100、1200、1300での検索処理時に、検索対象ファイルについてハッシュ値を作成する。しかし、各検索サーバの処理負荷またはマシン性能等によっては、いわゆるオンデマンドでのハッシュ値作成が難しい場合もあり得る。 A fifth embodiment will be described with reference to FIG. In the fourth embodiment described above, hash values are created for the search target files during the search processing in the search servers 1100, 1200, and 1300. However, so-called on-demand hash value creation may be difficult depending on the processing load or machine performance of each search server.

Therefore, in this embodiment, the search server 1100 that has received the integrated search request creates the hash values for the files found by each of the search servers 1100, 1200, and 1300. This makes it possible to reduce the overhead of search processing in each of the search servers 1100, 1200, and 1300, and to reduce the area needed to store hash values.

FIG. 30 shows part of a flowchart of the integrated search processing in the search server 1100. This flowchart corresponds to the flowchart shown in FIG. 20. The following description focuses on the differences.

The search server 1100 acquires the file data for the entries stored in the integrated search result temporary storage table 4400 (S213). The search server 1100 may acquire the file data directly from the file server based on the file path name 4450 in the integrated search result temporary storage table 4400. Alternatively, if cache data of the target file is held within the search server 1100, that cache data may be used.

Using a hash algorithm that is commonly usable by the search servers, the search server 1100 creates a hash value of each piece of file data acquired in S213 and registers it in the integrated search result temporary storage table 4400 (S214). The search server 1100 stores the identification information of the common hash algorithm in the hash algorithm 4460 column, and the created hash value in the hash value 4470 column, of the integrated search result temporary storage table 4400.
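As a rough illustration of S213 and S214, the following Python sketch fills in the hash algorithm and hash value columns for every entry of a temporary table. The dictionary keys (`file_path`, `hash_algorithm`, `hash_value`), the `fill_hashes` function, and the `fetch` callback are assumptions made for this sketch and are not defined by this specification.

```python
import hashlib
from typing import Callable, Dict, List


def fill_hashes(table: List[Dict[str, str]], algorithm: str,
                fetch: Callable[[str], bytes]) -> None:
    """Create and register a hash value for every entry of the table."""
    for entry in table:
        # S213: acquire the file data (from the file server or a local cache).
        data = fetch(entry["file_path"])
        # S214: register the common algorithm (column 4460) and the
        # created hash value (column 4470) in the same entry.
        entry["hash_algorithm"] = algorithm
        entry["hash_value"] = hashlib.new(algorithm, data).hexdigest()
```

Because every entry is hashed with the same algorithm by a single server, two entries that point at identical file contents end up with identical values in the hash value column.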

This embodiment, configured as described above, provides the same effects as the first embodiment. In addition, in this embodiment, the search server 1100 generates the hash value of each file matching the search condition on the spot when it receives the integrated search request. Therefore, when creating hash values in each search server is difficult because the processing load would become too high, the search server that received the integrated search request can create the hash values collectively.

Furthermore, in this embodiment, the division of hash value creation can be changed according to the load status of each search server. For example, a lightly loaded search server may create hash values itself, whereas for a heavily loaded search server the hash values may be created by the search server that received the integrated search request (or by another search server).
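One possible way to express this load-dependent division is sketched below in Python. The load metric (a value between 0 and 1), the threshold of 0.8, and the `assign_hashing` function are all assumptions for illustration; the specification does not prescribe how load is measured or compared.

```python
from typing import Dict


def assign_hashing(loads: Dict[str, float], coordinator: str,
                   threshold: float = 0.8) -> Dict[str, str]:
    """Map each search server to the server that creates its hash values.

    A server whose load is below `threshold` hashes its own results;
    otherwise the work is delegated to `coordinator`, the server that
    received the integrated search request.
    """
    return {
        server: (server if load < threshold else coordinator)
        for server, load in loads.items()
    }
```

For example, with loads of 0.9, 0.2, and 0.95 for the search servers 1100, 1200, and 1300 and the search server 1100 as coordinator, only the search server 1200 would hash its own results.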

A sixth embodiment will be described with reference to FIGS. 31 and 32. In the fifth embodiment described above, the search server 1100 creates the hash values of the target file data during its integrated search processing.

However, the search server 1100 that received the integrated search request may be unable to access the file server storing the target file data. In that case, the search server 1100 cannot acquire the file data that matches the search condition.

Therefore, instead of file data, this embodiment uses the character strings containing the search keyword that each search server provides as part of its search result. Hash values are created from these search keyword-containing character strings and used to detect and eliminate duplicate entries from the integrated search result. As a result, duplicate entries can be found and eliminated from the integrated search result even when the search server 1100 that received the integrated search request cannot access the file server, or when a high processing load makes it difficult for the search servers 1100, 1200, and 1300 to create hash values.

FIG. 31 shows the integrated search result temporary storage table 4400. It differs from the integrated search result temporary storage table 4400 shown in FIG. 11 in that a partial character string 4481 and a partial hash value 4482 are newly added within the search keyword-containing character string 4480.

The partial character string 4481 is the information originally stored in the search keyword-containing character string 4480 of the integrated search result temporary storage table 4400 shown in FIG. 11. The partial hash value 4482 stores a hash value created from the partial character string 4481.

The partial hash value 4482 may be created using the hash algorithm registered in the hash algorithm 4440 of the integrated search result temporary storage table 4400. Alternatively, any one hash algorithm available to the search server 1100 may be selected and used to create it.

A plurality of pairs of a partial character string 4481 and a partial hash value 4482 can be stored in the search keyword-containing character string 4480 for each entry. If the partial character string 4481 or the partial hash value 4482 would otherwise be blank in the search keyword-containing character string 4480 field, a null value may be stored.
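The extended entry layout might be modeled as in the following Python sketch. The class and field names (`KeywordFragment`, `ResultEntry`, `fragments`) are hypothetical; only the reference numerals in the comments come from the description, and `None` plays the role of the null value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KeywordFragment:
    """One pair held in the search keyword-containing character string 4480."""
    partial_string: Optional[str] = None  # partial character string 4481
    partial_hash: Optional[str] = None    # partial hash value 4482


@dataclass
class ResultEntry:
    """One entry of the temporary table, reduced to the fields used here."""
    file_path: str
    # Zero or more (partial string, partial hash) pairs per entry.
    fragments: List[KeywordFragment] = field(default_factory=list)
```

An entry that has just been received from a search server would carry fragments whose `partial_hash` is still `None` until the hash values are created.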

FIG. 32 shows part of a flowchart of the integrated search processing executed by the search server 1100. This processing creates, on demand, hash values for the search keyword-containing character strings included in the search results.

In FIG. 32, after S208, the search server 1100 uses a hash algorithm available to it to create a hash value of each partial character string stored in the entries of the integrated search result temporary storage table 4400, and registers that value in the integrated search result temporary storage table 4400 as a partial hash value (S220).

Here, if an entry of the integrated search result temporary storage table 4400 holds a plurality of partial character strings in the partial character string 4481, the search server 1100 creates a partial hash value for every one of them and stores the values in the partial hash value 4482 column.

The search server 1100 uses the partial hash values to detect duplicate entries in the integrated search result and eliminates them (S221).

The search server 1100 can determine that entries whose partial hash values all match are duplicate entries. Alternatively, entries whose partial hash values match at or above a certain ratio may be determined to be quasi-duplicate entries. For example, two entries for which m or more (0 < m < n) of the n partial hash values match may be determined to be duplicate entries.
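Steps S220 and S221, including the "m of n" variant, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the claimed procedure: the function names are hypothetical, partial hash values are compared as sets (so order and repetition are ignored), entries without any partial hash value are never treated as duplicates, and the first entry encountered is the one that is kept.

```python
import hashlib
from typing import List, Optional, Sequence


def partial_hashes(fragments: Sequence[Optional[str]],
                   algorithm: str = "sha256") -> List[Optional[str]]:
    """S220: hash every keyword-containing substring; nulls stay null."""
    return [
        None if s is None
        else hashlib.new(algorithm, s.encode("utf-8")).hexdigest()
        for s in fragments
    ]


def is_duplicate(a: Sequence[Optional[str]], b: Sequence[Optional[str]],
                 min_matches: Optional[int] = None) -> bool:
    """Compare the partial hash values of two entries.

    With min_matches=None every partial hash must match; otherwise at
    least `min_matches` values must match (the "m of n" rule).
    """
    set_a = {h for h in a if h is not None}
    set_b = {h for h in b if h is not None}
    if not set_a or not set_b:
        return False
    if min_matches is None:
        return set_a == set_b
    return len(set_a & set_b) >= min_matches


def remove_duplicates(entries: Sequence[Sequence[Optional[str]]],
                      min_matches: Optional[int] = None
                      ) -> List[Sequence[Optional[str]]]:
    """S221: keep the first of each group of (quasi-)duplicate entries."""
    kept: List[Sequence[Optional[str]]] = []
    for entry in entries:
        if not any(is_duplicate(entry, k, min_matches) for k in kept):
            kept.append(entry)
    return kept
```

Passing `min_matches=None` corresponds to the strict rule in which all partial hash values must match, while a smaller integer loosens the test toward quasi-duplicates.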

This embodiment, configured as described above, provides the same effects as the first embodiment. In addition, in this embodiment, the hash value is obtained not from the file data but from a character string containing the search keyword (that is, a portion of the file data), and is used to detect duplicate entries. Therefore, duplicate entries can be found and eliminated from the integrated search result even when the search server 1100 cannot access the file server, or when a common hash algorithm could not be determined among the search servers 1100, 1200, and 1300.

A seventh embodiment will be described with reference to FIG. 33. In this embodiment, when a duplicate entry is detected in the integrated search result, the search server that manages the search index corresponding to the duplicate entry is notified that the duplicate entry exists.

FIG. 33 shows the processing performed when a duplicate entry is found. The representative search server 1100 is the search server that accepts the integrated search request and performs the integrated search. The search servers (1100, 1200, 1300) are search servers that participate in the integrated search and perform searches in response to search requests from the representative search server 1100.

Upon receiving an integrated search request from a client machine, the representative search server 1100 issues a search request to each search server (S701). Each search server performs a search according to the search request and returns the search result to the representative search server 1100 (S702). The representative search server 1100 uses the hash values to detect duplicate entries in the integrated search result (S703).

The representative search server 1100 transmits the integrated search result, with the duplicate entries removed, to the client machine (S704). In addition, the representative search server 1100 notifies each search server in which a duplicate entry was found that the duplicate entry was found (S705). The search server that receives the notification confirms that the entry is a duplicate (S706). The search server can also instruct the file server to delete the file corresponding to the duplicate entry.
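The merging and notification of S703 to S705 could look like the following Python sketch. The tuple layout, the `merge_and_report` function, and the choice to notify the server whose entry arrives later in the merged list are assumptions for illustration; the description leaves open which of the duplicated entries is retained.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (search server id, file path, hash value); a hypothetical result layout.
Result = Tuple[str, str, str]


def merge_and_report(results: List[Result]
                     ) -> Tuple[List[Result], Dict[str, List[str]]]:
    """Return the deduplicated results and per-server duplicate notices."""
    seen: Set[str] = set()
    merged: List[Result] = []
    notices: Dict[str, List[str]] = defaultdict(list)
    for server, path, digest in results:
        if digest in seen:
            # S703/S705: a duplicate; record it for the server that holds it.
            notices[server].append(path)
        else:
            seen.add(digest)
            merged.append((server, path, digest))
    # `merged` is sent to the client (S704); `notices` drives S705.
    return merged, dict(notices)
```

Each notified search server would then perform its own confirmation (S706) before asking the file server to delete anything.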

This embodiment, configured as described above, provides the same effects as the first embodiment. In addition, in this embodiment, when a duplicate entry is detected in the integrated search result, the search server is notified to that effect, so the duplicated file can also be deleted.

The present invention is not limited to the embodiments described above. A person skilled in the art can make various additions and changes within the scope of the present invention. For example, when no hash algorithm common to the plurality of search servers is found, a hash algorithm that can serve as the common hash algorithm may be transmitted to, and installed on, any search server that does not have it.
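The selection of a common hash algorithm, together with the fallback of identifying which search servers would need an algorithm installed, can be sketched in Python as follows. The function names, the preference list, and the representation of each server's capabilities as a set of algorithm names are assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Set


def choose_common_algorithm(supported: Dict[str, Set[str]],
                            preference: List[str]) -> Optional[str]:
    """Pick the most preferred algorithm that every search server supports."""
    common = set.intersection(*supported.values()) if supported else set()
    for name in preference:
        if name in common:
            return name
    return None  # no common hash algorithm was found


def servers_missing(supported: Dict[str, Set[str]],
                    algorithm: str) -> List[str]:
    """List the search servers on which `algorithm` would need installing."""
    return sorted(s for s, algos in supported.items()
                  if algorithm not in algos)
```

When `choose_common_algorithm` returns `None`, a candidate algorithm could be chosen by some other policy and `servers_missing` used to find the search servers that need it.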

Claims

A search method using a computer system including a plurality of search servers, wherein
the computer system is configured by loosely coupling the plurality of search servers, which operate independently of each other,
when an integrated search server included in the plurality of search servers receives an integrated search request for causing each of a plurality of predetermined search servers included in the plurality of search servers to perform a search, the integrated search server determines duplicate detection information for detecting duplicates that is commonly usable by the predetermined search servers, and issues a search request corresponding to the integrated search request to each of the predetermined search servers,
each of the predetermined search servers searches the data group it is in charge of based on the search request, includes in its search result a duplicate detection value for detecting duplicates created using the determined duplicate detection information, and transmits the search result to the integrated search server, and
the integrated search server detects duplicate data, based on the duplicate detection values, from the search results received from the predetermined search servers, removes the detected duplicate data from the search results to create an integrated search result, and provides the integrated search result to the issuer of the integrated search request.

The search method according to claim 1, wherein each of the predetermined search servers
stores in advance, for the data group it is in charge of, a duplicate detection value for each of a plurality of pieces of duplicate detection information, and
includes in the search result, and transmits to the integrated search server, the duplicate detection value that corresponds to the duplicate detection information determined by the integrated search server from among the stored duplicate detection values.

The search method according to claim 2, wherein each of the predetermined search servers creates and stores the duplicate detection value for each of the plurality of pieces of duplicate detection information when updating a search index used for searching the data group it is in charge of.

The search method according to claim 1, wherein the integrated search server acquires, from each of the predetermined search servers, information on the duplicate detection information usable by that search server and holds the information, and, when the integrated search request is received, determines the duplicate detection information commonly usable by the predetermined search servers based on the held information on each piece of duplicate detection information.

The search method according to claim 1, wherein, when the computer system is constructed, the integrated search server acquires, from each of the predetermined search servers, information on the duplicate detection information usable by that search server and holds the information, and, when the integrated search request is received, determines the duplicate detection information commonly usable by the predetermined search servers based on the held information on each piece of duplicate detection information.

The search method according to claim 1, wherein each of the predetermined search servers, upon receiving the search request from the integrated search server, creates the duplicate detection value based on the determined duplicate detection information, includes the duplicate detection value in the search result, and transmits the search result to the integrated search server.

The search method according to claim 1, wherein the duplicate detection information is a hash algorithm, and the duplicate detection value is a hash value.

An integrated search server for searching using a computer system configured by loosely coupling a plurality of search servers that operate independently of each other, wherein the integrated search server,
upon receiving an integrated search request for causing each of a plurality of predetermined search servers included in the plurality of search servers to perform a search,
determines duplicate detection information for detecting duplicates that is commonly usable by the predetermined search servers,
issues, to each of the predetermined search servers, a search request that corresponds to the integrated search request and specifies the determined duplicate detection information,
receives, from each of the predetermined search servers, a search result that is obtained by that search server searching the data group it is in charge of and that includes a duplicate detection value for detecting duplicates created using the determined duplicate detection information,
detects duplicate data, based on the duplicate detection values, from the search results received from the predetermined search servers,
removes the detected duplicate data to create an integrated search result, and
provides the integrated search result to the issuer of the integrated search request.

The integrated search server according to claim 8, wherein the integrated search server acquires, from each of the predetermined search servers, information on the duplicate detection information usable by that search server and holds the information, and, when the integrated search request is received, determines the duplicate detection information commonly usable by the predetermined search servers based on the held information on each piece of duplicate detection information.

The integrated search server according to claim 8, wherein, when the computer system is constructed, the integrated search server acquires, from each of the predetermined search servers, information on the duplicate detection information usable by that search server and holds the information, and, when the integrated search request is received, determines the duplicate detection information commonly usable by the predetermined search servers based on the held information on each piece of duplicate detection information.

The integrated search server according to claim 8, wherein each of the predetermined search servers, upon receiving the search request from the integrated search server, creates the duplicate detection value based on the determined duplicate detection information, includes the duplicate detection value in the search result, and transmits the search result to the integrated search server.

A computer program for causing a computer to function as an integrated search server for searching using a computer system configured by loosely coupling a plurality of search servers that operate independently of each other, the computer program causing the computer to:
receive an integrated search request for causing each of a plurality of predetermined search servers included in the plurality of search servers to perform a search;
determine duplicate detection information for detecting duplicates that is commonly usable by the predetermined search servers;
issue, to each of the predetermined search servers, a search request that corresponds to the integrated search request and specifies the determined duplicate detection information;
receive, from each of the predetermined search servers, a search result that is obtained by that search server searching the data group it is in charge of and that includes a duplicate detection value for detecting duplicates created using the determined duplicate detection information;
detect duplicate data, based on the duplicate detection values, from the search results received from the predetermined search servers;
remove the detected duplicate data to create an integrated search result; and
provide the integrated search result to the issuer of the integrated search request.

The computer program according to claim 12, further causing the computer to:
acquire, from each of the predetermined search servers, information on the duplicate detection information usable by that search server, and hold the information; and
when the integrated search request is received, determine the duplicate detection information commonly usable by the predetermined search servers based on the held information on each piece of duplicate detection information.

The computer program according to claim 12, further causing the computer to:
when the computer system is constructed, acquire, from each of the predetermined search servers, information on the duplicate detection information usable by that search server, and hold the information; and
when the integrated search request is received, determine the duplicate detection information commonly usable by the predetermined search servers based on the held information on each piece of duplicate detection information.

The computer program according to claim 12, wherein the duplicate detection information is a hash algorithm, and the duplicate detection value is a hash value.