JP4503379B2 - Search device

Info

Publication number: JP4503379B2
Application number: JP2004213294A
Authority: JP
Inventors: 義徳山岸; 光則郡
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2004-07-21
Filing date: 2004-07-21
Publication date: 2010-07-14
Anticipated expiration: 2024-07-21
Also published as: JP2006031624A

Description

The present invention relates to a search device.

In recent years, a wide variety of content such as documents and images has been digitized and accumulated in databases. Search services such as search engines are used to extract target data from this enormous body of accumulated data. A search service must offer an interactive search function together with high-speed search performance over large volumes of data.
A parallel search device is one means of achieving this. In the processing method of a parallel search device, one database is divided and registered across a plurality of computers. Each of these computers (hereinafter referred to as a slave computer) searches in parallel, and a single computer (hereinafter referred to as a master computer) collects the resulting partial search results and outputs the final search result.

For example, in the conventional parallel data search processing device disclosed in Patent Document 1, when a search terminal requests search processing or display processing of search results, a search coordinator (corresponding to the master computer) analyzes the search request or display processing request, determines the search processing devices (corresponding to the slave computers) to which the processing is to be delegated, and issues the search request or display processing request to them. Each search processing device that receives the request performs processing such as a search on the divided database assigned to it and generates a processing result. The search coordinator receives the processing results from the search processing devices, consolidates them, and provides them to the search terminal.
Non-Patent Document 1 discloses a method that displays the number of hits together with a list of the hit target data only when the number of hits in the search result is equal to or less than a predetermined maximum number of retrievals, and displays only the number of hits when the number of hits exceeds the maximum number of retrievals.

However, in a parallel search device, the partial search results of the slave computers are transferred to the master computer via a communication line such as a network. As the number of hits obtained by the slave computers increases, the amount of data transferred to the master computer also increases, so the communication path becomes a bottleneck and search performance deteriorates.
To address this problem, the parallel data search processing device disclosed in Patent Document 1 limits the data transferred during search processing to the number of hits and the minimum data needed to identify the hit target data (for example, titles), so that the amount of data transferred over the communication path is minimized.

Patent Document 1: JP-A-10-154160
Non-Patent Document 1: "PATOLIS full-text search service", PATOLIS News, Japan Patent Information Organization, July 1999, No. 225

However, in the parallel data search processing device disclosed in Patent Document 1, when the obtained search results contain many hits, the amount of data to be transferred during display processing of the search results also becomes large, so the communication path becomes a bottleneck and search performance deteriorates.
Furthermore, with the method disclosed in Non-Patent Document 1, the hit data list is no longer displayed on the master computer once the number of hits exceeds the maximum number of retrievals, so from that point there is no need to transfer the data identifying the hit targets or the hit data list to the master computer. The conventional technology nevertheless performs this transfer, which is wasted data transfer.

The present invention has been made to solve the problems described above, and its object is to obtain a search device that makes data transfer between the master computer and the slave computers more efficient and is capable of high-speed search processing.

The search device according to the present invention comprises one or more slave computers and a master computer. Each slave computer has: a data input unit that acquires search target data in units of a certain data amount; a data collation unit that, based on the number of collations and the number of hits of collation processing performed on the search target data using a given search condition, calculates a total collation count that aggregates the numbers of collations and a total hit count that aggregates the numbers of hits; a collation result generation unit that calculates a predicted hit count by a predetermined calculation formula whose parameters are the total collation count and the total hit count obtained by the collation processing, and generates a hit data list when the predicted hit count is equal to or less than a certain threshold; and a collation result transmission unit that transmits the number of hits and the hit data list to the master computer. The master computer has: a collation result receiving unit that receives the number of hits and the hit data list from each slave computer; a search result generation unit that generates a list of all hit data received from the slave computers if the total of the numbers of hits received from the slave computers is equal to or less than a certain threshold; and a search result output unit that outputs the total and the list of all hit data. The collation processing by the data collation unit, the generation of the hit data list by the collation result generation unit according to the number of hits, and the transmission of the number of hits and the hit data list to the master computer by the collation result transmission unit are performed repeatedly for each data-amount unit of search target data.

According to the present invention, the slave computer calculates the predicted hit count by a predetermined calculation formula whose parameters are the total collation count and the total hit count obtained by the collation processing, and determines, based on the predicted hit count, whether to transmit the hit data list to the master computer. This makes data transfer between the master computer and the slave computers more efficient and enables high-speed search processing.
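The text here says only that the predicted hit count comes from a "predetermined calculation formula" whose parameters are the total collation count and the total hit count; it does not give the formula. The following Python sketch is therefore purely illustrative: it assumes a simple proportional extrapolation of the hit rate observed so far to the whole partition. The function names and the `record_total` parameter (the number of records held by the slave) are hypothetical and do not come from the source.

```python
def predicted_hits(collation_total: int, hit_total: int, record_total: int) -> float:
    """Assumed formula: scale the hit rate observed so far to the whole partition.

    collation_total: number of records collated so far
    hit_total:       number of those records that hit the search condition
    record_total:    hypothetical total number of records on this slave
    """
    if collation_total == 0:
        return 0.0  # nothing collated yet, so nothing to extrapolate from
    return hit_total * record_total / collation_total


def should_build_hit_list(collation_total: int, hit_total: int,
                          record_total: int, threshold: int) -> bool:
    """Build the hit data list only while the prediction stays within the threshold."""
    return predicted_hits(collation_total, hit_total, record_total) <= threshold
```

Under this assumed formula, a slave that has collated 100 of 1000 records and seen 5 hits predicts 50 hits, so it would keep building the hit data list for a threshold of 50 but not for a threshold of 49.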

Various embodiments of the present invention are described below.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a search system 100 according to Embodiment 1 of the present invention. As shown in the figure, the search system 100 includes a search device 10 and a search terminal device 20, which are connected via a network 30.
The search device 10 includes a master computer 11, slave computers 12a to 12c, and a LAN switch 13 that connects the master computer 11 and the slave computers 12a to 12c.

FIG. 2 is a block diagram showing the configuration of the slave computers 12a to 12c according to Embodiment 1 of the present invention. As shown in the figure, each of the slave computers 12a to 12c includes a storage device 121, a data input unit 122, a data collation unit 123, a collation result generation unit 124, and a collation result transmission unit 125. The data input unit 122, the data collation unit 123, the collation result generation unit 124, and the collation result transmission unit 125 represent modules of a program that operates the processor of the slave computers 12a to 12c; in practice, they together constitute the processor of the slave computers 12a to 12c.
The storage device 121 is, for example, a memory of the slave computers 12a to 12c or an external storage device connected to the slave computers 12a to 12c. The search target data is divided and stored in the storage devices 121 of the slave computers 12a to 12c.
FIG. 3 is a block diagram showing the configuration of the master computer 11 according to Embodiment 1. As shown in the figure, the master computer 11 includes a collation result receiving unit 111, a search result generation unit 112, and a search result output unit 113. The collation result receiving unit 111, the search result generation unit 112, and the search result output unit 113 represent modules of a program that operates the processor of the master computer 11; in practice, they together constitute the processor of the master computer 11.

Next, the operation is described.
When the master computer 11 of the search device 10 is notified of a search query from the search terminal device 20, it notifies the slave computers 12a to 12c, via the LAN switch 13, of the query content and the maximum number of retrievals M (a certain threshold) for the search results. Here, the master computer 11 analyzes the query content from the search terminal device 20, and the slave computers 12a to 12c are notified of a search condition converted into the information needed to execute the search processing. The maximum number of retrievals M represents the maximum number of search results that the master computer 11 outputs for display.

On receiving the search condition and the maximum number of retrievals M from the master computer 11, the slave computers 12a to 12c start the search processing. FIG. 4 is a flowchart of the operation of the slave computers 12a to 12c according to Embodiment 1.
First, each of the slave computers 12a to 12c initializes a counter for the total number of hits K within that slave computer (step ST100). Next, the data input unit 122 acquires from the storage device 121 a fixed amount (a certain data amount unit) of search target data containing one or more collation units (step ST101). Here, a collation unit is the unit of data on which collation processing is performed, for example one record of a relational database. If it is determined at this point that the data in the storage device 121 has been exhausted, the slave computer sends an end notification to the master computer 11 (steps ST102 and ST112) and ends its processing. If the data has not been exhausted, the counter for the number of hits NS in the acquired search target data is initialized (step ST103).

Next, the data collation unit 123 performs collation processing, using the search condition notified from the master computer 11, on a collation unit contained in the acquired search target data (step ST104). If the collation unit hits the search condition, 1 is added to the counter for the number of hits NS (steps ST105 and ST106) and 1 is added to the counter for the total number of hits K (step ST107). If the search condition is not hit in step ST105, the process proceeds to step ST110.

Next, the collation result generation unit 124 determines whether the counter for the total number of hits K is equal to or less than the maximum number of retrievals M (step ST108). If it is, the collation result generation unit 124 generates a list of the search target data that hit (hereinafter referred to as a hit data list) (step ST109). If the counter for the total number of hits K exceeds the maximum number of retrievals M, no hit data list is generated and the process proceeds to step ST110.
FIG. 5 is an example of the content of a search query given to the search device 10, and FIG. 6 is an example of the hit data list generated by the collation result generation unit 124 for this search query.
The search query shown in FIG. 5 outputs, for the search target data Table, the document ID and the name indicating the author of each document that satisfies the search condition address = "Kanagawa". For this search query, the collation result generation unit 124 generates a list of the document IDs and names (author names) of the collation units (records) that satisfy the search condition, as shown in FIG. 6. In this example, the three records L1, L2, and L3 satisfy the search condition.

Next, the collation result transmission unit 125 determines whether there is a next collation unit (step ST110). If there is, the process returns to step ST104 and collation processing using the search condition is performed on that collation unit. If there is not, the collation result transmission unit 125 transmits the collation result to the master computer 11 (step ST111).
The collation result contains the number of hits NS and the hit data list generated in step ST109. As described above, however, the hit data list is created only while the counter for the total number of hits K is equal to or less than the maximum number of retrievals M, and it is transmitted to the master computer 11 only in that case. When the counter for the total number of hits K exceeds the maximum number of retrievals M, only the number of hits NS is transmitted.
The process then returns to step ST101, and the next search target data is read from the storage device 121. Processing continues until the end of data is determined in step ST102.
In this way, the slave computers 12a to 12c perform the collation processing in multiple passes until the search target data in the storage device 121 is exhausted, and repeat the process of transmitting the collation result to the master computer 11 after each pass (steps ST101 to ST111).
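The slave-side loop of FIG. 4 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `slave_search`, the callbacks `matches` and `project`, and the use of a generator in place of network transmission are all assumptions made for the example. Step numbers in the comments refer to the flowchart described above.

```python
from typing import Callable, Iterable, Iterator, Optional


def slave_search(batches: Iterable[list[dict]],
                 matches: Callable[[dict], bool],
                 project: Callable[[dict], dict],
                 m: int) -> Iterator[tuple[int, Optional[list[dict]]]]:
    """Yield one (NS, hit data list or None) result per batch of records."""
    k = 0                                   # ST100: total hits on this slave
    for batch in batches:                   # ST101: fetch one data-amount unit
        ns = 0                              # ST103: hits within this batch
        hit_list: list[dict] = []
        for record in batch:                # ST104: collate each record
            if not matches(record):         # ST105: no hit, go to next record
                continue
            ns += 1                         # ST106
            k += 1                          # ST107
            if k <= m:                      # ST108
                hit_list.append(project(record))  # ST109
        # ST111: the list accompanies NS only while K <= M
        yield ns, (hit_list if k <= m else None)
    # ST112: exhausting the generator stands in for the end notification
```

With M = 2 and two batches that contain one and two hits respectively, the first result carries a one-entry list and the second carries only the count, because K reaches 3 during the second batch.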

FIG. 7 is a flowchart of the operation of the master computer 11 according to Embodiment 1.
The master computer 11 initializes the counter for the total number of hits N accumulated in the master computer 11 (step ST200). Next, the collation result receiving unit 111 receives a collation result transmitted in step ST111 of FIG. 4 from the collation result transmission unit 125 of one of the slave computers 12a to 12c (step ST201). As described above, a collation result contains either only the number of hits NS or the number of hits NS together with a hit data list. The collation result receiving unit 111 determines whether what it received is an end notification (step ST202).
If it is determined in step ST202 that it is not an end notification, the notified number of hits NS is added to the counter for the total number of hits N (step ST203). Next, the search result generation unit 112 determines whether the counter for the total number of hits N is equal to or less than the maximum number of retrievals M (step ST204). If it is, a hit data list is generated (step ST205). If the counter for the total number of hits N exceeds the maximum number of retrievals M, no hit data list is generated. The process then returns to step ST201, and the same processing is repeated for the next collation result transmitted from the slave computers 12a to 12c.

If it is determined in step ST202 that an end notification was received, the process proceeds to step ST206, where it is determined whether end notifications have been received from all of the slave computers 12a to 12c. If not, the process returns to step ST201 and the processing is repeated. If it is determined in step ST206 that end notifications have been received from all of the slave computers 12a to 12c, the search result output unit 113 outputs, as the search result, the hit data list created by the search result generation unit 112 and the total number of hits N (step ST207).
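The master-side flow of FIG. 7 can be sketched in Python as follows. The class and method names are hypothetical, and message delivery is reduced to plain method calls. One detail is an assumption rather than something the text states: when N ends up above M, the sketch outputs no list at all (only the count), in line with the display behavior attributed to Non-Patent Document 1.

```python
from typing import Optional


class Master:
    """Accumulates per-batch results from the slaves (FIG. 7, illustrative only)."""

    def __init__(self, m: int, slave_count: int):
        self.m = m
        self.n = 0                      # ST200: total hits across all slaves
        self.pending = slave_count      # slaves that have not yet sent an end notification
        self.hits: list = []

    def on_result(self, ns: int, hit_list: Optional[list]) -> None:
        self.n += ns                    # ST203
        if self.n <= self.m and hit_list:   # ST204
            self.hits.extend(hit_list)      # ST205

    def on_end(self) -> Optional[tuple]:
        """Handle one end notification; return the final result after the last one."""
        self.pending -= 1               # ST206
        if self.pending > 0:
            return None
        # ST207: assumed behavior, the list is dropped when N exceeds M
        return self.n, (self.hits if self.n <= self.m else None)
```

A caller would invoke `on_result` for every collation result received in step ST201 and `on_end` for every end notification; the return value of the last `on_end` call is the search result.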

As described above, according to Embodiment 1, the slave computers 12a to 12c transmit collation results to the master computer 11 in multiple installments and, once the total hit counter K exceeds the maximum number of retrievals M, transmit only the number of hits NS from then on and no longer transmit the hit data list. This keeps the amount of data transferred to the master computer 11 small, eliminates the bottleneck caused by data transfer, and speeds up the search processing.

Embodiment 2.
In Embodiment 1, the decision on whether to include the hit data list in the collation result transmitted from the slave computers 12a to 12c to the master computer 11 is made on the slave computer side.
However, even if the total number of hits K of a slave computer 12a to 12c does not exceed the maximum number of retrievals M, the master computer 11 does not generate a hit data list when its own total number of hits N exceeds the maximum number of retrievals M. In that case, any hit data list transmitted from the slave computers 12a to 12c is wasted transfer data.
Therefore, in Embodiment 2, the master computer 11 notifies the slave computers 12a to 12c whether transmission of the hit data list is necessary.

The operation of the search device 10 according to Embodiment 2 is described below. The configurations of the search system 100 and the search device 10 according to Embodiment 2 are the same as those shown in FIGS. 1 to 3.
FIG. 8 is a flowchart of the operation of the slave computers 12a to 12c according to Embodiment 2 of the present invention.
Each of the slave computers 12a to 12c initializes the counter for the total number of hits K within that slave computer (step ST100). Next, the data input unit 122 acquires from the storage device 121 a fixed amount of search target data containing one or more collation units (step ST101). If it is determined here that the data in the storage device 121 has been exhausted, the slave computer sends an end notification to the master computer 11 (steps ST102 and ST112) and ends its processing. If the data has not been exhausted, the counter for the number of hits NS in the acquired search target data is initialized (step ST103).
Next, the data collation unit 123 performs collation processing, using the search condition notified from the master computer 11, on a collation unit contained in the acquired search target data (step ST104). If the collation unit hits the search condition, 1 is added to the counter for the number of hits NS (steps ST105 and ST106) and 1 is added to the counter for the total number of hits K (step ST107). If the search condition is not hit in step ST105, the process proceeds to step ST110.

Next, the collation result generation unit 124 checks whether a notification indicating that transmission of the hit data list is unnecessary (a hit data list unnecessary notification) has been received from the master computer 11 (step ST113). If it has, the process proceeds to step ST110. If it has not, the collation result generation unit 124 determines whether the counter for the total number of hits K is equal to or less than the maximum number of retrievals M (step ST108). If it is, the collation result generation unit 124 generates a hit data list (step ST109). If the counter for the total number of hits K exceeds the maximum number of retrievals M, no hit data list is generated and the process proceeds to step ST110.
In step ST110, the collation result transmission unit 125 determines whether there is a next collation unit. If there is, the process returns to step ST104 and collation processing using the search condition is performed on that collation unit. If there is not, the collation result transmission unit 125 transmits the collation result to the master computer 11 (step ST111).
The process then returns to step ST101, and the next search target data is read. Processing continues until the end of data is determined in step ST102.

In this way, the slave computers 12a to 12c perform the collation processing in multiple passes until the search target data is exhausted, and repeat the process of transmitting the collation result to the master computer 11 after each pass (steps ST101 to ST111).
The collation result that the collation result transmission unit 125 transmits to the master computer 11 contains the number of hits NS and the hit data list generated in step ST109. As described above, however, the hit data list is created only when no hit data list unnecessary notification has been received from the master computer 11 and the counter for the total number of hits K is equal to or less than the maximum number of retrievals M, and it is transmitted to the master computer 11 only in that case. When a hit data list unnecessary notification has been received from the master computer 11, or when the counter for the total number of hits K exceeds the maximum number of retrievals M, only the number of hits NS is transmitted.
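The additional check of step ST113 can be sketched in Python as follows. The class name `SlaveCollator` and its methods are hypothetical. Using a `threading.Event` as the flag that a receiver thread would set when the hit data list unnecessary notification arrives is an assumption of this sketch; the text does not say how the notification is delivered or stored.

```python
import threading
from typing import Callable, Optional


class SlaveCollator:
    """Per-batch collation on one slave with the ST113 check (illustrative only)."""

    def __init__(self, m: int):
        self.m = m
        self.k = 0                               # ST100: total hits on this slave
        self.list_unneeded = threading.Event()   # set when the master's notification arrives

    def collate_batch(self, batch: list,
                      matches: Callable[[object], bool],
                      project: Callable[[object], object]) -> tuple[int, Optional[list]]:
        ns = 0                                   # ST103
        hit_list: list = []
        for record in batch:                     # ST104
            if not matches(record):              # ST105
                continue
            ns += 1                              # ST106
            self.k += 1                          # ST107
            if self.list_unneeded.is_set():      # ST113: master no longer wants lists
                continue
            if self.k <= self.m:                 # ST108
                hit_list.append(project(record))  # ST109
        send_list = not self.list_unneeded.is_set() and self.k <= self.m
        return ns, (hit_list if send_list else None)   # payload for ST111
```

Counting continues after the notification, so the master still receives an accurate NS for every batch; only the list is suppressed.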

FIG. 9 is a flowchart of the operation of the master computer 11 according to Embodiment 2.
The master computer 11 initializes the counter for the total number of hits N accumulated in the master computer 11 (step ST200). Next, the collation result receiving unit 111 receives a collation result transmitted in step ST111 of FIG. 4 from the collation result transmission unit 125 of one of the slave computers 12a to 12c (step ST201). As described above, a collation result contains either only the number of hits NS or the number of hits NS together with a hit data list. The collation result receiving unit 111 determines whether what it received is an end notification (step ST202).
If it is determined in step ST202 that it is not an end notification, the notified number of hits NS is added to the counter for the total number of hits N (step ST203). Next, the search result generation unit 112 determines whether the counter for the total number of hits N is equal to or less than the maximum number of retrievals M (step ST204). If it is, a hit data list is generated.
If the counter for the total number of hits N exceeds the maximum number of retrievals M, no hit data list is generated, and all of the slave computers 12a to 12c are notified that transmission of the hit data list is unnecessary (hit data list unnecessary notification) (step ST208). The process then returns to step ST201, and the same processing is repeated for the next collation result transmitted from the slave computers 12a to 12c.
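The master-side branch that issues the notification (step ST208) can be sketched in Python as follows. The class name and the `notify_slaves` callback are hypothetical stand-ins for whatever broadcast mechanism the LAN provides. Two details are assumptions of this sketch rather than statements of the text: the notification is sent only once, and the list accumulated so far is discarded when N exceeds M.

```python
from typing import Callable, Optional


class NotifyingMaster:
    """Master-side accumulation with the ST208 notification (illustrative only)."""

    def __init__(self, m: int, notify_slaves: Callable[[], None]):
        self.m = m
        self.n = 0                       # ST200: total hits across all slaves
        self.hits: list = []
        self.notify_slaves = notify_slaves
        self.notified = False

    def on_result(self, ns: int, hit_list: Optional[list]) -> None:
        self.n += ns                     # ST203
        if self.n <= self.m:             # ST204
            self.hits.extend(hit_list or [])
            return
        self.hits.clear()                # assumed: the list is no longer needed
        if not self.notified:            # assumed: notify only once
            self.notify_slaves()         # ST208: hit data list unnecessary notification
            self.notified = True
```

Sending the notification once is enough for the slaves to stop building lists; repeating it on every later result would only add traffic in the direction the embodiment is trying to relieve.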

As described above, according to Embodiment 2, the master computer 11 sends a hit data list unnecessary notification to the slave computers 12a to 12c at the point when the number of hits N exceeds the maximum number of retrievals M, and the slave computers 12a to 12c, once they receive the notification, do not transmit the hit data list to the master computer 11. In other words, the slave computers 12a to 12c refrain from transmitting the hit data list not only when the counter for the total number of hits K exceeds the maximum number of retrievals M, as in Embodiment 1, but also when they have received a hit data list unnecessary notification from the master computer 11. This reduces the amount of data transferred even further than in Embodiment 1 and further speeds up the search processing.

Embodiment 3.
Embodiments 1 and 2 provide means for reducing the amount of data transferred from the slave computers 12a to 12c to the master computer 11. Embodiment 3 makes the processing inside the slave computers 12a to 12c more efficient. Specifically, it increases the efficiency of input/output processing for the storage device 121.

The operation of the search device 10 according to Embodiment 3 will be described. The configurations of the search system 100 and the search device 10 according to Embodiment 3 are the same as those shown in FIGS. 1 to 3. Embodiment 3 is characterized by the operation of the data input unit 122 of the slave computers 12a to 12c; the other collation processing, collation result generation, and transmission processing, as well as the processing of the master computer 11, are the same as in Embodiment 1.

FIG. 10 is a diagram showing an example of the search target data stored in the storage device 121. As shown in the figure, the storage device 121 stores tabular data whose column items are document ID, name, address, and gender.
FIG. 11 is a flowchart of the operation of the data input unit 122 according to Embodiment 3; it corresponds to the processing of step ST101 shown in FIG. 4.
First, the data input unit 122 checks whether the counter for the total hit count K is equal to or less than the maximum fetch count M (step ST1011).
If the counter for the total hit count K is equal to or less than the maximum fetch count M, the slave computers 12a to 12c need to create a hit data list, so all items necessary for executing the search query are acquired from the storage device 121 (step ST1012). For example, to execute the search query shown in FIG. 6, the column items document ID, name, and address must be acquired.
On the other hand, if the counter for the total hit count K exceeds the maximum fetch count M, the slave computers 12a to 12c only need to notify the master computer 11 of the hit count NS, so only the items necessary for calculating the hit count, that is, the items necessary for evaluating the search condition given in the search query, are acquired from the storage device 121 (step ST1013). In the example of the search query shown in FIG. 6, the only column item necessary for evaluating the search condition is the address, so only this column item needs to be acquired.
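The branch taken in steps ST1011 to ST1013 can be illustrated with a short sketch. The function name `columns_to_fetch` and the column identifiers `doc_id`, `name`, and `address` are hypothetical names chosen for the example; how the data input unit 122 actually addresses columns in the storage device 121 is not specified here.

```python
def columns_to_fetch(total_hits_k, max_fetch_m, output_columns, condition_columns):
    """Return the column items to read from storage for the next data unit.

    total_hits_k      -- counter for the total hit count K
    max_fetch_m       -- maximum fetch count M
    output_columns    -- columns that appear in the hit data list
    condition_columns -- columns needed to evaluate the search condition
    """
    if total_hits_k <= max_fetch_m:                      # step ST1011
        # Step ST1012: a hit data list is still needed, so read every
        # column the query uses (order preserved, duplicates removed).
        combined = list(output_columns) + list(condition_columns)
        return list(dict.fromkeys(combined))
    # Step ST1013: only the hit count is reported, so the condition
    # columns alone are sufficient.
    return list(dict.fromkeys(condition_columns))


# Hypothetical column names modelled on the query example in the text.
full = columns_to_fetch(10, 100, ["doc_id", "name", "address"], ["address"])
reduced = columns_to_fetch(101, 100, ["doc_id", "name", "address"], ["address"])
```

Here `full` contains the three columns needed for the hit data list, while `reduced` contains only the address column.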

Because the format of the data acquired from the storage device 121 differs depending on whether the total hit count K is equal to or less than the maximum fetch count M or exceeds it, the data collation unit 123 executes the collation processing in step ST104 shown in FIG. 4 in accordance with this difference.

As described above, according to Embodiment 3, once the counter for the total hit count K exceeds the maximum fetch count M, the data input unit 122 acquires from the storage device 121 only the search target data necessary for calculating the hit count. This narrows down the amount of data exchanged between the storage device 121 and the data input unit 122, improves the efficiency of reading data from the storage device 121, and improves the processing throughput of the search device 10 as a whole.

Even when the search target data is stored not in the storage device 121 of the slave computers 12a to 12c but, for example, on a file server on a network or on a SAN (Storage Area Network), the data acquired by the data input unit 122 can be narrowed down in the same way, and an improvement in search performance can be expected.
FIG. 12 is a block diagram showing a configuration example of such a search system 200. The same reference numerals as in FIG. 1 denote the same components. The slave computers 12a to 12c of the search device 10 are connected to the network 40. The storage devices 41a and 41b are storage devices, such as file servers, connected to the network 40, and store the search target data.
The search device 10 of Embodiment 1 or Embodiment 2 can also be applied to the search system 200.

Embodiment 3 can also be applied to Embodiment 2. In this case, whether a hit data list unnecessary notification has been received from the master computer 11 is added as a condition to be checked in step ST1011.
To improve the processing throughput of the search device 10, it is also effective to raise the CPU processing efficiency of the slave computers 12a to 12c; this point can be addressed by increasing the number of slave computers.

Embodiment 4.
In Embodiment 4, the slave computers 12a to 12c calculate a predicted hit count P based on the successively accumulated total hit count K, and use the predicted hit count P to decide whether to transmit a hit data list to the master computer 11.

The operation of the search device 10 according to Embodiment 4 will be described. The configurations of the search system 100 and the search device 10 according to Embodiment 4 are the same as those shown in FIGS. 1 to 3. The flowchart of the operation of the master computer 11 is the same as that of Embodiment 2 shown in FIG. 9.
FIG. 13 is a flowchart of the operation of the slave computers 12a to 12c according to Embodiment 4 of the present invention. Steps denoted by the same reference numerals as in FIG. 8 are the same as in Embodiment 2, and their description is omitted.
When the collation result generation unit 124 confirms in step ST113 that no hit data list unnecessary notification has been received from the master computer 11, it calculates the predicted hit count P (step ST114).
An example of a method for calculating the predicted hit count P is as follows. First, the search hit rate R is defined as follows, using the number Q of collation processes executed by the slave computers 12a to 12c and the total hit count K.
R = K ÷ Q (1)
Next, letting C be the total number of search target data items, the predicted hit count P can be calculated by the following equation.
P = R × C (2)
Other methods may also be used to calculate the predicted hit count P.
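Equations (1) and (2) amount to a simple linear extrapolation, which the following sketch makes concrete. The function name `predicted_hits` and the error handling for Q = 0 are assumptions added for the example; the symbols K, Q, C, R, and P are those defined in the text.

```python
def predicted_hits(total_hits_k, collations_q, total_records_c):
    """Estimate the predicted hit count P from the hit rate observed so far."""
    if collations_q <= 0:
        # Assumption: the rate is undefined before any collation has run.
        raise ValueError("at least one collation is required")
    hit_rate_r = total_hits_k / collations_q      # equation (1): R = K / Q
    return hit_rate_r * total_records_c           # equation (2): P = R * C


# Example values chosen for illustration only.
p = predicted_hits(total_hits_k=250, collations_q=1000, total_records_c=40000)
```

With K = 250 hits observed over Q = 1000 collations, the hit rate is R = 0.25, so for C = 40000 records the estimate is P = 10000.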

In the subsequent step ST115, the predicted hit count P is compared with the maximum fetch count M, and the total hit count K is compared with the hit count threshold Nt. If the predicted hit count P is equal to or less than the maximum fetch count M, or if the total hit count K is equal to or less than the hit count threshold Nt, a hit data list is generated.
Here, the hit count threshold Nt can be defined, for example, by the following equation, where Nslave is the number of slave computers; here, Nslave = 3.
Nt = P ÷ Nslave (3)

While the number of collations Q is still small, the statistical accuracy of the predicted hit count P can be expected to fluctuate widely. In practice, therefore, it is preferable not to calculate the predicted hit count P until the number of collations Q exceeds a preset value. Until the number of collations Q exceeds that value, the predicted hit count P may always be set to 0 so that a hit data list is always generated.
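The decision of step ST115, together with the warm-up rule for small Q, can be sketched as below. This is an illustration under stated assumptions rather than the method of the collation result generation unit 124: the function name `should_generate_hit_list` and the parameter `min_collations` (the preset value for Q, assumed to be non-negative) are hypothetical, and P is computed with equations (1) and (2) even though other calculation methods are permitted.

```python
def should_generate_hit_list(total_hits_k, collations_q, total_records_c,
                             max_fetch_m, num_slaves, min_collations):
    """Decide whether a slave should still generate a hit data list."""
    if collations_q <= min_collations:
        # Warm-up: P is treated as 0, so the list is always generated.
        return True
    predicted_p = total_hits_k / collations_q * total_records_c  # eqs. (1), (2)
    threshold_nt = predicted_p / num_slaves                      # eq. (3)
    # Step ST115: generate if P <= M or K <= Nt.
    return predicted_p <= max_fetch_m or total_hits_k <= threshold_nt


# Illustrative values only: M = 100, three slaves, preset value 100 for Q.
early = should_generate_hit_list(5, 10, 40000, 100, 3, 100)
late = should_generate_hit_list(250, 20000, 40000, 100, 3, 100)
```

In the first call Q has not yet exceeded the preset value, so the list is generated. In the second call P is about 500 and Nt about 167, so both conditions fail and the list is no longer generated.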

As described above, according to Embodiment 4, the slave computers 12a to 12c successively calculate the predicted hit count P based on the total hit count K, and do not create a hit data list when the predicted hit count P exceeds the maximum fetch count M and the total hit count K exceeds the hit count threshold Nt. Whether to include a hit data list in the collation result transmitted to the master computer 11 can therefore be decided at an early stage, and the amount of data transmitted to the master computer 11 can be narrowed down further than in Embodiment 2. Because this further improves data transfer efficiency, the search process can be sped up further.

In Embodiment 4, the timing at which creation of the hit data list stops differs from that in Embodiments 1 to 3, so the final hit count and the number of entries in the hit data list do not match exactly. However, in Internet searches and the like, the searcher does not necessarily consult the entire hit data list, so it is considered sufficient for the number of entries in the output hit data list to be approximately equal to the hit count.
Embodiment 4 can also be applied to Embodiment 1. Embodiment 3 can likewise be applied to Embodiment 4.

A block diagram showing the configuration of the search system according to Embodiment 1 of the present invention.
A block diagram showing the configuration of the slave computer according to Embodiment 1 of the present invention.
A block diagram showing the configuration of the master computer according to Embodiment 1 of the present invention.
A flowchart of the operation of the slave computer according to Embodiment 1 of the present invention.
A diagram showing an example of the content of a search query given to the search device.
A diagram showing an example of the hit data list created by the slave computer according to Embodiment 1 of the present invention.
A flowchart of the operation of the master computer according to Embodiment 1 of the present invention.
A flowchart of the operation of the slave computer according to Embodiment 2 of the present invention.
A flowchart of the operation of the master computer according to Embodiment 2 of the present invention.
A diagram showing an example of the search target data stored in the storage device.
A flowchart of the operation of the data input unit according to Embodiment 3 of the present invention.
A block diagram showing an example of the configuration of the search system according to the present invention.
A flowchart of the operation of the slave computer according to Embodiment 4 of the present invention.

Explanation of symbols

10: search device; 11: master computer; 12a to 12c: slave computers; 13: LAN switch; 20: search terminal device; 30, 40: networks; 41a, 41b: storage devices; 100, 200: search systems; 111: collation result receiving unit; 112: search result generation unit; 113: search result output unit; 121: storage device; 122: data input unit; 123: data collation unit; 124: collation result generation unit; 125: collation result transmission unit.

Claims

A search device comprising:
one or more slave computers each having
a data input unit that acquires search target data in units of a given data amount,
a data collation unit that performs collation processing using a given search condition and, based on the number of collations and the number of hits in that processing, calculates a total collation count obtained by accumulating the number of collations and a total hit count obtained by accumulating the number of hits,
a collation result generation unit that calculates a predicted hit count by a predetermined calculation formula using, as parameters, the total collation count and the total hit count obtained by the collation processing, and generates a hit data list if the predicted hit count is less than a given threshold, and
a collation result transmission unit that transmits the hit count and the hit data list to a master computer; and
the master computer having
a collation result receiving unit that receives the hit count and the hit data list from each slave computer,
a search result generation unit that generates a list of all hit data received from the slave computers if the total of the hit counts received from the slave computers is below a given threshold, and
a search result output unit that outputs the total count and the list of all hit data,
wherein the collation processing by the data collation unit, the generation of the hit data list according to the number of hits by the collation result generation unit, and the transmission of the hit count and the hit data list to the master computer by the collation result transmission unit are performed repeatedly for each unit of search target data of the given data amount.

The search device according to claim 1, wherein the search result generation unit notifies each slave computer that a hit data list is unnecessary when the total obtained by successively accumulating the hit counts received from the slave computers exceeds a given threshold, and
the collation result generation unit, upon receiving the notification, does not generate subsequent hit data lists.

The search device according to claim 1 or 2, wherein, once the collation result generation unit has determined not to generate a hit data list, the data input unit thereafter acquires only those data items of the search target data that are necessary for determining the hit count.