JP2009163772A

JP2009163772A - Retrieval system and computer program

Info

Publication number: JP2009163772A
Application number: JP2009103452A
Authority: JP
Inventors: Shigehisa Kawabe (川邉惠久); Minoru Ikeda (池田稔); Takashi Osawa (大澤隆); Atsushi Kadona (門奈敦); Masao Nukaga (額賀雅夫)
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2002-09-17
Filing date: 2009-04-21
Publication date: 2009-07-23

Abstract

PROBLEM TO BE SOLVED: To provide retrieval technology capable of high-speed retrieval even when an integrated retrieval is performed across various document filings.

SOLUTION: When a retrieval request is issued from a retrieval user terminal 15, a retrieval part 10 provides the user ID and other details of the retrieval user to an access controller 11. The access controller 11 refers to a user authority storage part 12 and returns the access authority of the retrieval user. For example, the access controller 11 refers to a table that specifies the relations between access rights and their corresponding indexes, and returns to the retrieval part 10 the identifier(s) of the index(es) 14 that can be referenced with the retrieval user's access authority. The retrieval part 10 refers to the indexes 14 approved by those identifiers, extracts the hit records, and returns them to the retrieval user terminal 15.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a search system and a computer program.

Conventionally, databases have been built in distributed environments in which a single logical index is constructed across multiple document filings that are managed and published independently, so that a single search operation on that index can centrally manage and search the attributes of the many documents stored in those filings together with their locations, given for example as URLs. Such a search is called an integrated search.

When an index for integrated search is constructed, documents are collected from the multiple document filings by a collection operation performed independently of the search operation. In the collection operation, a user or application holding a predetermined access right identifies and retrieves documents from the target document filing over a predetermined network protocol, either by specifying document names or by searching. The retrieved documents are analyzed to produce the attributes and keywords needed for index construction, and the index is then built.

Patent documents related to this invention include one that discloses analyzing text data stored in each of a plurality of databases, extracting the necessary items, indexing the extraction results, and accessing the plurality of databases through a single index (Patent Document 1), and one that discloses acquiring predetermined information and authority information from each of a plurality of files stored in a storage device, building an index from the predetermined information and the authority information, and thereby restricting searches to the range permitted by the user's authority (Patent Document 2).

Patent Document 1: JP 2000-163445 A
Patent Document 2: JP 2001-344245 A

An object of the present invention is to provide a search technique capable of high-speed searching even when an integrated search is performed across various document filings.

In the basic configuration of the present invention, a search device that searches using a plurality of indexes sorts the hit records obtained from each index, each of which includes a score, by score on a per-index basis; concatenates the sorted hit records according to a predetermined rule; sorts the concatenated hit records again by score; and outputs a predetermined number of the top hit records after the re-sort as the search result.
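The basic configuration above can be illustrated with a minimal Python sketch. The record shape (`HitRecord` with a document ID and a score) and the function name `merge_top_n` are hypothetical, and plain index order stands in for the unspecified "predetermined rule" used to concatenate the per-index lists.

```python
from typing import Dict, List, NamedTuple


class HitRecord(NamedTuple):
    doc_id: str   # hypothetical record identifier
    score: float  # ranking score computed by the index


def merge_top_n(per_index_hits: Dict[str, List[HitRecord]], n: int) -> List[HitRecord]:
    """Sort each index's hits, concatenate them, re-sort, and keep the top n."""
    # 1. Sort the hit records of each index by score, highest first.
    sorted_lists = [
        sorted(hits, key=lambda h: h.score, reverse=True)
        for hits in per_index_hits.values()
    ]
    # 2. Concatenate the sorted lists (index order stands in for the
    #    "predetermined rule", which the source does not specify).
    concatenated = [h for hits in sorted_lists for h in hits]
    # 3. Sort the concatenated records again by score.
    concatenated.sort(key=lambda h: h.score, reverse=True)
    # 4. Output the top n records as the search result.
    return concatenated[:n]
```

For example, with index A holding scores 0.4 and 0.9 and index B holding 0.7 and 0.1, `merge_top_n(hits, 3)` keeps the 0.9, 0.7 and 0.4 records. Because each per-index list is already sorted, the re-sort could also be done with `heapq.merge(*sorted_lists, key=..., reverse=True)`; a full sort is used here only for brevity.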

In this configuration, scores are calculated and sorted per index, so the indexes can be processed in a distributed manner, which improves responsiveness and ensures scalability. In addition, if an upper limit on the number of hit records to be processed per index is set in advance for the step that concatenates the per-index sorted hit records, unnecessary hit records need not be sent to the hit record concatenation unit, which can reduce, for example, communication cost.
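The effect of a per-index upper limit can be checked with a small sketch. It assumes hit records are `(doc_id, score)` tuples and that `cap` is the per-index processing limit; both names are hypothetical. The sketch relies on an observation not stated in the source: if each index forwards at least its own top `n` records, the final top-`n` result is unchanged, because any record ranked below position `n` within its own index is also ranked below position `n` overall.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (doc_id, score); hypothetical record shape


def capped_merge(per_index_hits: Dict[str, List[Hit]], n: int, cap: int) -> Tuple[List[Hit], int]:
    """Forward at most `cap` top-scoring hits per index to the concatenation step.

    Returns the merged top-n result and the number of records that had to be
    transferred to the concatenation step.
    """
    transferred: List[Hit] = []
    for hits in per_index_hits.values():
        # Each index sorts locally and forwards only its top `cap` records.
        local = sorted(hits, key=lambda h: h[1], reverse=True)[:cap]
        transferred.extend(local)
    # Re-sort what was received and keep the top n.
    transferred.sort(key=lambda h: h[1], reverse=True)
    return transferred[:n], len(transferred)
```

Choosing `cap = n` is therefore a safe setting for the limit; smaller caps save more communication but may drop records that belong in the final result.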

The present invention can be realized not only as an apparatus or a system but also as a method. Part of the invention can, of course, be implemented as software, and a software product used to cause a computer to execute such software also falls within the technical scope of the invention.

The above aspects of the invention and its other aspects are set forth in the claims and are described in detail below with reference to examples.

According to this invention, even when a plurality of indexes are used, searches can be performed faster than with a configuration that lacks these features.

FIG. 1 is a block diagram schematically showing the search device of Example 1 of the invention.
FIG. 2 is a block diagram schematically showing the index construction apparatus of Example 1.
FIG. 3 illustrates a configuration example in which Example 1 is applied to an intranet environment.
FIG. 4 is a block diagram schematically showing the search device of Example 2 of the invention.
FIG. 5 illustrates the B+ tree structure, used to explain the per-index hit record calculation performed by the index-specific hit record number generation unit of Example 2.
FIG. 6 illustrates a management node of the B+ tree structure of FIG. 5.
FIG. 7 illustrates an intermediate node of the B+ tree structure of FIG. 5.
FIG. 8 illustrates a leaf node of the B+ tree structure of FIG. 5.
FIG. 9 illustrates a search of the B+ tree structure of FIG. 5.
FIG. 10 illustrates the count management information contained in the B+ tree structure of FIG. 5.
FIG. 11 is a flowchart illustrating the process of calculating the number of hit records in an index using the B+ tree structure of FIG. 5.
FIG. 12 is a flowchart illustrating the rank calculation routine of FIG. 11.
FIG. 13 illustrates the configuration of Example 3 of the invention.
FIG. 14 is a flowchart illustrating the operation of Example 3.
FIG. 15 illustrates the record format in Example 3.
FIG. 16 illustrates an example of the score calculation in the examples above.

Examples of the present invention are described below.

[Example 1]
Example 1 uses a plurality of indexes and controls searching according to access authority.

FIG. 1 schematically shows the search device of Example 1. In this figure, the search device includes a search unit 10, an access control unit 11, a user authority storage unit 12, and an index storage device 13. The index storage device 13 stores a plurality of indexes 14 (labeled A to N for convenience), each of which is assigned a different level of access authority. Of course, the same access authority may be assigned to several indexes 14, which are then managed as a group sharing that authority. Instead of storing all the indexes 14 in a single index storage device 13, a plurality of index storage devices 13 may be provided so that the indexes are stored in a distributed manner. The search device of this example receives search requests from a search user terminal 15 and returns the search results to the search user terminal 15.

The indexes 14 in the index storage device 13 are constructed and managed by an index construction apparatus (FIG. 2), described later.

In this example, when a search request is issued from the search user terminal 15, the search unit 10 supplies the user ID and other details of the search user to the access control unit 11, and the access control unit 11 refers to the user authority storage unit 12 and returns the access authority of the search user. The access control unit 11 then, for example, looks up a table defining the relationship between access authorities and their corresponding indexes, and returns to the search unit 10 the identifier(s) of the index(es) 14 that can be referenced with the search user's access authority. The search unit 10 refers to the indexes 14 permitted by these identifiers, extracts the hit records, and returns them to the search user terminal 15. The hit records may also be ordered by ranking score so that only a predetermined number of display records are returned to the search user terminal 15.
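The table lookup performed by the access control unit 11 can be sketched as follows. The user IDs, authority names, index contents and the functions `referable_indexes` and `search` are all hypothetical placeholders for the user authority storage unit 12, the authority-to-index table, and the indexes 14.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (doc_id, ranking score); hypothetical record shape

# Hypothetical contents of the user authority storage unit 12: user ID -> authority.
USER_AUTHORITY: Dict[str, str] = {"alice": "manager", "bob": "staff"}

# Hypothetical table relating each access authority to the indexes it may reference.
AUTHORITY_TO_INDEXES: Dict[str, List[str]] = {
    "manager": ["A", "B", "C"],
    "staff": ["C"],
}

# Hypothetical indexes 14: index ID -> {keyword: [(doc_id, score), ...]}.
INDEXES: Dict[str, Dict[str, List[Hit]]] = {
    "A": {"budget": [("a-1", 0.9)]},
    "B": {"budget": [("b-7", 0.4)]},
    "C": {"budget": [("c-3", 0.6)], "lunch": [("c-9", 0.8)]},
}


def referable_indexes(user_id: str) -> List[str]:
    """Access control unit 11: user ID -> access authority -> referable index IDs."""
    authority = USER_AUTHORITY.get(user_id)
    if authority is None:
        return []  # unknown user: no index may be referenced
    return AUTHORITY_TO_INDEXES.get(authority, [])


def search(user_id: str, keyword: str, max_results: int = 10) -> List[Hit]:
    """Search unit 10: look only in permitted indexes, then rank by score."""
    hits: List[Hit] = []
    for index_id in referable_indexes(user_id):
        hits.extend(INDEXES[index_id].get(keyword, []))
    # Order by ranking score and return only the display count.
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:max_results]
```

Because `search` only opens the indexes returned by `referable_indexes`, every hit it returns is already permitted, matching the observation that hit records need not be verified individually.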

In this example, every record hit by referring to the permitted indexes 14 is accessible, so there is no need to verify the user's access authority individually for each hit record.

Alternatively, the search unit 10 may issue reference requests to the indexes 14 designated by the search user, or to all indexes 14, and the access control unit 11 may grant or deny each reference by checking the user's access authority in the user authority storage unit 12.

Next, the index construction apparatus of this example is described.

FIG. 2 schematically shows the index construction apparatus of this example. In this figure, the index construction apparatus includes a process activation unit 20, an index record management unit 21, an access control unit 23, and a process authority storage unit 24. An access authority is set in advance for the process activation unit 20. The process activation unit 20 activates an index record management process 22 in the index record management unit 21 and grants it the access authority of the process activation unit 20. Alternatively, a user or administrator may activate the index record management process 22 and grant it an access authority. The activated index record management process 22 accesses a document filing system 103 (see FIG. 3) that holds documents, and generates index records by referring to the documents permitted by its own access authority. Access to the documents in the document filing system 103 is controlled by the access control unit 23 and the process authority storage unit 24. In this way, the index record management process 22 generates index records for documents in the security domain corresponding to its own access authority (that is, equal to or lower than it), and constructs or modifies (inserts into or deletes from) the index 14 for the corresponding access authority in the index storage device 13. Construction and modification of the index 14 are also controlled by the access control unit 23 and the process authority storage unit 24.
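A minimal sketch of one index record management process, assuming access authorities can be modeled as integer security levels where a process may read any document whose level is equal to or lower than its own. `Document`, `level` and `build_index` are hypothetical, and whitespace tokenization stands in for the document analysis that produces index records.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Document(NamedTuple):
    doc_id: str
    level: int  # hypothetical security level required to read the document
    text: str


def build_index(process_level: int, documents: List[Document]) -> Dict[str, Set[str]]:
    """Index record management process running with `process_level`.

    Only documents the process may read (level <= process_level) are indexed,
    mirroring the access check made by the access control unit.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc in documents:
        if doc.level > process_level:
            continue  # access denied: the document never enters this index
        for word in doc.text.lower().split():
            index[word].add(doc.doc_id)  # one index record per (word, document)
    return dict(index)
```

Running `build_index(level, docs)` once per level yields one index 14 per access authority.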

In this way, an index 14 is constructed and managed for each access authority.

FIG. 3 shows a configuration example in which the search device and the index construction apparatus of Example 1 are implemented in an intranet environment. In FIG. 3, a search system 100, a plurality of index construction systems 102, a plurality of document filing systems 103, a directory server 104, a web server 105, an application server 106, a client terminal 120, and so on are placed on a LAN 108. A client terminal 120 is also connected to the LAN 108 via a router 107 and a network 121.

The search system 100 has an index holding unit 101 and can refer to a plurality of indexes (the indexes 14 of FIG. 1).

The search system 100 and the index construction systems 102 are installed from storage media 109 and 110, respectively, or via the network 121.

A document filing system 103 may be assigned a single access authority as a whole (for example, 103A), or access authorities may be assigned individually to the documents or directories within it. The document filing system 103A and the index construction system 102A, for example, have the same access authority and form a corresponding security domain 200. The other filing systems 103 contain documents with various access authorities, and index records for those documents can be generated by the index construction system 102 corresponding to each access authority.

Each index construction system 102 accesses the documents of each document filing system 103 with its corresponding access authority, and the document filing system 103 authenticates that authority using the directory server 104 and decides whether to permit access. The index construction system 102 refers to the documents of its corresponding access authority, generates index records, and either constructs the corresponding index 14 in the index holding unit 101 or inserts records into that index. It also deletes index records and performs similar maintenance as needed.

In this way, an index 14 is constructed in the index holding unit 101 for each access authority and is managed thereafter.

A search user uses a client terminal 120 to send a search request to the search system 100 via the web server 105 and the application server 106 (or a CGI program or the like). The search system 100 checks the search user's access authority using the directory server 104, refers to the corresponding indexes 14 accordingly, and returns to the client terminal 120 a list containing only the hit records the search user is permitted to see. The search user can then retrieve a document selected from the list from the relevant document filing system 103.

The index holding unit 101 may be distributed across the sites of the index construction systems 102, with the search system 100 referring to it there. The search system 100 and the index holding unit 101 may also both be distributed across the index construction system 102 sites. In this case, the search request from the client terminal 120 is dispatched by proxy to the plurality of distributed search systems 100.

[Example 2]
Next, Example 2 of the invention is described. In this example, even when a plurality of indexes are used, hit records with low ranking scores are kept out of the display list.

FIG. 4 schematically shows the search device of this example. In this figure, the search unit 10 includes an index-specific hit record number generation unit 30, an index selection unit 31, a hit record merging unit 32, a hit record temporary storage unit 33, a display record output unit 34, and so on.

The search user terminal 15 sends a search request to the search unit 10. The search request can include the number of display records together with the search key. The index-specific hit record number generation unit 30 calculates, for the search key, the number of hit records in each index 14; this is described later. The index selection unit 31 determines the number of hit records to retrieve from the index storage device 13 based on the specified number of display records or a default number of display records. This number is called the threshold. The threshold is N times the number of display records, where N is chosen so that sufficiently accurate results are obtained. The index selection unit 31 selects indexes so that the threshold number of hit records is obtained with as few indexes as possible. Various approaches are possible. For example, indexes are taken in descending order of hit record count: if the index with the most hit records reaches the threshold on its own, only that index is selected. If it does not, the index with the next largest number of hit records is selected and its hit record count is added to the running total. This is repeated until the cumulative total reaches the threshold, which determines the one or more indexes to use.
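The example selection strategy of the index selection unit 31 is a greedy loop over indexes in descending order of hit count. In this sketch `hit_counts`, `display_count` and `n_factor` (the multiplier N) are hypothetical names; if all indexes together do not reach the threshold, every index is selected.

```python
from typing import Dict, List


def select_indexes(hit_counts: Dict[str, int], display_count: int, n_factor: int) -> List[str]:
    """Pick indexes in descending order of hit count until the threshold is met.

    threshold = n_factor * display_count, i.e. "N times the number of
    display records"; n_factor is a tuning parameter.
    """
    threshold = n_factor * display_count
    selected: List[str] = []
    total = 0
    for index_id, count in sorted(hit_counts.items(), key=lambda kv: kv[1], reverse=True):
        if total >= threshold:
            break  # enough hit records already: stop adding indexes
        selected.append(index_id)
        total += count  # accumulate this index's hit count
    return selected
```

Taking the largest counts first also minimizes the number of indexes needed to reach a given total, which matches the stated goal of using as few indexes as possible.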

When several indexes are used, the hit record merging unit 32 merges their hit records and stores them in the hit record temporary storage unit 33. When only one index is used, its hit records are stored in the hit record temporary storage unit 33 as they are.

The hit records in the hit record temporary storage unit 33 are sorted by the ranking scores they contain and are sent to the display record output unit 34 in sorted order. The display record list output by the display record output unit 34 is returned to the search user terminal 15.

In this way, the number of hit record merge operations can be reduced.

Next, the hit record count calculation performed by the index-specific hit record number generation unit 30 is described. Alternatively, the number of hit records for each key may be computed in advance and stored in a table that is then looked up.

The index 14 in the index storage device 13 is, for example, a B+ tree structure described by a management node, intermediate nodes, and leaf nodes, as shown in FIG. 5. As shown in FIG. 6, the management node manages a plurality of B+ trees. For each B+ tree, a schema defines the byte lengths of the key, the value, and so on. The management node routes a search key to the corresponding B+ tree. As shown in FIG. 7, an intermediate node defines the keys that control branching and the lower nodes (subtrees) to branch to. As a feature specific to this example, the intermediate node also holds, for each lower node, the number of records belonging to the leaf nodes of that subtree as count management information. As shown in FIG. 8, a leaf node contains a plurality of key-value pairs (the value being, for example, a document ID). Leaf nodes also contain the key-value pairs for the keys that control branching in the intermediate nodes. Each leaf node further contains a pointer to the next leaf node, enabling a so-called horizontal search.

During a search, as shown in FIG. 9, the management node determines the B+ tree, a vertical search proceeds from its root node down through the intermediate nodes, and once a leaf node is reached, a horizontal search is performed.

Here, the count management information of the intermediate nodes is described with reference to FIG. 10. Taking the first-stage intermediate node (the node below the management node) as an example, the intermediate node branches to lower nodes (subtrees) by the keys "LEFT", $K^{(0)}_1$, $K^{(0)}_2$, $K^{(0)}_3$, and so on. No record is stored for the key "LEFT" itself. The superscript $(0)$ indicates a first-stage key; likewise, the keys of the $n$-th stage intermediate node are written $K^{(n-1)}$. The number of records $R^{(0)}_0$ stored in the leaf nodes of the lower node (subtree) to which keys in the range from "LEFT" to $K^{(0)}_1$ branch is stored in the count management information of lower node 0. Next, the number of records $r^{(0)}_1$ stored in the leaf nodes of the lower node to which keys in the range from $K^{(0)}_1$ to $K^{(0)}_2$ branch is obtained, and the count of the preceding lower node (here $R^{(0)}_0$) is added to it, giving $R^{(0)}_1 = R^{(0)}_0 + r^{(0)}_1$, which is stored in the count management information of lower node 1. In general, the number of records $r^{(0)}_N$ stored in the leaf nodes of lower node $N$, to which keys in the range from $K^{(0)}_N$ to $K^{(0)}_{N+1}$ branch, is added to the count management information $R^{(0)}_{N-1}$ of the immediately preceding lower node $N-1$, giving the count management information of lower node $N$ as $R^{(0)}_N = R^{(0)}_{N-1} + r^{(0)}_N$. The count management information is obtained and maintained in the same way up to the last lower node.
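The count management information is a running total of subtree sizes. The sketch below, with hypothetical function names, computes the values R_0, R_1, ... of one intermediate node from the per-subtree counts r_0, r_1, ... and shows that R_{A-1} is the number of records in all subtrees to the left of subtree A.

```python
from itertools import accumulate
from typing import List


def count_management_info(subtree_counts: List[int]) -> List[int]:
    """R_N = R_{N-1} + r_N with R_0 = r_0: running totals of subtree sizes."""
    return list(accumulate(subtree_counts))


def records_before_child(cumulative: List[int], child: int) -> int:
    """Records in all subtrees left of `child`: R_{child-1}, or 0 for child 0."""
    return cumulative[child - 1] if child > 0 else 0
```

For subtree counts 4, 3, 5 and 2, the stored values are 4, 7, 12 and 14, and a search that descends into subtree 2 skips R_1 = 7 records.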

When searching with a start key and an end key, the count management information of the intermediate nodes can be used to obtain the rank at the point where a leaf node is reached. That is, at each intermediate node visited in turn, the lower node to visit next is determined, and the count management information of the lower node immediately to its left is taken. The same is done at the next intermediate node, and this is repeated until a leaf node is reached. For example, if the keys $K^{(0)}_A, K^{(1)}_B, K^{(2)}_C, \ldots, K^{(N-1)}_D$ are followed from the first stage to the $N$-th stage, the rank of the record on reaching the leaf node is obtained by accumulating the count management information $R^{(0)}_{A-1}$ of key $K^{(0)}_{A-1}$ in intermediate node 0 (here and below, a key stands for the lower node or subtree it branches to), $R^{(1)}_{B-1}$ of key $K^{(1)}_{B-1}$ in intermediate node 1, $R^{(2)}_{C-1}$ of key $K^{(2)}_{C-1}$ in intermediate node 2, and so on up to $R^{(N-1)}_{D-1}$ of key $K^{(N-1)}_{D-1}$ in intermediate node $N-1$, that is, $R^{(0)}_{A-1} + R^{(1)}_{B-1} + \cdots + R^{(N-1)}_{D-1}$.

First, the intermediate nodes are traversed based on the start key, and the corresponding count management information is accumulated to obtain the rank on reaching the leaf node; the leaf nodes are then searched horizontally. When the record containing the start key is reached, the number of records passed in the horizontal search up to that record is added to the rank obtained on reaching the leaf node. This gives the rank Nstart of the record containing the start key (or, if no record contains the start key, of the record within the search range whose key is closest to the start key).

Next, the intermediate nodes are traversed based on the end key, and the corresponding count management information is accumulated to obtain the rank on reaching the leaf node; the leaf nodes are then searched horizontally. When the record containing the end key is reached, the number of records passed in the horizontal search up to that record is added to the rank obtained on reaching the leaf node. This gives the rank Nend of the record containing the end key (or, if no record contains the end key, of the record within the search range whose key is closest to the end key).

The index-specific hit record number generation unit 30 calculates the total number of records whose keys fall within the search range from Nstart and Nend. If a record containing the end key exists, the total is Nend - Nstart + 1; if not, the total is Nend - Nstart.

FIG. 11 shows the process by which the index-specific hit record number generation unit 30 calculates the number of hit records for each index. In FIG. 11, a word and document IDs are used to calculate the total number of records (documents) in the search range of a range search. The calculation proceeds as follows. When the searcher enters a word, the document ID range is automatically set to 0x3000 through 0x3fff (hexadecimal).

[Step S10]: Receive the search range.
[Step S11]: Determine the B+ tree to use.
[Step S12]: Use the start key as the search key.
[Step S13]: Select the key to which the search key corresponds.
[Step S14]: Execute the rank calculation routine (see FIG. 9).
[Step S15]: Set the rank obtained by the rank calculation routine as Nstart.
[Step S16]: Use the end key as the search key.
[Step S17]: Execute the rank calculation routine.
[Step S18]: Set the rank obtained by the rank calculation routine as Nend.
[Step S19]: Determine whether a record corresponding to the end key exists. If it does, proceed to step S20. Otherwise, proceed to step S21.
[Step S20]: Calculate the number of records in the search range as Nend - Nstart + 1.
[Step S21]: Calculate the number of records in the search range as Nend - Nstart.
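The following Python sketch shows the counting rule in steps S10 to S21. It is a minimal illustration and not the patent's implementation. The B+ tree is replaced by a sorted list, and `bisect_left` stands in for the rank routine, where the rank is the number of keys strictly below the search key. The `(word, document ID)` tuple key and the helper names `rank`, `range_count` and `word_range` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical composite index key: (word key, document ID).
Key = Tuple[str, int]


def rank(sorted_keys: List[Key], search_key: Key) -> int:
    """Stand-in for the rank routine (S40-S45): number of keys strictly below search_key."""
    return bisect_left(sorted_keys, search_key)


def range_count(sorted_keys: List[Key], start: Key, end: Key) -> int:
    """Count records whose key lies in [start, end], following steps S12-S21."""
    n_start = rank(sorted_keys, start)  # S12-S15: rank of the start key
    n_end = rank(sorted_keys, end)      # S16-S18: rank of the end key
    # S19: does a record with exactly the end key exist?
    end_exists = n_end < len(sorted_keys) and sorted_keys[n_end] == end
    return n_end - n_start + 1 if end_exists else n_end - n_start  # S20 / S21


def word_range(word: str) -> Tuple[Key, Key]:
    """Range applied when only a word is entered: document IDs 0x3000 through 0x3fff."""
    return (word, 0x3000), (word, 0x3FFF)


keys = sorted([("copy", 0x3000), ("copy", 0x3005), ("copy", 0x3FFF),
               ("copy", 0x4000), ("print", 0x3001)])
start, end = word_range("copy")
print(range_count(keys, start, end))
```

The "+1" applies only when the end key is present. Because the rank counts only keys strictly below the search key, a record exactly at the end key would otherwise be missed.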

The rank calculation routine is as follows.

[Step S40]: Reset the rank to 0.
[Step S41]: At the intermediate node, add to the rank the record-count management information of the keys to the left of the key to which the search key corresponds.
[Step S42]: Proceed to the node below the key to which the search key corresponds.
[Step S43]: Determine whether the node is an intermediate node or a leaf node. If it is an intermediate node, return to step S41. If it is a leaf node, proceed to step S44.
[Step S44]: Trace horizontally, from the record at which the leaf node was reached, up to the record of the key corresponding to the search key.
[Step S45]: Add the number of records traced in the horizontal scan to the rank.
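The sketch below runs the rank routine over a small B+ tree whose intermediate nodes carry per-child record counts. Several details are assumptions made for illustration: the node layout (`Leaf`, `Inner`, `separators`, `counts`), the bulk-loading helper `build`, and the convention that a key equal to a separator descends to the right. Keys are also assumed to be unique.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    keys: List[int]            # sorted record keys held in this leaf


@dataclass
class Inner:
    separators: List[int]      # separators[i] = smallest key under children[i + 1]
    children: List["Node"]
    counts: List[int]          # counts[i] = number of records under children[i]


Node = Union[Leaf, Inner]


def rank(root: Node, search_key: int) -> int:
    """Number of keys strictly less than search_key (steps S40-S45)."""
    r = 0                                          # S40: reset the rank
    node = root
    while isinstance(node, Inner):                 # S43: still an intermediate node?
        i = bisect_right(node.separators, search_key)
        r += sum(node.counts[:i])                  # S41: add counts left of the branch
        node = node.children[i]                    # S42: descend
    for k in node.keys:                            # S44: horizontal scan in the leaf
        if k >= search_key:
            break
        r += 1                                     # S45: add each scanned record
    return r


def first_key(n: Node) -> int:
    return n.keys[0] if isinstance(n, Leaf) else first_key(n.children[0])


def size(n: Node) -> int:
    return len(n.keys) if isinstance(n, Leaf) else sum(n.counts)


def build(keys: List[int], leaf_size: int = 3, fanout: int = 3) -> Node:
    """Bulk-load a small B+ tree with per-child record counts (keys sorted, non-empty)."""
    level: List[Node] = [Leaf(keys[i:i + leaf_size]) for i in range(0, len(keys), leaf_size)]
    while len(level) > 1:
        parents: List[Node] = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append(Inner([first_key(c) for c in group[1:]], group,
                                 [size(c) for c in group]))
        level = parents
    return level[0]


root = build(list(range(0, 40, 2)))
print(rank(root, 11))
```

The per-child counts are what make this fast. Only one root-to-leaf path and one leaf are visited, so the cost is logarithmic rather than proportional to the number of records preceding the key.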

This concludes the description of the second embodiment.

[Embodiment 3]
Next, Embodiment 3 of the present invention will be described. In this embodiment, a search system is built by connecting two parts over a network. One is a search apparatus body that performs search processing using indexes. The other is a search management apparatus that concatenates and otherwise processes the search results of the search apparatus body.

FIG. 13 shows the overall search system of Embodiment 3. In FIG. 13, parts corresponding to those in FIG. 3 are given corresponding reference numerals. The search system 100 includes a search management server 300 and a plurality of search servers 301. Each search server 301 has its own index holding unit 302 and searches using, for example, the B+ tree information stored there (as in Embodiments 1 and 2). The search management server 300 receives a search request from the client terminal 120 and performs access control and related processing. It then dispatches the request to the search servers 301 that are permitted for that request. It receives the search results from those search servers 301 and extracts hit records up to the output upper limit, which is for example a value specified by the user or a system default. It returns these records to the client terminal 120 as the search result.

FIG. 14 shows the processing in the search system 100 of Embodiment 3, detailed below. This processing is executed by the search management server 300 and the search servers 301. It can be implemented, for example, by installing programs stored on recording media 303 and 304 on the search management server 300 and the search servers 301.

[Step S50]: Each search server 301 performs a search using the index in its index holding unit 302. Each search server 301 retrieves records up to an upper limit of, for example, 10 times the output limit value (the maximum number of records output to the user). It ends the search once that limit is exceeded. As shown in FIG. 15, for example, each record contains a key and a value. The key consists of a word key (a document attribute such as a keyword) and a document ID. The value is occurrence data used to calculate the record's search score. It consists of, for example, the update time, the appearance frequency and the appearance distribution. A score is calculated from the occurrence data, and the hit records are sorted by this score.
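The sketch below shows roughly what step S50 might look like on a single search server. The scoring function is left abstract as `score_fn`, and the default factor of 10 comes from the example in step S50. The names `server_search`, `Key` and `Occ` are hypothetical. `itertools.islice` stops consuming the match stream once the cap is reached, which mirrors the early termination of the search.

```python
from itertools import islice
from typing import Callable, Iterable, List, Tuple, TypeVar

Occ = TypeVar("Occ")          # occurrence data (update time, frequency, distribution, ...)
Key = Tuple[str, int]         # hypothetical (word key, document ID)


def server_search(matches: Iterable[Tuple[Key, Occ]],
                  score_fn: Callable[[Occ], float],
                  output_limit: int,
                  factor: int = 10) -> List[Tuple[Key, float]]:
    """One search server's share of S50: cap, score and sort its hit records."""
    capped = islice(matches, factor * output_limit)    # stop scanning at the cap
    scored = [(key, score_fn(occ)) for key, occ in capped]
    scored.sort(key=lambda hit: hit[1], reverse=True)  # highest score first
    return scored  # occurrence data is dropped; only the score is sent on (S51)


matches = ((("copy", doc_id), doc_id % 7) for doc_id in range(1000))
top = server_search(matches, score_fn=float, output_limit=2)
print(len(top), top[0])
```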

[Step S51]: The search management server 300 receives the sorted hit records from the search servers 301. Each received record contains its score directly, so the occurrence data is basically not needed.
[Step S52]: Concatenate the hit records from the search servers 301, starting with the server that returned the largest number of sorted hit records.
[Step S53]: Determine whether the total number of concatenated hit records has reached a cumulative upper limit, for example 10 times the output upper limit. If it has not, return to step S52 and repeat. If it has, proceed to step S54.
[Step S54]: Sort the concatenated hit records again by score.
[Step S55]: Output hit records from the top, up to the output upper limit.
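The merging done by the search management server in steps S52 to S55 can be sketched as follows. The sketch makes three assumptions. Each server's result list is already sorted. Whole result lists are concatenated, so the last list added may push the total past the cumulative limit. Duplicates across servers are not removed. `merge` and the `(document ID, score)` tuple are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # hypothetical (document ID, score), already sorted per server


def merge(results: Dict[str, List[Hit]], output_limit: int, factor: int = 10) -> List[Hit]:
    """Steps S52-S55: concatenate per-server results, re-sort by score, keep the top."""
    cumulative_limit = factor * output_limit
    merged: List[Hit] = []
    # S52: take servers in descending order of how many hit records they returned.
    for server in sorted(results, key=lambda s: len(results[s]), reverse=True):
        merged.extend(results[server])
        if len(merged) >= cumulative_limit:  # S53: stop once the cumulative limit is reached
            break
    merged.sort(key=lambda hit: hit[1], reverse=True)  # S54: re-sort by score
    return merged[:output_limit]                        # S55: output the top records


results = {
    "s1": [("a", 5.0), ("b", 1.0)],
    "s2": [("c", 9.0), ("d", 3.0), ("e", 2.0)],
    "s3": [("f", 8.0)],
}
print(merge(results, output_limit=2, factor=2))
```

With a cumulative limit of 4, server `s3` is never consulted. Its record `f` (score 8.0) therefore cannot appear in the output, even though it outscores `a`. This is the trade-off the cumulative upper limit makes in exchange for less communication and sorting.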

The score of each record is calculated, for example, as follows.
[Equation 1]
Score = {A1 × (appearance density) + A2 × (update date - reference date)} × (value determined by the appearance distribution information, for example a value from 1 to 2)
where A1 and A2 are coefficients.

The appearance density is the rate at which the keyword occurs in the document. It is obtained, for example, as constant × number of occurrences / document size. The higher the appearance density, the higher the score.
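Equation 1 and the density definition translate directly into code. The text gives no numeric values for A1, A2 or the density constant, so the defaults of 1.0 below are placeholders. The function names are hypothetical.

```python
def appearance_density(occurrences: int, doc_size: int, constant: float = 1.0) -> float:
    """Density = constant x occurrences / document size (constant is a placeholder)."""
    return constant * occurrences / doc_size


def record_score(density: float, date_diff: float, dist_factor: float,
                 a1: float = 1.0, a2: float = 1.0) -> float:
    """Equation 1: {A1 x density + A2 x (update date - reference date)} x distribution factor."""
    return (a1 * density + a2 * date_diff) * dist_factor


print(record_score(appearance_density(3, 300, constant=100), date_diff=5, dist_factor=2.0))
```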

The update date is the date on which the document was updated. As a rule, the reference date is set 2048 date units (about 4 years) before the date of the search. A "date" is, for example, an integer from 0 to 32767 that covers approximately January 1, 1970 to January 19, 2038. One day therefore corresponds to about 1.3 units. The update date is usually a few months to a few years before the search. If the update date were used as is, the roughly 30 years from 1970 onward would occupy part of the value range without ever being used, which would shrink the dynamic range. The reference date is therefore taken as about 4 years (2048 units) before the search date, so that update date - reference date = update date - search date + 2048. The more recent the update date, the higher the score.
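The date arithmetic can be sketched as follows. The text states only the range (0 to 32767 for 1970-01-01 to 2038-01-19) and the rate of about 1.3 units per day. Both are consistent with taking Unix time in seconds and shifting it right by 16 bits, which gives 65,536 seconds per unit. That mapping is an assumption, and the helper names are hypothetical.

```python
DATE_SHIFT = 16          # assumption: date unit = Unix seconds >> 16 (65,536 s per unit)
REFERENCE_OFFSET = 2048  # about 4 years expressed in date units


def to_date_units(unix_seconds: int) -> int:
    """Map Unix time onto the 0..32767 date scale (1970-01-01 to 2038-01-19)."""
    return unix_seconds >> DATE_SHIFT


def date_term(update_unix: int, search_unix: int) -> int:
    """update date - reference date = update date - search date + 2048."""
    reference = to_date_units(search_unix) - REFERENCE_OFFSET
    return to_date_units(update_unix) - reference


print(86400 / (1 << DATE_SHIFT))  # units per day, about 1.3
```

Under this mapping, a document updated on the search date contributes 2048. One updated about 4 years earlier contributes roughly 0, and anything older contributes a negative value.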

The appearance distribution information shows how a word key is distributed over the sequence of sentences in a document. The sentence sequence is represented by 32 bits, and a "1" is set at the position of each sentence in which the word key appears. One bit per sentence would be more accurate. In this example, however, the "1" is set at the bit position given by the sentence number modulo 32. When several word keys are used, their 32-bit appearance distribution values are ANDed together to show whether the word keys co-occur in the same sentence. Evaluating each of the 32 bits of the AND result would be more accurate. Instead, the result is divided into four 8-bit fragments, and the multiplier increases by 25% for each fragment that contains a "1". If all four fragments contain a "1", the multiplier is 2. If none does, it remains 1.
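The sketch below builds the 32-bit appearance distribution and the resulting multiplier. Two details are assumptions: sentence numbering starts at 0, and the helper names are hypothetical. The rest follows the description above: bits are folded by sentence number modulo 32, the bitmaps are ANDed across word keys, and the multiplier rises by 25% per non-zero 8-bit fragment.

```python
from typing import Iterable, List


def distribution_bits(sentence_numbers: Iterable[int]) -> int:
    """32-bit bitmap: set bit (n mod 32) for each sentence n containing the word key."""
    bits = 0
    for n in sentence_numbers:
        bits |= 1 << (n % 32)
    return bits


def distribution_factor(bitmaps: List[int]) -> float:
    """AND the bitmaps, then add 25% for each 8-bit fragment that still has a 1 bit."""
    combined = 0xFFFFFFFF
    for b in bitmaps:
        combined &= b
    fragments_hit = sum(1 for i in range(4) if (combined >> (8 * i)) & 0xFF)
    return 1.0 + 0.25 * fragments_hit  # 1.0 (no co-occurrence) up to 2.0


copy_bits = distribution_bits([0, 9, 40])   # sentence 40 folds onto bit 8
xerox_bits = distribution_bits([8, 20])
print(distribution_factor([copy_bits, xerox_bits]))
```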

In addition, the low-order bits of the document size are concatenated to the score so that different documents do not end up with the same score.
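The tie-breaking step amounts to appending a few low-order bits. The text does not state how many bits are used. The worked example for document A below (0x1C becoming 0x1CB8 together with the document size) is consistent with 8 bits appended below the density sum, so 8 is used here as an assumption. `with_tiebreak` is a hypothetical name.

```python
SIZE_BITS = 8  # assumption: number of low-order document-size bits appended


def with_tiebreak(value: int, doc_size: int, bits: int = SIZE_BITS) -> int:
    """Shift the score value left and append the low-order bits of the document size."""
    return (value << bits) | (doc_size & ((1 << bits) - 1))


print(hex(with_tiebreak(0x1C, 0x12B8)))
```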

FIG. 16 shows an example of score calculation. In this example, an OR search for "copy" and "Fuji Xerox Co., Ltd." ("Fuji Xerox" is a trademark) is performed, and documents A, B and C are hit. The search date is August 1, 2002.

The score of document A is calculated as follows. The sum of the actual appearance densities is 0x09 + 0x13 = 0x1C (0x denotes hexadecimal). Combined with the document size, this becomes 0x1CB8. Adding the update-date contribution gives 0x3CB8. The appearance distribution multiplies this by 1.75, giving a score of 0x6A47 = 27207.

The score of document B is calculated as follows. In the same way, the appearance density and document size give 0x1F40. Because the update-date contribution cannot be obtained from an occurrence of "Fuji Xerox Co., Ltd.", the default value 0x1800 is used, giving 0x3740 in total. The appearance distribution doubles this, giving a score of 0x6E80 = 28288.

The score of document C is calculated as follows. The sum of the actual appearance densities is 0x1D + 0x00 = 0x1D. Combined with the document size, this becomes 0x1D80. Adding the update-date contribution gives 0x17E3. The appearance distribution doubles this, giving a score of 0x2FC6 = 12230.

As a result, the documents are sorted in the order B, A, C.
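The hexadecimal arithmetic for documents B and C, and the final ordering, can be checked quickly in Python. Document A's score is taken as stated (0x6A47). All figures are those given in the text and are not recomputed from FIG. 16.

```python
# Figures as stated in the worked example of FIG. 16 (hexadecimal values from the text).
doc_b = (0x1F40 + 0x1800) * 2   # density and size, default date contribution, x2 distribution
doc_c = 0x17E3 * 2              # value after the date contribution, x2 distribution
scores = {"A": 0x6A47, "B": doc_b, "C": doc_c}

ranking = sorted(scores, key=scores.get, reverse=True)
print(doc_b, doc_c, ranking)
```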

This concludes the description of the third embodiment. In this embodiment, score calculation and sorting are performed in a distributed manner, which improves responsiveness and allows the system to scale. In addition, hit records beyond the predetermined upper limit are not sent to the search management server, which reduces communication cost.

In FIG. 13, the search management server and the search servers are configured separately and connected via a network (LAN or WAN). The functions of the search management server and the search servers can also be integrated, as shown in FIG. 3. In that case, the hit records can still be sorted by score for each index, concatenated, and then sorted again by score to produce the search result.

The present invention is not limited to the embodiments described above, and various modifications can be made without departing from its spirit. For example, the search apparatus of the second embodiment can also be applied to the intranet environment shown in FIG. 4. In that case, an equivalent system can be built by installing it on a computer system from a recording medium or the like.

DESCRIPTION OF SYMBOLS
10 Search unit
11 Access control unit
12 User authority storage unit
13 Index storage device
14 Index
15 Search user terminal
20 Process activation unit
21 Index record management unit
22 Index record management process
23 Access control unit
24 Process authority storage unit
30 Index-specific hit record count generation unit
31 Index selection unit
32 Hit record merging unit
33 Hit record temporary storage unit
34 Display record output unit
100 Search system
101 Index holding unit
102 Index construction system
103 Document filing system
104 Directory server
105 Web server
106 Application server
107 Router
108 LAN
109, 110 Storage medium
120 Client terminal
121 Network
200 Security domain
300 Search management server
301 Search server
302 Index holding unit

Claims

A search system comprising:
index storage means for storing a plurality of different search indexes;
a plurality of search means, each corresponding to one of the different search indexes, for searching the corresponding search index, sorting hit records, up to a predetermined number of records, by the search score included in the hit records, and outputting the sorted hit records as a search result;
connecting means for concatenating the plurality of search results output by the plurality of search means, in descending order of the number of hit records included in each search result, until the total number of hit records reaches a predetermined cumulative upper limit;
sorting means for sorting, by the search score, the hit records included in the search results concatenated by the connecting means; and
output means for outputting the hit records sorted by the sorting means, from the top, up to a predetermined output upper limit.

A search system comprising a plurality of search devices and a search management device that communicates with the plurality of search devices, wherein
each of the search devices comprises:
an index storage unit that stores a search index; and
search means, corresponding to the search index, for searching that search index;
the search index stored in the index storage unit differs from one search device to another; and
the search management device comprises:
receiving means for receiving a search request;
transfer means for transferring the search request received by the receiving means to the plurality of search devices and causing each search device to sort hit records corresponding to the search request, up to a predetermined number of records, by the search score included in the hit records, and to output them as a search result;
connecting means for concatenating the plurality of search results output by the plurality of search devices, in descending order of the number of hit records included in each search result, until the total number of hit records reaches a predetermined cumulative upper limit;
sorting means for sorting, by the search score, the hit records included in the search results concatenated by the connecting means; and
output means for outputting the hit records sorted by the sorting means, from the top, up to an output upper limit.

A search management apparatus comprising:
receiving means for receiving a search request;
transfer means for forwarding the search request to a plurality of search devices and causing each search device to sort hit records corresponding to the search request, up to a predetermined number of records, by the search score included in the hit records, and to output them. Each search device comprises an index storage unit that stores a search index and search means, corresponding to that search index, for searching it. The search index stored in the index storage unit differs from device to device;
connecting means for concatenating the plurality of search results output by the plurality of search devices, in descending order of the number of hit records included in each search result, until the total number of hit records reaches a predetermined cumulative upper limit;
sorting means for sorting, by the search score, the hit records concatenated by the connecting means; and
output means for outputting the hit records sorted by the sorting means, from the top, up to an output upper limit.

A computer program causing a computer to function as:
receiving means for receiving a search request;
transfer means for forwarding the search request to a plurality of search devices and causing each search device to sort hit records corresponding to the search request, up to a predetermined number of records, by the search score included in the hit records, and to output them. Each search device comprises an index storage unit that stores a search index and search means, corresponding to that search index, for searching it. The search index stored in the index storage unit differs from device to device;
connecting means for concatenating the plurality of search results output by the plurality of search devices, in descending order of the number of hit records included in each search result, until the total number of hit records reaches a predetermined cumulative upper limit;
sorting means for sorting, by the search score, the hit records concatenated by the connecting means; and
output means for outputting the hit records sorted by the sorting means, from the top, up to an output upper limit.