JP2007018143A

JP2007018143A - Document retrieval device and method

Info

Publication number: JP2007018143A
Application number: JP2005197304A
Authority: JP
Inventors: Tadanobu Miyauchi; 忠信宮内
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2005-07-06
Filing date: 2005-07-06
Publication date: 2007-01-25

Abstract

PROBLEM TO BE SOLVED: To improve availability and distribute load with a small amount of computing resources while maintaining data consistency.

SOLUTION: A client device 101 of a document retrieval system 100 issues document retrieval requests, and a distribution device 102 distributes each request from the client device 101 to the document retrieval server 200 or 300 based on the retrieval range. Because the distribution device 102 takes into account both the retrieval range and the operating status of the document retrieval servers 200 and 300, when one document retrieval server breaks down, the other can take over its work. Index transferring parts 207 and 307 transfer index information so that the retrieval index generated by index generating parts 201 and 301 is reflected in the auxiliary retrieval index storage parts 308 and 208 of the other document retrieval servers 300 and 200, respectively.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to document retrieval technology. In particular, it aims to improve availability and distribute load while maintaining data consistency as far as possible, using limited computing resources and no special hardware.

Various systems have conventionally been proposed to improve the availability of, and distribute the load on, databases and search systems.

Patent Document 1 discloses a technique for realizing hot-spare redundancy in a very large database management system, in which a transaction logger commits transactions to the secondary system. Hot-spare redundancy is a common approach to preserving data consistency and is important in database systems. However, it requires a redundant secondary database, which is a wasted computing resource whenever no failure occurs.

Patent Document 2, on the other hand, discloses optimizing the timing at which index updates are reflected, for applications that tolerate some synchronization lag. It tries to balance two conflicting factors: synchronization lag and equipment load. Depending on circumstances such as a low update frequency, this method can distribute load, but searches may not return the latest data, which is a problem from the standpoint of consistency.

As described above, each conventional technique has drawbacks, and combining them does not overcome those drawbacks.

FIG. 1 shows an example in which the technique of Patent Document 1 is applied to a document search system. In this example, the documents in cabinets 1 and 2 of repository A are searched by search systems A (master) and A' (standby), and indexes 1 and 2 for cabinets 1 and 2 are held in a storage system (NAS/SAN, i.e., Network Attached Storage / Storage Area Network, made redundant by RAID, i.e., Redundant Arrays of Inexpensive Disks). Likewise, the documents in cabinet 3 of repository B are searched by search systems B (master) and B' (standby), and index 3 for cabinet 3 is held in the same storage system. Normally, search systems A and B are used; when they fail, the standby search systems A' and B' are used instead. In the figure, solid lines indicate active systems and broken lines indicate systems on standby. The figure shows a situation in which search system A has failed and search system A' is performing searches in its place.

FIG. 2 shows an example of a document search system built by introducing redundant servers of another conventional kind. In FIG. 2, master search system A distributes search requests to slave search systems A1 and A2, which perform the searches using indexes 1 and 2. If, for example, slave search system A1 fails, the remaining slave search system A2 processes all search requests.

There has thus been no search method that achieves both improved availability and load distribution with few computing resources while maintaining data consistency.

Patent Document 1: JP 11-327991 A
Patent Document 2: JP 2001-14336 A

The present invention has been made in view of the above circumstances, and its object is to provide a search system that achieves both improved availability and load distribution with few computing resources while maintaining data consistency.

In an example configuration of the present invention, each of the servers provided acts as an active standby for the others, and the servers reflect information to one another whenever an appropriate trigger occurs. Each server processes search requests within its originally assigned range. At appropriate times it also obtains index information from another server and holds a copy of that server's index. When a server fails, another server executes the failed server's search service in addition to its own.

The present invention is described further below.

According to one aspect of the present invention, in order to achieve the above object, a document search apparatus is provided with: search index generation means for generating a search index for a predetermined first document group; search index holding means for holding the search index generated by the search index generation means; search means for performing, in response to a document search request from a user, a document search using the search index held in the search index holding means; and auxiliary search index holding means for holding the search index of another document search apparatus for a predetermined second document group. When the search means receives a search request for the predetermined second document group, it performs the document search using the search index held in the auxiliary search index holding means.
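The arrangement of means in this aspect can be pictured with a small in-memory model. The following Python sketch is illustrative only: the class name `DocumentSearchDevice`, the whitespace tokenizer, and the use of a term-to-document-ID dictionary as the "search index" are assumptions, not details from this publication. It shows only how the search means selects between the search index and the auxiliary search index according to the requested document group.

```python
from collections import defaultdict


class DocumentSearchDevice:
    """Hypothetical model of one document search apparatus.

    `index` plays the role of the search index holding means (own group);
    `aux_index` plays the role of the auxiliary search index holding means
    (a copy of the peer apparatus's index for the second group).
    """

    def __init__(self, own_group: str, peer_group: str) -> None:
        self.own_group = own_group
        self.peer_group = peer_group
        self.index: dict[str, set[str]] = defaultdict(set)
        self.aux_index: dict[str, set[str]] = defaultdict(set)

    def generate_index(self, docs: dict[str, str]) -> None:
        """Search index generation means: naive lowercase whitespace tokens."""
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.index[term].add(doc_id)

    def search(self, group: str, term: str) -> set[str]:
        """Search means: choose the index that covers the requested group."""
        if group == self.own_group:
            source = self.index
        elif group == self.peer_group:
            source = self.aux_index
        else:
            raise LookupError(f"no index for document group {group!r}")
        # .get() does not insert missing keys into a defaultdict
        return set(source.get(term.lower(), set()))
```

With two such devices, the `index` of one would be mirrored into the `aux_index` of the other and vice versa, which is the mutual relationship described for the system aspect.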

In this configuration, a normally operating document search apparatus takes over the search service of another document search apparatus that has failed, so improved availability and load distribution can both be achieved with few computing resources while maintaining data consistency.

In this configuration, it is preferable that the search index for the predetermined second document group held by the other document search apparatus responsible for that group, or the differences in that index, be copied to the auxiliary search index.

The auxiliary search index holding means is typically the search index holding means that the other document search apparatus, responsible for the predetermined second document group, uses for its own document searches. That is, with two search apparatuses, the search index storage means of one operates as the auxiliary search index storage means of the other.

According to another aspect of the present invention, in a document search system comprising a plurality of document search apparatuses, each apparatus has: search index generation means for generating a search index for its assigned document group; search index holding means for holding the search index generated by the search index generation means; search means for performing, in response to a document search request from a user, a document search using the search index held in the search index holding means; and auxiliary search index holding means for holding the search index of another document search apparatus. When the search means receives a search request for a document group that the other document search apparatus is responsible for searching, it performs the document search using the search index held in the auxiliary search index holding means.

This configuration likewise achieves both improved availability and load distribution with few computing resources while maintaining data consistency.

This configuration may further include distribution means that distributes each search request to one of the plurality of search apparatuses according to the document group being searched.

It may also include detection means for detecting the operating states of the plurality of document search apparatuses, so that a search request for the document group of an apparatus that is not operating is sent to another apparatus that is operating.

The present invention can be realized not only as an apparatus or a system but also as a method. Part of the invention can of course be implemented as software, and software products used to cause a computer to execute such software are also included in the technical scope of the present invention.

These and other aspects of the invention are set forth in the claims and are described in detail below with reference to embodiments.

According to the present invention, a normally operating document search apparatus takes over the search service of a failed document search apparatus, so improved availability and load distribution can both be achieved with few computing resources while maintaining data consistency.

Embodiments of the present invention are described below.

FIG. 3 schematically shows a document search system 100 according to an embodiment of the present invention, and FIG. 4 illustrates an example of its operation. In this example, the document search system 100 includes two document search servers 200 and 300, but it is not limited to this; three or more document search servers may be used. The functional blocks of the document search servers 200 and 300 are realized by the hardware and software resources of each server working together.

In FIG. 3, the document search system 100 comprises a client device 101, a distribution device 102, and document search servers 200 and 300, connected as appropriate over a communication network. The client device 101 issues document search requests and is realized by any of various hosts, such as a personal computer. The distribution device 102 distributes the document search requests from the client device 101 to the document search servers 200 and 300. In this example, as shown in FIG. 4, under normal conditions the document search server 200 provides the search service for the documents in cabinets 1 and 2, and the document search server 300 provides the search service for the documents in cabinets 3 and 4. When one document search server fails, the other takes over. The distribution device 102 distributes the requests according to the search range and the operating status of the document search servers 200 and 300. It can be implemented, for example, with a redirect mechanism; when search requests are issued over the web, the distribution destination can be written in a CGI program.
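The routing decision of the distribution device 102 can be sketched as a table lookup followed by a status check. In the Python sketch below, the cabinet names, the server identifiers, the `ROUTES` table, and the `route` function are hypothetical; the table simply mirrors the assignment in FIG. 4, and the set of operating servers is assumed to be supplied by a separate status check. A real deployment would wrap this decision in the redirect mechanism or CGI program mentioned above.

```python
# Hypothetical routing table mirroring FIG. 4: cabinet -> (primary, secondary)
ROUTES: dict[str, tuple[str, str]] = {
    "cabinet1": ("server200", "server300"),
    "cabinet2": ("server200", "server300"),
    "cabinet3": ("server300", "server200"),
    "cabinet4": ("server300", "server200"),
}


class NoServerAvailable(Exception):
    """Raised when neither server responsible for a cabinet is operating."""


def route(cabinet: str, alive: set[str]) -> str:
    """Pick the server that should receive a search request for `cabinet`.

    The primary is preferred; the secondary, which holds the auxiliary
    index for this cabinet, is used only when the primary is not alive.
    """
    primary, secondary = ROUTES[cabinet]
    if primary in alive:
        return primary
    if secondary in alive:
        return secondary
    raise NoServerAvailable(cabinet)
```

Keeping the search range and the operating status as separate inputs means the same table serves both normal operation and failover.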

The document search servers 200 and 300 complement each other in providing the document search service. The document search server 200, as the primary server, provides the search service for the documents in the document group storage unit 400, which, as also shown in FIG. 4, holds the documents of cabinets 1 and 2, for example. The document search server 300, as the primary server, provides the search service for the documents in the document group storage unit 500, which, as shown in FIG. 4, holds the documents of cabinets 3 and 4, for example. In addition, while the document search server 300 is down, the document search server 200 acts as the secondary server and provides the search service for the documents in the document group storage unit 500; likewise, while the document search server 200 is down, the document search server 300 acts as the secondary server for the documents in the document group storage unit 400. Although the document group storage units 400 and 500 are shown separately, they may be a single storage unit; FIG. 4 shows them as a single functional part, the repository.

The document search server 200 comprises an index generation unit 201, a search index storage unit 202, a search expression input unit 203, a search unit 204, a search result output unit 205, a heartbeat transmission unit 206, an index transfer unit 207, an auxiliary search index storage unit 208, and so on.

The index generation unit 201 reads the documents in the document group storage unit 400, generates a search index for them, and stores it in the search index storage unit 202. The search index may be a full-text search index or a related-document search index. The search expression input unit 203 receives, via the distribution device 102, the search expression of a search request sent from the client device 101. The search unit 204 consults the search index storage unit 202 to identify the documents that match the search expression, which form the search result. The search result output unit 205 supplies the search result to the client device 101.

The index transfer unit 207 transfers index information so that the search index generated by the index generation unit 201 is reflected in the auxiliary search index storage unit 308 of the other document search server 300. Typically, a log is kept of the transactions that write the search index to the search index storage unit 202, and at a predetermined trigger timing these transactions are committed to the auxiliary search index storage unit 308 of the document search server 300. The trigger may be the elapse of a certain time or the number of documents newly registered in the search index.
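The log-and-trigger transfer described above might look roughly as follows. This Python sketch assumes an inverted index stored as a dictionary mapping terms to sets of document IDs; the class name `IndexTransferUnit`, its `record`, `should_transfer`, and `transfer` methods, and the default thresholds are illustrative choices rather than anything specified in the publication. Pending log entries are replayed in order into the peer's auxiliary index when either trigger (document count or elapsed time) fires.

```python
import time
from dataclasses import dataclass, field


@dataclass
class IndexTransferUnit:
    """Hypothetical log-shipping helper for one primary server."""

    max_docs: int = 100            # trigger: number of newly indexed documents
    max_interval_s: float = 60.0   # trigger: seconds since the last transfer
    pending: list = field(default_factory=list)
    last_transfer: float = field(default_factory=time.monotonic)

    def record(self, doc_id: str, terms: list[str]) -> None:
        """Log one index write made on the primary search index."""
        self.pending.append((doc_id, terms))

    def should_transfer(self, now: float) -> bool:
        """True when there is something to ship and either trigger has fired."""
        return bool(self.pending) and (
            len(self.pending) >= self.max_docs
            or now - self.last_transfer >= self.max_interval_s
        )

    def transfer(self, aux_index: dict, now: float) -> int:
        """Replay pending entries, in order, into the peer's auxiliary index."""
        for doc_id, terms in self.pending:
            for term in terms:
                aux_index.setdefault(term, set()).add(doc_id)
        count = len(self.pending)
        self.pending.clear()
        self.last_transfer = now
        return count
```

Passing `now` explicitly keeps the trigger logic deterministic; a running server would call `should_transfer(time.monotonic())` periodically.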

The auxiliary search index storage unit 208 of the document search server 200 receives the search index from the index transfer unit 307 of the document search server 300 and holds the search index for the documents in the document group storage unit 500, which fall within the search service range of the document search server 300.

While the document search server 200 is operating, the heartbeat transmission unit 206 sends a heartbeat signal to the distribution device 102 at predetermined intervals. As long as the distribution device 102 receives heartbeat signals from the heartbeat transmission unit 206, it judges that the document search server 200 is operating and sends search requests for the corresponding search range to that server.

If the heartbeat signals from the heartbeat transmission unit 206 of the document search server 200 stop while those from the heartbeat transmission unit 306 of the document search server 300 continue, the distribution device 102 sends search requests that would normally go to the document search server 200 to the document search server 300 instead. Conversely, if the heartbeat signals from the heartbeat transmission unit 306 stop while those from the heartbeat transmission unit 206 continue, the distribution device 102 sends search requests that would normally go to the document search server 300 to the document search server 200. If the heartbeats from both document search servers 200 and 300 stop, the distribution device 102 returns an error to the client device 101.
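The heartbeat-based failover rule can be expressed compactly. In this Python sketch, `HeartbeatMonitor`, `dispatch`, the timeout value, and the explicit `now` argument (used instead of a real clock) are assumptions made for illustration; the publication does not specify how long a heartbeat may be missing before a server is treated as down.

```python
class HeartbeatMonitor:
    """Records the last heartbeat time seen from each server."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s          # assumed tolerance for missed beats
        self.last_seen: dict[str, float] = {}

    def beat(self, server: str, now: float) -> None:
        """Called whenever a heartbeat signal arrives from `server`."""
        self.last_seen[server] = now

    def alive(self, now: float) -> set[str]:
        """Servers whose heartbeat has not stopped for longer than the timeout."""
        return {s for s, t in self.last_seen.items() if now - t <= self.timeout_s}


def dispatch(home: str, peer: str, monitor: HeartbeatMonitor, now: float) -> str:
    """Send to `home` while its heartbeat continues, otherwise to `peer`;
    if both heartbeats have stopped, signal an error for the client."""
    live = monitor.alive(now)
    if home in live:
        return home
    if peer in live:
        return peer
    raise ConnectionError("no document search server is sending heartbeats")
```

Here `home` is the server normally responsible for the requested range and `peer` is the one holding its auxiliary index.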

The document search servers 200 and 300 have the same configuration; each part of the document search server 300 is given the corresponding reference numeral in the 300s, and its detailed description is not repeated.

Next, an example of the operation of this embodiment is described. As shown in FIG. 4, a primary server and a secondary server are set for each item of repository information (search A and search B). Cabinets 1 and 2 are assigned to search A (document search server 200), and cabinets 3 and 4 are assigned to search B (document search server 300). During normal operation, searches access only the primary server; for example, a search of the documents in cabinets 1 and 2 consults indexes 1 and 2 through search A. For registration, too, the repository accesses only the primary server; after registration on the primary server is complete, a trigger is generated at an appropriate time and registration on the secondary server is performed. For example, search A registers entries in indexes 1' and 2' (held on search B). Only the plain text extracted from each document needs to be registered for full-text search, and only the result of text analysis for related-document search, so this processing is very light. Events such as the elapse of a certain time or the registration of a predetermined number of new documents can be used as the trigger.
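To illustrate why registration on the secondary server is light, the following Python sketch builds the record that might be shipped to it. The function `secondary_payload`, its record layout, and the use of simple term frequencies as a stand-in for the "text analysis result" of related-document search are hypothetical; the publication only states that the extracted plain text or the analysis result suffices.

```python
from collections import Counter


def secondary_payload(doc_id: str, extracted_text: str, mode: str) -> dict:
    """Build a lightweight registration record for the secondary server.

    mode="fulltext": ship only the plain text extracted from the document.
    mode="related":  ship only an analysis result; term frequencies are used
                     here purely as a placeholder for the real analysis.
    """
    if mode == "fulltext":
        return {"id": doc_id, "text": extracted_text}
    if mode == "related":
        return {"id": doc_id, "terms": dict(Counter(extracted_text.lower().split()))}
    raise ValueError(f"unknown mode: {mode!r}")
```

In either mode the original file and any rendering work stay on the primary side, which is what keeps the secondary update cheap.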

When a failure occurs in the document search server 200 or 300, it is detected, the target of operations is switched, and the secondary server is accessed. For example, if search A fails, search B also handles searches for cabinets 1 and 2 by consulting indexes 1' and 2'. Because search B (document search server 300) holds, in addition to its own indexes 3 and 4, indexes 1' and 2' that are consistent as of the last trigger, searches can be performed as usual. Registration consistency can also be maintained by having the servers hold, for each other, the documents registered during the failure.

In this example, failures are detected by an ordinary heartbeat check, but more simply, a failure can also be inferred when the primary server does not respond to a registration or search request.
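The simpler, response-based detection mentioned above could be approximated by issuing a request with a time limit. The Python sketch below uses the standard `concurrent.futures` module; the helper name `responds_within` and the choice to treat an exception as a failed response are assumptions, and the timeout value is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


def responds_within(request: Callable[[], object], timeout_s: float) -> bool:
    """Return False if `request` (e.g. a search sent to the primary) does not
    complete within `timeout_s` seconds or fails with an exception."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(request)
    try:
        future.result(timeout=timeout_s)
        return True
    except FutureTimeout:
        return False        # no response in time: treat the primary as failed
    except Exception:
        return False        # error response: also treated as failed (assumption)
    finally:
        # Do not block on a hung request; its worker thread finishes on its own.
        pool.shutdown(wait=False)
```

A `False` result would then prompt the caller to resend the request to the secondary server.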

FIG. 5 shows an example of how this embodiment controls the search service, the updating of the search index (creating, deleting, or changing index records) when documents are updated (registered, deleted, edited, and so on), the transfer of index information to the secondary server, and so on. In this example, the server checks whether there is a search request and, if so, serves it (steps S10 and S11); checks whether a document has been updated and, if so, updates the corresponding search index (steps S12 and S13); checks whether it is time to send index information to the secondary server and, if so, transfers it (steps S14 and S15); and checks whether it is time to send a heartbeat and, if so, sends a heartbeat signal to the distribution device 102 (steps S16 and S17). When the document search server 200 or 300 fails, its heartbeat signals no longer reach the distribution device 102, and processing is switched from the primary server to the secondary server accordingly. Although the example of FIG. 5 updates the search index whenever a document is updated, the search index may instead be updated on a schedule, for example at predetermined intervals, using the time stamps of updated documents. The index information transferred to the secondary server and reflected in the auxiliary search index storage unit 208 or 308 is, for example, the log of transactions committed to the search index storage unit 202 of the primary server, which the secondary server then commits in sequence.
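The loop of FIG. 5 can be sketched as one pass over the four check-and-act pairs. In this Python sketch, `ServerState`, its queues and timer fields, and the intervals are hypothetical; each action is only appended to a log so that the order of steps S10 to S17 is visible, whereas a real server would serve searches, update its index, ship the transaction log, and send heartbeats at those points.

```python
from dataclasses import dataclass, field


@dataclass
class ServerState:
    """Hypothetical state of one document search server."""
    search_queue: list = field(default_factory=list)   # pending search requests
    update_queue: list = field(default_factory=list)   # pending document updates
    next_transfer: float = 0.0     # time at which index info is next shipped
    next_heartbeat: float = 0.0    # time at which the next heartbeat is due
    transfer_every: float = 60.0
    heartbeat_every: float = 5.0
    log: list = field(default_factory=list)            # actions, for illustration


def control_cycle(s: ServerState, now: float) -> None:
    """One pass over steps S10 to S17 of FIG. 5."""
    if s.search_queue:                                        # S10: search request?
        s.log.append(("search", s.search_queue.pop(0)))       # S11: serve it
    if s.update_queue:                                        # S12: document updated?
        s.log.append(("update_index", s.update_queue.pop(0)))  # S13: update index
    if now >= s.next_transfer:                                # S14: transfer due?
        s.log.append(("transfer_to_secondary", now))          # S15: ship index info
        s.next_transfer = now + s.transfer_every
    if now >= s.next_heartbeat:                               # S16: heartbeat due?
        s.log.append(("heartbeat", now))                      # S17: send heartbeat
        s.next_heartbeat = now + s.heartbeat_every
```

If the server stops running this loop because of a failure, step S17 is never reached again, which is exactly the condition the distribution device 102 detects.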

According to this embodiment, the information is kept up to date and consistent except at the critical moment when a failure occurs, and little computing capacity is wasted during normal operation. In addition, because the system can be built from ordinary hardware, introduction costs are low.

Next, a modification of the above embodiment is described.

Although the above embodiment keeps the configuration simple, it cannot fully guarantee the consistency of registered data immediately after a failure occurs. Therefore, as shown in FIG. 6, the upper search layer may keep the same mechanism while the lower storage layer uses shareable storage with RAID redundancy, such as a NAS/SAN (Network Attached Storage / Storage Area Network) 600. When investment in storage is possible, this improves data consistency. In FIG. 6, parts corresponding to those in FIG. 3 are given corresponding reference numerals, and broken lines indicate operation during a failure.

In this modification, when one server fails, an active server takes over its work, as indicated by the broken lines in FIG. 7, and the servers need only exchange management information with each other.

The present invention is not limited to the above embodiment, and various modifications can be made without departing from its spirit. For example, although two servers are used in the above example, three or more may be used.

FIG. 1 is a diagram explaining a conventional example.
FIG. 2 is a diagram explaining another conventional example.
FIG. 3 is a diagram explaining the configuration of an embodiment of the present invention.
FIG. 4 is a diagram explaining the operation of the above embodiment.
FIG. 5 is a flowchart explaining a control example of the above embodiment.
FIG. 6 is a diagram explaining a modification of the above embodiment.
FIG. 7 is a diagram explaining the operation of the above modification.

Explanation of symbols

100 Document search system
101 Client device
102 Distribution device
200, 300 Document search server
201, 301 Index generation unit
202, 302 Search index storage unit
203, 303 Search expression input unit
204, 304 Search unit
205, 305 Search result output unit
206, 306 Heartbeat transmission unit
207, 307 Index transfer unit
208, 308 Auxiliary search index storage unit
400, 500 Document group storage unit
600 NAS/SAN

Claims

1. A document search apparatus comprising:
search index generation means for generating a search index for a predetermined first document group;
search index holding means for holding the search index generated by the search index generation means;
search means for performing, in response to a document search request from a user, a document search using the search index held in the search index holding means; and
auxiliary search index holding means for holding a search index of another document search apparatus for a predetermined second document group,
wherein, when the search means receives a search request for the predetermined second document group, the document search is performed using the search index of the auxiliary search index holding means.

2. The document search apparatus according to claim 1, wherein the search index for the predetermined second document group held by the other document search apparatus responsible for that group, or the differences in that index, is copied to the auxiliary search index.

3. The document search apparatus according to claim 1, wherein the auxiliary search index holding means is the search index holding means that holds the search index used for document search by the other document search apparatus responsible for the predetermined second document group.

4. A document search system comprising a plurality of document search apparatuses, wherein each of the plurality of document search apparatuses comprises:
search index generation means for generating a search index for a corresponding document group;
search index holding means for holding the search index generated by the search index generation means;
search means for performing, in response to a document search request from a user, a document search using the search index held in the search index holding means; and
auxiliary search index holding means for holding a search index of another document search apparatus,
wherein, when the search means receives a search request for a document group searched by the other document search apparatus, the document search is performed using the search index of the auxiliary search index holding means.

5. The document search system according to claim 4, further comprising distribution means for distributing a search request to one of the plurality of document search apparatuses according to the document group to be searched.

6. The document search system according to claim 4 or 5, further comprising detection means for detecting the operating states of the plurality of document search apparatuses, wherein a search request for the document group of a document search apparatus that is not operating is sent to another document search apparatus that is operating.

7. A document search method comprising:
a step in which search index generation means generates a search index for a predetermined first document group;
a step in which search index holding means stores the search index generated by the search index generation means;
a step in which search means performs, in response to a document search request from a user, a document search using the search index held in the search index holding means; and
a step in which auxiliary search index holding means stores a search index of another document search apparatus for a predetermined second document group,
wherein, when the search means receives a search request for the predetermined second document group, the document search is performed using the search index of the auxiliary search index holding means.

8. A computer program for document search that causes a computer to execute:
a step in which search index generation means generates a search index for a predetermined first document group;
a step in which search index holding means stores the search index generated by the search index generation means;
a step in which search means performs, in response to a document search request from a user, a document search using the search index held in the search index holding means; and
a step in which auxiliary search index holding means stores a search index of another document search apparatus for a predetermined second document group,
wherein, when the search means receives a search request for the predetermined second document group, the document search is performed using the search index of the auxiliary search index holding means.