JP2008243077A

JP2008243077A - Structured document management device, method, and program

Info

Publication number: JP2008243077A
Application number: JP2007085978A
Authority: JP
Inventors: Minoru Inada; 稔稲田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2007-03-28
Filing date: 2007-03-28
Publication date: 2008-10-09

Abstract

PROBLEM TO BE SOLVED: To provide a structured document management device, method, and system that enable structured documents to be transmitted and received efficiently between structured document management devices that manage a plurality of structured documents in a distributed manner. SOLUTION: Based on a structure index, in which a unique structure ID is allocated to each tag structure of the elements constituting a structured document, and a vocabulary index, in which a unique vocabulary ID is allocated to each vocabulary item constituting the character string of each element, the structured document is transformed into an array of structure IDs and vocabulary IDs and transmitted as coded data to another structured document management device. When coded data is received from another structured document management device, it is restored to a structured document based on the structure index and the vocabulary index. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a structured document management apparatus, method, and system for managing a plurality of structured documents in a distributed manner.

In query processing in a distributed database, intermediate data may have to be transferred between nodes in order to perform a join operation, and the time required for this data transfer is one cause of processing delay. In a distributed relational database (RDB), the keys (foreign keys) used in join operations are determined in advance, and the field size of non-key values is also fixed when the table is created. The amount of data transferred between nodes is therefore small compared with other distributed databases, and the time required for data transfer is short.

In a distributed XML database, on the other hand, nothing corresponding to an RDB foreign key is defined, such as which element's value is to be used in a join operation, and no maximum length is defined for the character string held by each element. Consequently, when a large number of very long character strings must be transferred between nodes for join processing, the data transfer takes a great deal of time. Moreover, structured documents generally contain many character strings of indefinite length, so transferring them takes a relatively long time, and shortening the transfer time is a challenge.

In the field of distributed databases, it is common practice to compress the data transferred between nodes in order to reduce the amount of data transferred and shorten the transfer time. In the field of XML databases as well, techniques have been proposed that address the above transfer problem by compressing the structured document to be transferred, or a part of it, to reduce the amount of transferred data. For example, Patent Document 1 discloses a technique that separates tags from a character string stream composed of a document containing tags, places tag codes for identification at the positions in the stream from which the tags were separated, and then encodes and outputs the stream.

Patent Document 1: JP 2000-101442 A

However, because the technique of Patent Document 1 is intended to handle any vocabulary, it must hold in advance a dictionary in which about 130,000 words are registered, regardless of the vocabulary actually contained in the structured document to be transferred, so the amount of data involved in the encoding process grows. In addition, when data is compressed with a dictionary-based encoding method as in Patent Document 1, the amount of transferred data is reduced and the transfer itself is faster, but searching the dictionary for each word takes time, so compression and decompression end up taking time. On a relatively fast network, the combined time for compression, transfer, and decompression can therefore be about the same as the time for transferring without compression, and the transfer time may not improve.

The present invention has been made in view of the above, and an object thereof is to provide a structured document management apparatus, method, and system capable of efficiently transmitting and receiving structured documents between structured document management apparatuses that manage a plurality of structured documents in a distributed manner.

To solve the above problems and achieve the object, the present invention provides a structured document management apparatus that stores a plurality of structured documents in a distributed manner together with other structured document management apparatuses connected on a network, and that manages the plurality of structured documents based on structure IDs, each unique to a type of tag structure of the elements constituting the structured documents, and vocabulary IDs, each unique to a vocabulary item contained in those elements, the structure IDs and vocabulary IDs being shared with the other structured document management apparatuses. The apparatus comprises: structured document storage means for storing the structured documents; structure index storage means for storing a structure index that associates the tag structure of each element constituting the stored structured documents with the structure ID unique to that type of tag structure; vocabulary index storage means for storing a vocabulary index that associates each vocabulary item constituting the character string portion of each element of the stored structured documents with the vocabulary ID unique to that vocabulary item; encoding means for generating, based on the structure index and the vocabulary index, encoded data in which a structured document stored in the structured document storage means is converted into an array of structure IDs and vocabulary IDs, and for transmitting the encoded data to another structured document management apparatus; and restoration means for restoring, upon receiving encoded data from another structured document management apparatus, the encoded data to a structured document based on the structure index and the vocabulary index.

The present invention also provides a structured document management method for a structured document management apparatus that stores a plurality of structured documents in a distributed manner together with other structured document management apparatuses connected on a network, and that manages the plurality of structured documents based on structure IDs, each unique to a type of tag structure of the elements constituting the structured documents, and vocabulary IDs, each unique to a vocabulary item contained in those elements, the structure IDs and vocabulary IDs being shared with the other structured document management apparatuses. The method comprises: an encoding step of generating encoded data in which a structured document stored in the apparatus itself is converted into an array of structure IDs and vocabulary IDs, based on a structure index that associates the tag structure of each element constituting the structured documents stored in the apparatus with the structure ID unique to that type of tag structure, and on a vocabulary index that associates each vocabulary item constituting the character string portion of each element of those structured documents with the vocabulary ID unique to that vocabulary item, and transmitting the encoded data to another structured document management apparatus; and a restoration step of restoring, upon receiving encoded data from another structured document management apparatus, the encoded data to a structured document based on the structure index and the vocabulary index.

The present invention further provides a structured document management system in which a plurality of structured document management apparatuses connected on a network store a plurality of structured documents in a distributed manner and manage them based on structure IDs, each unique to a type of tag structure of the elements constituting the structured documents, and vocabulary IDs, each unique to a vocabulary item contained in those elements, the structure IDs and vocabulary IDs being shared among the plurality of structured document management apparatuses. Each structured document management apparatus comprises: structured document storage means for storing the structured documents; structure index storage means for storing a structure index that associates the tag structure of each element constituting the stored structured documents with the structure ID unique to that type of tag structure; vocabulary index storage means for storing a vocabulary index that associates each vocabulary item constituting the character string portion of each element of the stored structured documents with the vocabulary ID unique to that vocabulary item; encoding means for generating, based on the structure index and the vocabulary index, encoded data in which a structured document stored in the structured document storage means is converted into an array of structure IDs and vocabulary IDs, and for transmitting the encoded data to another structured document management apparatus; and restoration means for restoring, upon receiving encoded data from another structured document management apparatus, the encoded data to a structured document based on the structure index and the vocabulary index.

According to the present invention, a structured document is encoded using the structure IDs and vocabulary IDs that correspond to the tag structures and vocabulary actually contained in the structured document, and this encoded data is used for transmission to and reception from other structured document management apparatuses. Compared with transferring the original structured document as it is, the amount of transferred data and the time required for transfer can therefore be reduced. In addition, compared with holding in advance a dictionary in which a wide variety of words are registered, the amount of data involved in the encoding process can be reduced, so the time required to encode and restore a structured document can be reduced, and structured documents can be transmitted and received efficiently.
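The encoding and restoration summarized above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the representation of a tag structure as a path such as "/patent/title", the whitespace-based splitting of text into vocabulary items, the list-of-pairs layout of the encoded data, and the function names build_indexes, encode, and restore are all assumptions made for illustration. Mixed content and attributes are ignored.

```python
import xml.etree.ElementTree as ET

def build_indexes(root):
    """Assign an ID to each distinct tag path and to each distinct word."""
    structure_index, vocab_index = {}, {}

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        structure_index.setdefault(path, len(structure_index))
        # Assumption: vocabulary items are whitespace-separated words.
        for word in (elem.text or "").split():
            vocab_index.setdefault(word, len(vocab_index))
        for child in elem:
            walk(child, path)

    walk(root, "")
    return structure_index, vocab_index

def encode(root, structure_index, vocab_index):
    """Convert the document into (structure ID, [vocabulary IDs]) pairs
    in document order."""
    encoded = []

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        words = (elem.text or "").split()
        encoded.append((structure_index[path], [vocab_index[w] for w in words]))
        for child in elem:
            walk(child, path)

    walk(root, "")
    return encoded

def restore(encoded, structure_index, vocab_index):
    """Rebuild the element tree from the ID array using the shared indexes."""
    paths = {sid: path for path, sid in structure_index.items()}
    words = {vid: word for word, vid in vocab_index.items()}
    latest = {}  # tag path -> most recently created element at that path
    root = None
    for sid, vids in encoded:
        path = paths[sid]
        parent_path, _, tag = path.rpartition("/")
        if parent_path == "":
            elem = root = ET.Element(tag)
        else:
            # In document order, the parent is the latest element at parent_path.
            elem = ET.SubElement(latest[parent_path], tag)
        if vids:
            elem.text = " ".join(words[v] for v in vids)
        latest[path] = elem
    return root
```

Because both indexes are built only from documents that are actually stored, the receiving node needs nothing beyond the shared indexes to restore the document; no general-purpose word dictionary is involved.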

Exemplary embodiments of a structured document management apparatus, method, and system are described in detail below with reference to the accompanying drawings.

[First Embodiment]
FIG. 1 is a block diagram showing the configuration of the structured document management system 10 of this embodiment. As shown in FIG. 1, the structured document management system 10 includes a plurality of structured document management apparatuses 11 to 14 and a client device 40, and these devices are connected via a network N so that they can communicate with one another. The number of structured document management apparatuses and client devices 40 connected to the network N is not limited to the illustrated example.

Each of the structured document management apparatuses 11 to 14 has a structured document DB 31 (see FIG. 2) that stores and manages structured documents written in a markup language such as XML (eXtensible Markup Language). Together, the structured document management apparatuses 11 to 14 realize the function of the structured document management system 10, which manages structured documents in a distributed manner. Hereinafter, each of the structured document management apparatuses 11 to 14 constituting the structured document management system 10 is also referred to simply as a node.

In response to a search request input from the client device 40 through the network N, the structured document management system 10 searches the structured document DB 31 provided in the system for matching structured documents and provides them to the client device 40 as the search result. Likewise, in response to a structured document registration request input from the client device 40 through the network N, the structured document management system 10 registers the structured document in question in the structured document DB 31 provided in the system.

The client device 40 is a terminal device such as a PC (Personal Computer) operated by a user. Through the client device 40, the user can send the structured document management system 10 a search request for structured documents containing a specific character string, or a registration request for newly registering a structured document. A search request sent from the client device 40 is assumed to contain a predetermined search expression, for example in XQuery, and a registration request is assumed to contain the structured document to be registered.

A search request or registration request sent from the client device 40 is received by any one of the four nodes constituting the structured document management system 10. The node that accepts client requests may be fixed to a specific node, or the nodes may take turns in round-robin fashion. Alternatively, the load on each node may be taken into account so that the node with the smallest load receives the request from the client device 40.
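The two selection policies mentioned above, round robin and least load, might look like the following Python sketch. The node names and the load figures are hypothetical; the source does not say how load is measured or how nodes are addressed.

```python
from itertools import cycle

# Hypothetical identifiers for the four nodes (apparatuses 11 to 14).
NODES = ["node11", "node12", "node13", "node14"]

def round_robin(nodes):
    """Return an iterator that hands out nodes in turn, indefinitely."""
    return cycle(nodes)

def least_loaded(loads):
    """Return the node with the smallest load value.

    `loads` maps node name to a load figure; the metric is an assumption.
    """
    return min(loads, key=loads.get)
```

A dispatcher would call next() on the round-robin iterator once per incoming request, or call least_loaded with freshly collected load figures.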

In this embodiment, each node can accept requests from the client device 40. However, the configuration is not limited to this, and a separate request reception device (not shown) dedicated to accepting requests from the client device 40 may be provided.

The configuration of the structured document management apparatus 11 is described below. To simplify the explanation, the structured document management apparatuses 11 to 14 are assumed to have the same configuration in this embodiment, and descriptions of the structured document management apparatuses 12 to 14 are omitted.

FIG. 2 is a block diagram showing the hardware configuration of the structured document management apparatus 11. As shown in FIG. 2, the structured document management apparatus 11 includes a CPU (Central Processing Unit) 101, an operation unit 102, a display unit 103, a ROM (Read Only Memory) 104, a RAM (Random Access Memory) 105, a communication unit 106, a storage unit 107, and the like, which are connected to one another by a bus 108.

Using a predetermined area of the RAM 105 as a work area, the CPU 101 executes various processes in cooperation with control programs stored in advance in the ROM 104 or the storage unit 107, and centrally controls the operation of each unit of the structured document management apparatus 11.

In cooperation with a predetermined program stored in advance in the ROM 104 or the storage unit 107, the CPU 101 also realizes the following functional units: the request reception unit 20, communication processing unit 21, search plan generation unit 22, search plan processing unit 23, structured document conversion unit 24, structured document acquisition unit 26, storage processing unit 27, structure index unit 28, vocabulary index unit 29, and structured document DB management unit 30 (see FIG. 4). Each functional unit is described in detail later.

The operation unit 102 has various input keys, accepts information entered by the user as an instruction signal, and outputs that signal to the CPU 101.

The display unit 103 is a display such as an LCD (Liquid Crystal Display) and displays various information based on display signals from the CPU 101. The display unit 103 may be integrated with the operation unit 102 to form a touch panel.

The ROM 104 stores, in a non-rewritable manner, programs and various setting information used to control the structured document management apparatus 11.

The RAM 105 is volatile storage such as SDRAM; it functions as the work area of the CPU 101 and serves as a buffer and the like.

The communication unit 106 is an interface for communicating with other nodes and the client device 40 through the network N. It outputs information received from other nodes and the client device 40 to the CPU 101, and transmits information output from the CPU 101 to other nodes and the client device 40.

The storage unit 107 has a magnetically or optically recordable storage medium and stores, in a rewritable manner, programs and various setting information used to control the structured document management apparatus 11. The storage unit 107 also has storage areas for the structured document DB 31, a database for storing structured documents, and for the structure-structure ID correspondence data 32 and the vocabulary-vocabulary ID correspondence data 33 described later. In this embodiment a single storage means holds the structured document DB 31, the structure-structure ID correspondence data 32, and the vocabulary-vocabulary ID correspondence data 33, but they may instead be held in separate storage means according to function. In that case, the storage means may be located inside or outside the structured document management apparatus 11.

The description format of the structured documents handled in this embodiment is as follows. FIG. 3 shows an example of a structured document written in XML, in which information about a patent is described. In XML, tags are used to express the structure of a document. There are start tags and end tags; by enclosing a component of a structured document between a start tag and an end tag, the boundaries of a character string in the document and the structural component to which it belongs can be described unambiguously.

A start tag is written as the element name enclosed between the symbols "<" and ">", and an end tag is written as the element name enclosed between the symbols "</" and ">". In XML, the data delimited by a matching start tag and end tag constitutes one element; for example, the <patent> tag, the </patent> tag, and the data enclosed between them constitute one element.

FIG. 4 is a block diagram showing the functional configuration of the structured document management apparatus 11. As shown in FIG. 4, the structured document management apparatus 11 includes a request reception unit 20, a communication processing unit 21, a search plan generation unit 22, a search plan processing unit 23, a structured document conversion unit 24, a dictionary 25, a structured document acquisition unit 26, a storage processing unit 27, a structure index unit 28, a vocabulary index unit 29, and a structured document DB management unit 30.

The request reception unit 20 is a functional unit that accepts the various requests from other nodes and the client device 40 that are received via the communication unit 106. These include search requests and registration requests sent from the client device 40 and execution requests sent from other nodes.

The communication processing unit 21 transmits search results for search requests, the processing results described later, and the like to other nodes and the client device 40 via the communication unit 106.

The search plan generation unit 22 parses the search expression contained in a search request accepted by the request reception unit 20 and generates a plan (an execution plan for the search process) that minimizes the processing cost.

The search plan processing unit 23 executes the plan generated by the search plan generation unit 22, and sends the structured documents finally acquired from the structured document DB 31 via the structured document acquisition unit 26 to the client device 40, via the communication processing unit 21, as the search result.

The structured document conversion unit 24 encodes structured documents and restores encoded structured documents, based on the structure index 321 registered in the structure-structure ID correspondence data 32 and the vocabulary index 331 registered in the vocabulary-vocabulary ID correspondence data 33, both described later.

The structured document conversion unit 24 also compresses encoded data in a predetermined compression format to reduce its size, and, when it determines that encoded data received from another node has been compressed, decompresses it. The compression format is not restricted; for example, the ZIP or LZH format can be used.
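The optional compression step can be sketched as follows in Python. The source does not say how a receiving node determines whether incoming data is compressed, so the one-byte flag prefix used here is purely an assumption, as are the function names pack and unpack. The standard zlib module (DEFLATE, the algorithm commonly used inside ZIP archives) stands in for the unspecified compression format.

```python
import zlib

# Assumed one-byte header marking whether the payload is compressed.
RAW = b"\x00"
DEFLATE = b"\x01"

def pack(payload: bytes, compress: bool = True) -> bytes:
    """Prefix the encoded data with a flag and optionally compress it."""
    if compress:
        return DEFLATE + zlib.compress(payload)
    return RAW + payload

def unpack(message: bytes) -> bytes:
    """Inspect the flag and decompress only when the sender compressed."""
    flag, body = message[:1], message[1:]
    if flag == DEFLATE:
        return zlib.decompress(body)
    if flag == RAW:
        return body
    raise ValueError("unknown compression flag")
```

Making compression optional per message leaves room for skipping it on fast networks, where the background section notes that compression time can cancel out the transfer savings.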

In the dictionary 25, predetermined rules for determining the storage destination node in which a structured document is to be registered are defined in advance. When a new structured document is registered in the structured document management system 10, the storage processing unit 27 determines the storage destination node based on the rules defined in the dictionary 25. The storage processing unit 27 associates the determined storage destination node with the registered structured document and registers it in the dictionary 25 as storage destination information. When an already registered structured document is read out, the node in which it is stored is identified from this storage destination information.

The rules defined in the dictionary 25 are themselves stored in advance in the ROM 104 or the storage unit 107, and the storage destination information registered in the dictionary 25 is stored in a predetermined storage area of the storage unit 107.
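The role of the dictionary 25, a placement rule plus a record of where each document was stored, can be sketched in Python as below. The source does not specify the rule, so the hash-based rule, the document identifiers, the node names, and the class and method names are all hypothetical.

```python
import hashlib

# Hypothetical identifiers for the four nodes (apparatuses 11 to 14).
NODES = ["node11", "node12", "node13", "node14"]

def hash_rule(doc_id: str) -> str:
    """Assumed placement rule: map a hash of the document ID onto a node."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return NODES[digest[0] % len(NODES)]

class Dictionary:
    """Holds a placement rule and the resulting storage destination info."""

    def __init__(self, rule=hash_rule):
        self.rule = rule
        self.locations = {}  # document ID -> storage destination node

    def register(self, doc_id: str) -> str:
        """Choose a node by the rule and record it for later lookups."""
        node = self.rule(doc_id)
        self.locations[doc_id] = node
        return node

    def locate(self, doc_id: str) -> str:
        """Return the node recorded when the document was registered."""
        return self.locations[doc_id]
```

Recording the chosen node, rather than recomputing the rule at read time, matches the description that reads consult the stored destination information; it also means the rule could change without invalidating earlier placements.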

The structured document acquisition unit 26 acquires the structured documents specified by the search plan processing unit 23 from the structured document DB 31 via the structured document DB management unit 30.

Based on the rules defined in the dictionary 25, the storage processing unit 27 determines the storage destination node for the structured document contained in a registration request input from the client device 40 via the request reception unit 20, and has the structured document DB management unit 30 of that node register the structured document.

As shown in FIG. 4, the storage processing unit 27 includes a document analysis unit 271, a structure analysis unit 272, and a vocabulary analysis unit 273. The document analysis unit 271 is an XML parser or the like and analyzes the structure of the input structured document. Specifically, when the input structured document is an XML document such as the one shown in FIG. 3, the document analysis unit 271 checks whether it conforms to XML notation, for example whether start tags and end tags correspond. After confirming that the structured document conforms to the notation, it outputs the structured document to the structure analysis unit 272 and the vocabulary analysis unit 273.
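The notation check performed by the document analysis unit 271 corresponds to an XML well-formedness check. A minimal Python sketch using the standard library parser is shown below; the function name is_well_formed is an illustrative choice, and the source does not say which parser the embodiment uses.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, for example if every
    start tag has a matching end tag."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

Only documents that pass this check would be handed on to structure analysis and vocabulary analysis.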

The structure analysis unit 272 analyzes, for each element, the tag structure indicating the structural position of the tags in the structured document input from the document analysis unit 271, assigns a unique structure ID to each type of tag structure, and generates information associating each tag structure with its structure ID as the structure index 321. Here, an element means a start tag, an end tag, and the character string portion enclosed between the two tags. The assigned structure ID is preferably in a format suited to compression, such as a numerical value.

When generating the structure index 321, the structure analysis unit 272 refers to the existing structure index registered in the structure-structure ID correspondence data 32 and performs control so that no new structure ID is assigned to a tag structure that is already registered.

The structure analysis unit 272 also outputs the generated structure index 321 to the structure index unit 28 so that it is registered in the node's own structure-structure ID correspondence data 32, and transmits it to the other nodes via the communication processing unit 21 so that it is registered in their structure-structure ID correspondence data 32. In this way the generated structure index 321 is shared throughout the structured document management system 10. When the structure analysis unit 272 receives a structure index 321 from another node via the request reception unit 20, it outputs that structure index 321 to the structure index unit 28 so that it is registered in the structure-structure ID correspondence data 32.

FIGS. 5-1 and 5-2 are diagrams for explaining the operation of the structure analysis unit 272. Suppose that, as a result of analysis of the structured document by the structure analysis unit 272, the tag structure of each element is as shown in FIG. 5-1. In this case, as shown in FIG. 5-2, the structure analysis unit 272 generates a structure index 321 in which a unique structure ID is assigned to each type of tag structure. In the figures, "text()" indicates the position of the character string portion of the element.
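The assignment of structure IDs can be illustrated with a minimal Python sketch. It is not the patented implementation: FIGS. 5-1 and 5-2 are not reproduced here, so the path-like notation ending in `text()`, the sequential integer IDs, and the names `tag_structures` and `build_structure_index` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def tag_structures(xml_text):
    """Yield one path-like tag structure per element, in document order."""
    # fromstring() raises ParseError if the document is not well-formed,
    # which roughly corresponds to the check done by the document analysis unit.
    root = ET.fromstring(xml_text)

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        yield f"{path}/text()"  # assumed notation for the element's text position
        for child in elem:
            yield from walk(child, path)

    yield from walk(root, "")


def build_structure_index(xml_text, existing=None):
    """Map each distinct tag structure to an ID, reusing already registered IDs."""
    index = dict(existing or {})
    for structure in tag_structures(xml_text):
        if structure not in index:
            # Assumes existing IDs are the contiguous integers 1..len(index).
            index[structure] = len(index) + 1
    return index


doc = (
    "<book><title>XML</title>"
    "<author><name>A</name></author>"
    "<author><name>B</name></author></book>"
)
structure_index = build_structure_index(doc)
```

The second `author` element has the same tag structure as the first, so it receives no new ID. This matches the rule that an ID is assigned per type of tag structure rather than per element.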

Returning to FIG. 4, the vocabulary analysis unit 273 analyzes, for each element, the character string portion included in the structured document input from the document analysis unit 271, divides each character string portion into vocabulary units of a predetermined number of characters, assigns a unique vocabulary ID to each resulting vocabulary, and generates information associating each vocabulary with its vocabulary ID as the vocabulary index 331. Here, the character string portion means the character string enclosed between a start tag and an end tag. The assigned vocabulary ID is preferably in a format suited to compression, such as a numerical value.

When generating the vocabulary index 331, the vocabulary analysis unit 273 refers to the existing vocabulary index registered in the vocabulary-vocabulary ID correspondence data 33 and performs control so that no new vocabulary ID is assigned to a vocabulary that is already registered.

The vocabulary analysis unit 273 also outputs the generated vocabulary index 331 to the vocabulary index unit 29 so that it is registered in the node's own vocabulary-vocabulary ID correspondence data 33, and transmits it to the other nodes via the communication processing unit 21 so that it is registered in their vocabulary-vocabulary ID correspondence data 33. In this way the generated vocabulary index 331 is shared throughout the structured document management system 10. When the vocabulary analysis unit 273 receives a vocabulary index 331 from another node via the request reception unit 20, it outputs that vocabulary index 331 to the vocabulary index unit 29 so that it is registered in the vocabulary-vocabulary ID correspondence data 33.

FIG. 6 is a diagram for explaining the operation of the vocabulary analysis unit 273. Suppose that, as a result of analysis of the structured document by the vocabulary analysis unit 273, the character string 「並列検索装置」 ("parallel search device") of one element in the structured document has been extracted as a character string portion, as shown on the left of FIG. 6. In this case, the vocabulary analysis unit 273 divides the character string 「並列検索装置」 into vocabulary units of two characters each, as shown in the middle of FIG. 6. Because the last unit produced by the division is the single character 「置」, the vocabulary index unit 29 appends "#" to its end. The vocabulary analysis unit 273 then assigns a unique vocabulary ID to each vocabulary and, as shown on the right of FIG. 6, generates information associating each vocabulary with its vocabulary ID as the vocabulary index 331.
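The division into vocabulary units can be sketched in Python as follows. FIG. 6 is not reproduced here, so the sketch assumes an overlapping two-character window. That reading is chosen because it leaves a final one-character unit 「置」 for this six-character string, whereas a non-overlapping split would not. The function names and the sequential integer IDs are hypothetical.

```python
def split_vocabulary(text, n=2, pad="#"):
    """Split text into overlapping n-character units, padding the last one."""
    units = []
    for i in range(len(text)):
        unit = text[i:i + n]
        # The final window is shorter than n characters, so pad it with "#".
        units.append(unit + pad * (n - len(unit)))
    return units


def build_vocabulary_index(units, existing=None):
    """Map each distinct vocabulary unit to an ID, reusing registered IDs."""
    index = dict(existing or {})
    for unit in units:
        if unit not in index:
            # Assumes existing IDs are the contiguous integers 1..len(index).
            index[unit] = len(index) + 1
    return index


units = split_vocabulary("並列検索装置")
vocabulary_index = build_vocabulary_index(units)
```

Under this assumption the six-character string yields six units, the last of which is 「置#」.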

構造索引部２８は、構造解析部２７２から入力される構造索引３２１を、構造−構造ＩＤ対応データ３２に登録する。また、語彙索引部２９は、語彙解析部２７３から入力される語彙索引３３１を、語彙−語彙ＩＤ対応データ３３に登録する。 The structure index unit 28 registers the structure index 321 input from the structure analysis unit 272 in the structure-structure ID correspondence data 32. The vocabulary index unit 29 registers the vocabulary index 331 input from the vocabulary analysis unit 273 in the vocabulary-vocabulary ID correspondence data 33.

The structured document DB management unit 30 is a functional unit that manages the structured document DB 31. It reads structured documents from the structured document DB 31 in response to instructions from the structured document acquisition unit 26, and registers structured documents in the structured document DB 31 in response to instructions from the storage processing unit 27.

The operation performed when a registration request for a structured document is transmitted from the client device 40 to the structured document management system 10 is described below with reference to FIGS. 7 to 10.

図７は、登録要求を受け付けたノード（マスタノード）での構造化文書の登録に係る処理（構造化文書登録処理）の手順を示したフローチャートである。 FIG. 7 is a flowchart showing a procedure of a process (structured document registration process) related to registration of a structured document at a node (master node) that has received a registration request.

First, when the request reception unit 20 receives a structured document designated in a registration request from the client device 40, it outputs the structured document to the storage processing unit 27 (step S11). Next, the document analysis unit 271 of the storage processing unit 27 analyzes the input structured document (step S12) and outputs the analysis result to the structure analysis unit 272 and the vocabulary analysis unit 273, so that the structure-structure ID data update process of step S13 and the vocabulary-vocabulary ID data update process of step S14 are executed in that order.

以下、図８を参照して、ステップＳ１３の構造−構造ＩＤデータ更新処理について説明する。図８は、構造−構造ＩＤデータ更新処理の手順を示したフローチャートである。 Hereinafter, the structure-structure ID data update process in step S13 will be described with reference to FIG. FIG. 8 is a flowchart showing the procedure of the structure-structure ID data update process.

まず、構造解析部２７２は、文書解析部２７１から入力された構造化文書に基づき、タグの構造位置を示すタグ構造を要素毎に取得する（ステップＳ１３１）。 First, the structure analysis unit 272 acquires, for each element, a tag structure indicating a tag structure position based on the structured document input from the document analysis unit 271 (step S131).

Next, the structure analysis unit 272 refers to the structure-structure ID correspondence data 32 and determines whether a tag structure identical to each tag structure acquired in step S131 is already registered in the structure-structure ID correspondence data 32 (step S132). If it determines that all the acquired tag structures are already registered in the structure-structure ID correspondence data 32 (step S132; Yes), it ends this process and proceeds immediately to step S14 in FIG. 7.

一方、ステップＳ１３２において、構造−構造ＩＤ対応データ３２に登録されていないタグ構造があると判定した場合には（ステップＳ１３２；Ｎｏ）、構造解析部２７２は、登録されていないと判定したタグ構造に対し、固有の構造ＩＤを割り当て（ステップＳ１３３）、このタグ構造と構造ＩＤとを対応付けた構造索引３２１を生成する（ステップＳ１３４）。 On the other hand, if it is determined in step S132 that there is a tag structure not registered in the structure-structure ID correspondence data 32 (step S132; No), the structure analysis unit 272 determines that the tag structure is not registered. A unique structure ID is assigned (step S133), and a structure index 321 in which the tag structure is associated with the structure ID is generated (step S134).

続いて、構造解析部２７２は、生成した構造索引３２１を、構造索引部２８に出力することで自己の構造−構造ＩＤ対応データ３２に登録させるとともに（ステップＳ１３５）、通信処理部２１を介して他のノード（スレーブノード）に送信することで、他のノードの構造−構造ＩＤ対応データ３２に登録させ（ステップＳ１３６）、図７のステップＳ１４へと移行する。 Subsequently, the structure analysis unit 272 outputs the generated structure index 321 to the structure index unit 28 so as to be registered in its own structure-structure ID correspondence data 32 (step S135), and via the communication processing unit 21. By transmitting to another node (slave node), the structure-structure ID correspondence data 32 of the other node is registered (step S136), and the process proceeds to step S14 in FIG.
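Steps S131 to S136 can be sketched as follows. This is a simplified in-process model, not the patented implementation: the `Node` class, its attribute names, and the direct method call standing in for transmission through the communication processing unit 21 are assumptions, and the step numbers appear only in comments.

```python
class Node:
    """Hypothetical stand-in for one node of the system."""

    def __init__(self, name):
        self.name = name
        self.structure_ids = {}  # stands in for structure-structure ID data 32
        self.peers = []          # the other nodes (slave nodes)

    def register_index(self, index):
        """Role of the structure index unit 28: store received entries."""
        self.structure_ids.update(index)

    def update_structure_ids(self, tag_structures):
        # S132: keep only tag structures that are not registered yet
        # (dict.fromkeys removes duplicates while preserving order).
        new = [s for s in dict.fromkeys(tag_structures)
               if s not in self.structure_ids]
        if not new:
            return {}  # S132; Yes: nothing to register
        # S133-S134: assign unique IDs and build the structure index.
        next_id = max(self.structure_ids.values(), default=0) + 1
        index = {s: next_id + i for i, s in enumerate(new)}
        self.register_index(index)      # S135: register locally
        for peer in self.peers:         # S136: share with the other nodes
            peer.register_index(index)
        return index


master, slave = Node("master"), Node("slave")
master.peers = [slave]
first = master.update_structure_ids(["/a/text()", "/a/b/text()", "/a/b/text()"])
second = master.update_structure_ids(["/a/b/text()"])
```

The sketch allocates IDs safely only while a single node allocates at a time. The text does not describe how concurrent allocation by several master nodes is coordinated, so that aspect is left out here.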

次に、図９を参照して、ステップＳ１４の語彙−語彙ＩＤデータ更新処理について説明する。図９は、語彙−語彙ＩＤデータ更新処理の手順を示したフローチャートである。 Next, the vocabulary-vocabulary ID data update processing in step S14 will be described with reference to FIG. FIG. 9 is a flowchart showing a vocabulary-vocabulary ID data update process.

まず、語彙解析部２７３は、文書解析部２７１から入力された構造化文書に含まれる文字列部分を、要素毎に抽出し（ステップＳ１４１）、抽出した各文字列を所定の文字数毎に分割し語彙を取得する（ステップＳ１４２）。 First, the vocabulary analysis unit 273 extracts a character string portion included in the structured document input from the document analysis unit 271 for each element (step S141), and divides each extracted character string for each predetermined number of characters. Vocabulary is acquired (step S142).

Next, the vocabulary analysis unit 273 refers to the vocabulary-vocabulary ID correspondence data 33 and determines whether a vocabulary identical to each vocabulary acquired in step S142 is already registered in the vocabulary-vocabulary ID correspondence data 33 (step S143). If the vocabulary analysis unit 273 determines that all the acquired vocabularies are already registered in the vocabulary-vocabulary ID correspondence data 33 (step S143; Yes), it ends this process and proceeds immediately to step S15 in FIG. 7.

On the other hand, if it is determined in step S143 that there is a vocabulary not registered in the vocabulary-vocabulary ID correspondence data 33 (step S143; No), the vocabulary analysis unit 273 assigns a unique vocabulary ID to each vocabulary determined not to be registered (step S144) and generates a vocabulary index 331 associating the vocabulary with the vocabulary ID (step S145).

Next, the vocabulary analysis unit 273 outputs the generated vocabulary index 331 to the vocabulary index unit 29 so that it is registered in the node's own vocabulary-vocabulary ID correspondence data 33 (step S146), transmits it to the other nodes (slave nodes) via the communication processing unit 21 so that it is registered in their vocabulary-vocabulary ID correspondence data 33 (step S147), and proceeds to step S15 in FIG. 7.

図７に戻り、格納処理部２７は、ディクショナリ２５を参照し、登録対象となった構造化文書の格納先ノードを決定した後（ステップＳ１５）、この格納先ノードが自ノードか否かを判定する（ステップＳ１６）。 Returning to FIG. 7, the storage processing unit 27 refers to the dictionary 25, determines the storage destination node of the structured document to be registered (step S 15), and determines whether or not this storage destination node is its own node. (Step S16).

ステップＳ１６において、格納先ノードが自己のノードと判定した場合には（ステップＳ１６；Ｙｅｓ）、登録対象となった構造化文書を、自己のノードが有する構造化文書ＤＢ３１に登録し（ステップＳ１７）、ステップＳ１９の処理へと移行する。 If it is determined in step S16 that the storage destination node is its own node (step S16; Yes), the structured document to be registered is registered in the structured document DB 31 of its own node (step S17). The process proceeds to step S19.

一方、ステップＳ１６において、格納先ノードが他のノードと判定した場合には（ステップＳ１６；Ｎｏ）、格納処理部２７は、登録対象となった構造化文書を、通信処理部２１を介して格納先ノードに送信（転送）することで、格納先ノードの構造化文書ＤＢ３１に登録させ（ステップＳ１８）、ステップＳ１９の処理へと移行する。 On the other hand, when it is determined in step S16 that the storage destination node is another node (step S16; No), the storage processing unit 27 stores the structured document to be registered via the communication processing unit 21. By transmitting (transferring) to the destination node, it is registered in the structured document DB 31 of the storage destination node (step S18), and the process proceeds to step S19.

続くステップＳ１９では、格納処理部２７は、登録対象となった構造化文書と、当該構造化文書の格納先ノードとを対応付けた格納先情報を、ディクショナリ２５に登録するとともに（ステップＳ１９）、この格納先情報を、通信処理部２１を介して他のノード（スレーブノード）に送信することで、他のノードのディクショナリ２５に登録させ（ステップＳ２０）、本処理を終了する。 In subsequent step S19, the storage processing unit 27 registers storage destination information in which the structured document to be registered and the storage destination node of the structured document are associated with each other in the dictionary 25 (step S19). This storage location information is transmitted to another node (slave node) via the communication processing unit 21 to be registered in the dictionary 25 of the other node (step S20), and this process is terminated.
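The routing in steps S15 to S20 can be sketched as below. The text does not say what rule the dictionary 25 defines, so the CRC-based choice in `choose_node` is purely a hypothetical placeholder, as are the `Node` class and its attribute names. Network transfer is replaced by direct method calls.

```python
import zlib


class Node:
    """Hypothetical stand-in for one node of the system."""

    def __init__(self, name):
        self.name = name
        self.documents = {}  # stands in for the structured document DB 31
        self.locations = {}  # storage destination information in dictionary 25
        self.cluster = [self]

    def receive_document(self, doc_id, xml_text):
        self.documents[doc_id] = xml_text


def choose_node(doc_id, cluster):
    # Hypothetical rule: the actual rule held in dictionary 25 is not specified.
    return cluster[zlib.crc32(doc_id.encode("utf-8")) % len(cluster)]


def register_document(master, doc_id, xml_text):
    target = choose_node(doc_id, master.cluster)     # S15
    if target is master:                             # S16
        master.documents[doc_id] = xml_text          # S17: store locally
    else:
        target.receive_document(doc_id, xml_text)    # S18: forward to target
    for node in master.cluster:                      # S19-S20: share location
        node.locations[doc_id] = target.name
    return target.name


nodes = [Node(name) for name in ("n0", "n1", "n2")]
for node in nodes:
    node.cluster = nodes
stored_on = register_document(nodes[0], "doc-1", "<a>text</a>")
```

After the call, exactly one node holds the document while every node knows where it is stored, which is what allows any node to plan a later search.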

図１０は、マスタノードから送信される各種情報を受け付けるノード側（スレーブノード）での、各種情報の登録にかかる処理（他ノード側登録処理）の手順を示したフローチャートである。なお、本処理は、常時又はマスタノードから送信される指示信号に応じて実行されるものとする。 FIG. 10 is a flowchart showing a procedure of processing (registration processing on the other node side) related to registration of various information on the node side (slave node) that receives various information transmitted from the master node. It is assumed that this process is always performed or according to an instruction signal transmitted from the master node.

まず、格納処理部２７（構造解析部２７２）は、要求受付部２０を介してマスタノードから構造索引３２１を受信したか否かを判定する（ステップＳ２１）。ここで、格納処理部２７は、構造索引３２１が受信していないと判定した場合には（ステップＳ２１；Ｎｏ）、ステップＳ２３の処理へと直ちに移行する。 First, the storage processing unit 27 (structure analysis unit 272) determines whether or not the structure index 321 has been received from the master node via the request reception unit 20 (step S21). Here, when the storage processing unit 27 determines that the structure index 321 has not been received (step S21; No), the storage processing unit 27 immediately proceeds to the process of step S23.

On the other hand, if it is determined in step S21 that the structure index 321 has been received from the master node (step S21; Yes), the structure analysis unit 272 outputs the structure index 321 to the structure index unit 28 so that it is registered in the node's own structure-structure ID correspondence data 32 (step S22), and then proceeds to step S23.

続くステップＳ２３では、格納処理部２７（語彙解析部２７３）が、要求受付部２０を介してマスタノードから語彙索引３３１を受信したか否かを判定する（ステップＳ２３）。ここで、格納処理部２７は、語彙索引３３１を受信していないと判定した場合には（ステップＳ２３；Ｎｏ）、ステップＳ２５の処理へと直ちに移行する。 In subsequent step S23, the storage processing unit 27 (vocabulary analyzing unit 273) determines whether or not the vocabulary index 331 has been received from the master node via the request receiving unit 20 (step S23). If the storage processing unit 27 determines that the vocabulary index 331 has not been received (step S23; No), the storage processing unit 27 immediately proceeds to the process of step S25.

On the other hand, if it is determined in step S23 that the vocabulary index 331 has been received from the master node (step S23; Yes), the vocabulary analysis unit 273 outputs the vocabulary index 331 to the vocabulary index unit 29 so that it is registered in the node's own vocabulary-vocabulary ID correspondence data 33 (step S24), and then proceeds to step S25.

続くステップＳ２５では、格納処理部２７が、要求受付部２０を介してマスタノードから登録対象となる構造化文書が転送されてきたか否かを判定する（ステップＳ２５）。ここで、格納処理部２７は、構造化文書を受信していないと判定した場合には（ステップＳ２５；Ｎｏ）、ステップＳ２７の処理へと直ちに移行する。 In subsequent step S25, the storage processing unit 27 determines whether or not a structured document to be registered has been transferred from the master node via the request receiving unit 20 (step S25). If the storage processing unit 27 determines that the structured document has not been received (step S25; No), the storage processing unit 27 immediately proceeds to the process of step S27.

一方、ステップＳ２５において、マスタノードから登録対象となる構造化文書を受信したと判定した場合（ステップＳ２５；Ｙｅｓ）、格納処理部２７は、この構造化文書を構造化文書ＤＢ管理部３０に出力することで、構造化文書ＤＢ３１に登録させた後（ステップＳ２６）、ステップＳ２７の処理へと移行する。 On the other hand, when it is determined in step S25 that the structured document to be registered is received from the master node (step S25; Yes), the storage processing unit 27 outputs the structured document to the structured document DB management unit 30. As a result, after being registered in the structured document DB 31 (step S26), the process proceeds to step S27.

続くステップＳ２７では、格納処理部２７が、要求受付部２０を介して格納先情報をマスタノードから受信したか否かを判定する（ステップＳ２７）。ここで、格納処理部２７は、格納先情報を受信していないと判定した場合には（ステップＳ２７；Ｎｏ）、本処理を直ちに終了する。 In subsequent step S27, the storage processing unit 27 determines whether or not the storage destination information is received from the master node via the request receiving unit 20 (step S27). Here, if the storage processing unit 27 determines that the storage location information has not been received (step S27; No), the process immediately ends.

一方、ステップＳ２７において、マスタノードから格納先情報を受信したと判定した場合には（ステップＳ２７；Ｙｅｓ）、格納処理部２７は、この格納先情報をディクショナリ２５に登録した後（ステップＳ２８）、本処理を終了する。 On the other hand, if it is determined in step S27 that the storage location information has been received from the master node (step S27; Yes), the storage processing unit 27 registers this storage location information in the dictionary 25 (step S28). This process ends.

このように、新たに登録した構造化文書に含まれるタグ構造及び語彙を、構造化文書管理システム１０を構成する各ノード間で共通の情報とすることができ、また、タグ構造に割り当てた構造ＩＤ及び語彙に割り当てた語彙ＩＤを、構造化文書管理システム１０を構成する各ノード間で共通の情報とすることができる。 In this way, the tag structure and vocabulary included in the newly registered structured document can be used as common information among the nodes constituting the structured document management system 10, and the structure assigned to the tag structure The vocabulary ID assigned to the ID and vocabulary can be common information among the nodes constituting the structured document management system 10.

In this embodiment, the structure index 321 and the vocabulary index 331 received from the master node are registered unconditionally in the structure-structure ID correspondence data 32 and the vocabulary-vocabulary ID correspondence data 33, respectively. However, the embodiment is not limited to this. As in the structure-structure ID data update process (see FIG. 8) and the vocabulary-vocabulary ID data update process (see FIG. 9) described above, the receiving node may check before registration whether a duplicate index entry already exists.

In this embodiment, the structured document to be registered is itself transmitted to the storage destination node. However, the embodiment is not limited to this. Once the tag structures (structure IDs) and vocabularies (vocabulary IDs) contained in the structured document have been shared among the nodes, the document may instead be transmitted as encoded data produced by the structured document encoding process described later (see FIG. 12). In that case, the original structured document can be restored by the structured document restoration process described later (see FIG. 14), and compared with transferring the original structured document as is, the amount of transferred data is reduced and the time required for transfer is shortened.

次に、図１１〜図１６を参照して、クライアント装置４０から、構造化文書の検索を要求する検索要求が入力された場合の動作について説明する。 Next, with reference to FIGS. 11 to 16, an operation when a search request for requesting a search for a structured document is input from the client device 40 will be described.

図１１は、検索要求を受け付けたノード（マスタノード）での構造化文書の検索に係る処理（構造化文書検索処理）の手順を示したフローチャートである。まず、要求受付部２０は、クライアント装置４０から検索要求を受け付けると、この検索要求を検索プラン生成部２２に出力する（ステップＳ３１）。続いて、検索プラン生成部２２は、入力された検索要求に含まれた検索式に基づいて、プラン（検索処理の実行計画）を生成し、検索プラン処理部２３に出力する（ステップＳ３２）。 FIG. 11 is a flowchart illustrating a procedure of a process (structured document search process) related to a search for a structured document at a node (master node) that has received a search request. First, when receiving a search request from the client device 40, the request receiving unit 20 outputs the search request to the search plan generating unit 22 (step S31). Subsequently, the search plan generation unit 22 generates a plan (search process execution plan) based on the search expression included in the input search request, and outputs the plan to the search plan processing unit 23 (step S32).

次いで、検索プラン処理部２３は、ディクショナリ２５を参照し、入力されたプランに他のノードの構造化文書ＤＢ３１に係る処理が含まれているか否かを判定する（ステップＳ３３）。ここで、検索プラン処理部２３は、プランに含まれた各実行計画が自己のノードの構造化文書ＤＢ３１に登録された構造化文書のみで遂行できると判定した場合には（ステップＳ３３；Ｎｏ）、プランに指示された実行計画を順次実行し（ステップＳ３４）、ステップＳ４０へと移行する。 Next, the search plan processing unit 23 refers to the dictionary 25 and determines whether or not the input plan includes a process related to the structured document DB 31 of another node (step S33). Here, when the search plan processing unit 23 determines that each execution plan included in the plan can be executed only with the structured document registered in the structured document DB 31 of its own node (step S33; No). Then, the execution plans instructed by the plan are sequentially executed (step S34), and the process proceeds to step S40.

一方、ステップＳ３３において、他のノードの構造化文書ＤＢ３１に格納された構造化文書が必要と判定した場合には（ステップＳ３３；Ｙｅｓ）、検索プラン処理部２３が、プランに指示された実行計画のうち、他のノードに係る実行計画まで実行した後（ステップＳ３５）、ステップＳ３６の構造化文書符号化処理へと移行する。以下、図１２を参照して、ステップＳ３６の構造化文書符号化処理について説明する。 On the other hand, if it is determined in step S33 that a structured document stored in the structured document DB 31 of another node is necessary (step S33; Yes), the search plan processing unit 23 executes the execution plan indicated in the plan. Among them, after executing up to an execution plan relating to another node (step S35), the process proceeds to the structured document encoding process of step S36. Hereinafter, the structured document encoding process in step S36 will be described with reference to FIG.

図１２は、構造化文書符号化処理の手順を示したフローチャートである。まず、構造化文書変換部２４は、ステップＳ３５の処理で取得された中間データ（構造化文書）に含まれる一の要素を処理対象とする（ステップＳ３６１）。 FIG. 12 is a flowchart showing the procedure of the structured document encoding process. First, the structured document conversion unit 24 sets one element included in the intermediate data (structured document) acquired in the process of step S35 as a processing target (step S361).

次いで、構造化文書変換部２４は、構造−構造ＩＤ対応データ３２に登録された構造索引に基づいて、処理対象要素のタグ構造を構造ＩＤに変換する（ステップＳ３６２）。 Next, the structured document conversion unit 24 converts the tag structure of the processing target element into a structure ID based on the structure index registered in the structure-structure ID correspondence data 32 (step S362).

Next, based on the vocabulary index registered in the vocabulary-vocabulary ID correspondence data 33, the structured document conversion unit 24 converts each vocabulary making up the character string of the processing target element into a vocabulary ID, and attaches to each vocabulary ID an offset indicating the position at which the corresponding vocabulary appears (step S363).

Next, the structured document conversion unit 24 determines whether the processing of steps S362 and S363 has been applied to all elements included in the intermediate data (step S364). If it determines that an unprocessed element remains (step S364; No), it returns to step S361, takes the unprocessed element as the processing target, and applies the processing of steps S362 and S363.

一方、ステップＳ３６４において、全ての要素に対しステップＳ３６２及びステップＳ３６３の処理を施したと判定した場合には（ステップＳ３６４；Ｙｅｓ）、要素毎に取得した変換後のデータを結合し、符号化データとする（ステップＳ３６５）。 On the other hand, if it is determined in step S364 that all the elements have been processed in steps S362 and S363 (step S364; Yes), the converted data acquired for each element is combined and encoded data (Step S365).

Next, the structured document conversion unit 24 determines whether to apply compression to the encoded data (step S366). The setting that specifies whether compression is applied is assumed to be stored in advance in the storage unit 107 as setting information, and the structured document conversion unit 24 makes the determination of step S366 based on this setting information.

ステップＳ３６６において、圧縮処理を施さないと判定した場合には（ステップＳ３６６；Ｎｏ）、図１１のステップＳ３７へと直ちに移行する。また、ステップＳ３６６において、圧縮処理を施すと判定した場合には（ステップＳ３６６；Ｙｅｓ）、構造化文書変換部２４は、符号化データに対し、所定の圧縮形式で圧縮処理を施すことで、データ量を減少させた後（ステップＳ３６７）、図１１のステップＳ３７へと移行する。 If it is determined in step S366 that compression processing is not performed (step S366; No), the process immediately proceeds to step S37 in FIG. If it is determined in step S366 that compression processing is to be performed (step S366; Yes), the structured document conversion unit 24 performs compression processing on the encoded data in a predetermined compression format to obtain data. After the amount is decreased (step S367), the process proceeds to step S37 in FIG.
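The encoding process of steps S361 to S367 can be sketched as follows. The wire format is not specified in the text, so this sketch assumes a compact JSON array of `[structure ID, [[vocabulary ID, offset], ...]]` entries and uses zlib as the "predetermined compression format". Both choices, along with the overlapping two-character windows and the function name `encode_document`, are assumptions.

```python
import json
import zlib


def encode_document(elements, structure_ids, vocabulary_ids,
                    compress=False, n=2, pad="#"):
    """Encode (tag structure, text) pairs as structure IDs and vocabulary IDs."""
    encoded = []
    for structure, text in elements:                 # S361 / S364: each element
        structure_id = structure_ids[structure]      # S362: structure -> ID
        vocab = []
        for offset in range(len(text)):              # S363: vocabulary -> ID
            unit = text[offset:offset + n].ljust(n, pad)
            vocab.append([vocabulary_ids[unit], offset])
        encoded.append([structure_id, vocab])
    # S365: combine the per-element results into one piece of encoded data.
    payload = json.dumps(encoded, separators=(",", ":")).encode("ascii")
    if compress:                                     # S366
        payload = zlib.compress(payload)             # S367
    return payload


structure_ids = {"/doc/title/text()": 1}
vocabulary_ids = {"並列": 1, "列検": 2, "検索": 3, "索装": 4, "装置": 5, "置#": 6}
elements = [("/doc/title/text()", "並列検索装置")]

plain = encode_document(elements, structure_ids, vocabulary_ids)
packed = encode_document(elements, structure_ids, vocabulary_ids, compress=True)
```

Because both nodes hold the same indexes, only integers cross the network. The tag names and the text themselves are never transmitted.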

Returning to FIG. 11, the search plan processing unit 23 transmits an execution request containing the encoded data obtained in step S36 and the plan generated in step S32, via the communication processing unit 21, to the other node (slave node) needed to carry out the plan (step S37).

以下、図１３を参照して、マスタノードから送信される実行要求を受け付けるノード（スレーブノード）での、構造化文書の検索にかかる動作を説明する。図１３は、スレーブノードで実行される構造化文書の検索にかかる処理（他ノード側構造化検索処理）の手順を示したフローチャートである。 Hereinafter, with reference to FIG. 13, an operation related to the retrieval of the structured document in the node (slave node) that receives the execution request transmitted from the master node will be described. FIG. 13 is a flowchart showing a procedure of a process related to structured document search (another node side structured search process) executed in the slave node.

On the slave node side, when the request reception unit 20 receives the execution request, it outputs the execution request to the node's own search plan processing unit 23 (step S51). The search plan processing unit 23 then outputs the encoded data contained in the execution request to the structured document conversion unit 24 (step S52), and the process moves to the structured document restoration process of step S53. The structured document restoration process of step S53 is described below with reference to FIG. 14.

図１４は、構造化文書復元処理の手順を示したフローチャートである。まず、検索プラン処理部２３は、入力された符号化データの拡張子や、データ構造に基づいて、当該符号化データに圧縮処理が施されているか否かを判定する（ステップＳ５３１）。ここで、圧縮処理が施されていないと判定した場合には（ステップＳ５３１；Ｎｏ）、ステップＳ５３３へと直ちに移行する。 FIG. 14 is a flowchart showing the procedure of the structured document restoration process. First, the search plan processing unit 23 determines whether or not compression processing has been performed on the encoded data based on the extension of the input encoded data and the data structure (step S531). If it is determined that the compression process has not been performed (step S531; No), the process immediately proceeds to step S533.

また、ステップＳ５３１において、圧縮処理が施されていると判定した場合には（ステップＳ５３１；Ｙｅｓ）、検索プラン処理部２３は、この符号化データに解凍処理を施した後（ステップＳ５３２）、ステップＳ５３３へと移行する。 If it is determined in step S531 that compression processing has been performed (step S531; Yes), the search plan processing unit 23 performs decompression processing on the encoded data (step S532), and then step The process shifts to S533.

続くステップＳ５３３において、検索プラン処理部２３は、符号化データに含まれた一の要素を処理対象とする（ステップＳ５３３）。ここで、要素とは、符号化された一のタグ構造（構造ＩＤ）と当該タグ構造の文字列部分（語彙群）を意味する。 In subsequent step S533, the search plan processing unit 23 sets one element included in the encoded data as a processing target (step S533). Here, the element means one encoded tag structure (structure ID) and a character string portion (vocabulary group) of the tag structure.

Next, based on the structure index registered in the structure-structure ID correspondence data 32, the search plan processing unit 23 converts the structure ID contained in the processing target element into a tag structure (step S534).

Subsequently, based on the vocabulary index registered in the vocabulary-vocabulary ID correspondence data 33, the search plan processing unit 23 converts each vocabulary ID contained in the target element into its vocabulary item, places each item at the position of appearance indicated by its offset, and thereby restores the character string portion (step S535).

Next, the structured document conversion unit 24 determines whether steps S534 and S535 have been applied to every element contained in the encoded data (step S536). If it determines that an unprocessed element remains (step S536; No), the process returns to step S533, selects the unprocessed element as the processing target, and applies steps S534 and S535 to it.

If it determines in step S536 that steps S534 and S535 have been applied to every element (step S536; Yes), the structured document conversion unit 24 reconstructs the original structured document from the results of steps S534 and S535 for all elements (step S537), and the process proceeds to step S54 of FIG. 13.

Returning to FIG. 13, the search plan processing unit 23 uses the intermediate data (structured document) restored in step S53 to execute the part of the execution plan specified in the plan that its own node can process (step S54). It then outputs the resulting structured document to the structured document conversion unit 24, which executes the structured document encoding process (step S55). The structured document encoding process of step S55 is the same as that of step S37 described above, so its description is omitted.

Subsequently, the search plan processing unit 23 sends the encoded data obtained in step S55, as the processing result, through the communication processing unit 21 to the master node that issued the execution request (step S56), and the processing on the slave node side ends.

Returning to FIG. 11, on the master node side, when the request reception unit 20 receives the processing result from the slave node, it outputs the result to the structured document conversion unit 24 through the search plan processing unit 23 (step S38).

The structured document conversion unit 24 executes the structured document restoration process on the encoded data input as the processing result and restores the intermediate data (structured document) from it (step S39). The structured document restoration process of step S39 is the same as that of step S53 described above, so its description is omitted.

Subsequently, the search plan processing unit 23 executes the execution plan specified in the plan, using the intermediate data restored in step S39 (step S40). This step does not apply when the structured document received as the processing result is already the final result of the execution plan. When further processing is required at another node, the structured document received as the processing result, or a structured document derived from it, is encoded as intermediate data by the structured document conversion unit 24 and then sent to the slave node.

Next, the search plan processing unit 23 sends the structured document obtained by executing the plan, as the search result, to the client device 40 through the communication processing unit 21 (step S41), and the process ends.

In this embodiment, the search result is sent as a structured document. The embodiment is not limited to this, and the structured document may instead be converted into encoded data before it is sent.

FIGS. 15-1, 15-2, and 15-3 are diagrams for explaining the operation of the structured document conversion unit 24 during the structured document search described above. In these figures, downward arrows indicate the flow of processing for encoding a structured document, and upward arrows indicate the flow of processing for restoring an encoded structured document.

First, the operation when encoding a structured document is described. Assume that step S361 described above has set one element of the structured document shown in the upper part of FIG. 15-1, "＜発明の名称＞並列検索方法および装置＜／発明の名称＞" (in English, "<Title of Invention>Parallel search method and apparatus</Title of Invention>"), as the processing target. In this case, as shown in FIG. 15-2, the structured document conversion unit 24 converts the tag structure "＜発明の名称＞" of the target element into the corresponding structure ID "0007", based on the structure index 321 (see FIG. 16-1) registered in the structure-structure ID correspondence data 32.

The structured document conversion unit 24 also splits the character string "並列検索方法および装置" ("Parallel search method and apparatus") contained in the target element into multiple vocabulary items, as shown in the middle part of FIG. 15-3. Based on the vocabulary index 331 (see FIG. 16-2) registered in the vocabulary-vocabulary ID correspondence data 33, it then converts each vocabulary item into its corresponding vocabulary ID and attaches to each vocabulary ID an offset indicating the position at which the item appears.

For example, as shown in FIG. 15-3, the vocabulary item "方法" ("method") is converted into its corresponding vocabulary ID "0101" based on the vocabulary index 331 registered in the vocabulary-vocabulary ID correspondence data 33, and the offset "0" is then prepended to this vocabulary ID. Here, the offset "0" is the distance between the end of the preceding vocabulary item "検索" ("search") and the item "方法" itself, and means that the two are placed consecutively with no character gap between them.

Next, as shown in the middle part of FIG. 15-1, the structured document conversion unit 24 generates encoded data by joining the converted structure ID with each vocabulary ID and its offset, and compresses the encoded data if the settings call for it.
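The encoding flow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the tag name `<title>`, the vocabulary IDs other than "0101" for "method", whitespace tokenization as a stand-in for vocabulary segmentation, and the `(offset, vocabulary ID)` tuple layout are all hypothetical. Only the structure ID "0007" and the vocabulary ID "0101" come from the example in the text.

```python
import re

# Hypothetical index contents; a real node would load these from the
# structure-structure ID and vocabulary-vocabulary ID correspondence data.
STRUCTURE_INDEX = {"<title>": "0007"}
VOCAB_INDEX = {"parallel": "0100", "search": "0102", "method": "0101"}

def encode_element(tag, text):
    """Encode one element as (structure ID, [(offset, vocabulary ID), ...])."""
    structure_id = STRUCTURE_INDEX[tag]
    tokens = []
    prev_end = 0
    for match in re.finditer(r"\S+", text):
        # Offset = gap between the end of the previous item and this item.
        offset = match.start() - prev_end
        tokens.append((offset, VOCAB_INDEX[match.group()]))
        prev_end = match.end()
    return structure_id, tokens

encoded = encode_element("<title>", "parallel search method")
```

Here `encoded` is `("0007", [(0, "0100"), (1, "0102"), (1, "0101")])`. The offsets are 1 rather than 0 because English words are separated by a space, whereas the Japanese items in the patent's example are adjacent.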

Next, the operation when restoring a structured document is described. Assume that step S533 described above has set the encoded data shown in the middle part of FIG. 15-1 as the element to be processed. In this case, as shown in FIG. 15-2, the structured document conversion unit 24 converts the structure ID contained in the target element back into a tag structure (inverse conversion), based on the structure index 321 (see FIG. 16-1) registered in the structure-structure ID correspondence data 32.

As shown in FIG. 15-3, the structured document conversion unit 24 also converts each vocabulary ID contained in the target element back into its corresponding vocabulary item (inverse conversion), based on the vocabulary index 331 (see FIG. 16-2) registered in the vocabulary-vocabulary ID correspondence data 33, and reconstructs the original character string by arranging the items according to the offset attached to each vocabulary ID. When compressed encoded data is received, it is decompressed before the tag structure and character string are restored.

Next, as shown in the upper part of FIG. 15-1, the structured document conversion unit 24 joins the restored tag structure and character string to restore the original structured document.
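The restoration steps can be sketched in the same way. As before, this is an illustration under assumptions: the inverse index contents, the `(offset, vocabulary ID)` tuple layout, filling gaps with spaces, and treating a tag structure as a single simple tag are hypothetical choices, not details given in the patent.

```python
# Hypothetical inverse indexes (ID -> tag structure, ID -> vocabulary item).
STRUCTURE_BY_ID = {"0007": "<title>"}
VOCAB_BY_ID = {"0100": "parallel", "0102": "search", "0101": "method"}

def decode_element(structure_id, tokens):
    """Rebuild one element from a structure ID and (offset, vocabulary ID) pairs."""
    open_tag = STRUCTURE_BY_ID[structure_id]           # step S534
    text = "".join(" " * offset + VOCAB_BY_ID[vocab_id]
                   for offset, vocab_id in tokens)     # step S535
    close_tag = "</" + open_tag[1:]                    # "<title>" -> "</title>"
    return f"{open_tag}{text}{close_tag}"

restored = decode_element("0007", [(0, "0100"), (1, "0102"), (1, "0101")])
```

With these inputs, `restored` is `"<title>parallel search method</title>"`. Repeating `decode_element` over every element and concatenating the results corresponds to the loop of steps S533 to S537.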

As described above, according to this embodiment, a structured document is encoded using the structure IDs and vocabulary IDs that correspond to the tag structures and vocabulary items actually contained in structured documents, and the encoded data is used for transmission to and reception from other structured document management apparatuses. Compared with transferring the original structured document as is, this reduces the amount of data transferred and shortens the transfer time. In addition, compared with holding in advance a dictionary in which a wide variety of words is registered, the amount of data involved in the encoding process is smaller, which reduces the time needed to encode and restore a structured document and allows structured documents to be sent and received efficiently.

Furthermore, because tag structures and vocabulary items can be converted into structure IDs and vocabulary IDs that suit the compression format and then compressed before transfer, structured documents can be compressed more efficiently and the transfer time can be shortened further.

In this embodiment, the structure index 321 and the vocabulary index 331 are provided as indexes for the structured document DB 31. The embodiment is not limited to this example, and other indexes, such as one for date information, may also be provided.

[Second Embodiment]
Next, the structured document management system 10 of the second embodiment is described. Components that are the same as in the first embodiment described above are given the same reference numerals, and their description is omitted.

FIG. 17 is a block diagram showing the functional configuration of the structured document management apparatus 11 in the second embodiment. As shown in FIG. 17, the structured document management apparatus 11 of this embodiment additionally includes a structure analysis unit 274 and a vocabulary analysis unit 275 within the storage processing unit 27.

In addition to the functions of the structure analysis unit 272 described above, the structure analysis unit 274 counts, by type, the number of tag structures contained in each input structured document and accumulates the counts as structure history information in a predetermined area of the storage unit 107. The structure analysis unit 274 also calculates the appearance frequency of each type of tag structure from the structure history information and, according to this frequency, reassigns the structure IDs registered in the structure-structure ID correspondence data 32.

Specifically, the more frequently a tag structure appears, the more the structure analysis unit 274 favors it when reassigning structure IDs to numerical values that compress well. For example, values in which the same digit appears consecutively, such as "0001" or "1111", tend to compress more efficiently, so the structure analysis unit 274 reassigns values of this kind as structure IDs.
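A rough Python sketch of this frequency-based reassignment follows. The patent only says that values with repeated digits tend to compress better; ranking candidate IDs by how few distinct digits they contain is a heuristic assumed here for illustration, and the function names and the four-digit width are likewise hypothetical.

```python
from collections import Counter

def candidate_ids(width=4):
    """All fixed-width numeric IDs, those with the fewest distinct digits first."""
    ids = [f"{n:0{width}d}" for n in range(10 ** width)]
    return sorted(ids, key=lambda s: (len(set(s)), s))

def reassign_ids(history):
    """Map each tag structure to a new ID, most frequent tag structure first.

    `history` is the accumulated list of observed tag structures
    (a stand-in for the structure history information).
    """
    frequency = Counter(history)
    ranked = [tag for tag, _ in frequency.most_common()]
    return dict(zip(ranked, candidate_ids()))

history = ["<claim>"] * 3 + ["<title>"] * 2 + ["<abstract>"]
new_ids = reassign_ids(history)
```

With this history, `new_ids` maps `<claim>` to "0000", `<title>` to "1111", and `<abstract>` to "2222". Whether such values actually compress better depends on the compressor in use, which the patent leaves open.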

When it reassigns structure IDs, the structure analysis unit 274 also generates, as a structure ID reassignment request, information that associates each newly assigned value with the tag structure corresponding to that structure ID. It sends this request to the other nodes through the communication processing unit 21 so that they reassign the structure IDs registered in their own structure-structure ID correspondence data 32.

When the structure analysis unit 274 receives a structure ID reassignment request from another node through the request reception unit 20, it reassigns the structure IDs in its own node's structure-structure ID correspondence data 32 as instructed by the request.

In addition to the functions of the vocabulary analysis unit 273 described above, the vocabulary analysis unit 275 counts, by type, the number of vocabulary items obtained by splitting character string portions and accumulates the counts as vocabulary history information in a predetermined area of the storage unit 107. The vocabulary analysis unit 275 also calculates the appearance frequency of each type of vocabulary item from the vocabulary history information and, according to this frequency, reassigns the values of the vocabulary IDs registered in the vocabulary-vocabulary ID correspondence data 33 in the same way as the structure analysis unit 274.

When it reassigns vocabulary IDs, the vocabulary analysis unit 275 also sends, as a vocabulary ID reassignment request, information that associates each newly assigned value with the vocabulary item corresponding to that vocabulary ID. The request is sent to the other nodes through the communication processing unit 21 so that they reassign the vocabulary IDs registered in their own vocabulary-vocabulary ID correspondence data 33.

When the vocabulary analysis unit 275 receives a vocabulary ID reassignment request from another node through the request reception unit 20, it reassigns the vocabulary IDs in its own node's vocabulary-vocabulary ID correspondence data 33 as instructed by the request.

The operation of the structure analysis unit 274 of this embodiment is described below with reference to FIGS. 18 and 19. FIG. 18 is a flowchart showing the procedure of the structure ID reassignment process executed at the node (master node) that generates the structure ID reassignment request described later. The timing at which the master node executes the structure ID reassignment process is not restricted. For example, the process may be executed at predetermined intervals, or each time a predetermined amount of the structure index 321 has been registered.

First, the structure analysis unit 274 calculates the appearance frequency of each tag structure from the structure history information accumulated in the predetermined area of the storage unit 107 (step S61). It then reassigns the structure IDs of the tag structures registered in the structure-structure ID correspondence data 32 according to the calculated frequencies (step S62).

Next, the structure analysis unit 274 generates, as a structure ID reassignment request, information that associates each structure ID reassigned in step S62 with its corresponding tag structure (step S63). It sends the generated request to the other nodes (slave nodes) through the communication processing unit 21 (step S64), and the process ends.

FIG. 19 is a flowchart showing the procedure of the structure ID reassignment process executed at a node (slave node) that has received a structure ID reassignment request (the other-node-side structure ID reassignment process).

On the slave node side, when the request reception unit 20 receives the structure ID reassignment request, it outputs the request to the structure analysis unit 274 (step S71).

Subsequently, based on the information in the structure ID reassignment request that associates each structure ID with its corresponding tag structure, the structure analysis unit 274 changes the structure ID of each matching tag structure in the structure-structure ID correspondence data 32 to the instructed value (step S72), and the process ends.
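Step S72 on the slave node can be pictured with the short Python sketch below. Representing both the local index and the request as dictionaries from tag structure to structure ID is an assumption, as are the function name and the duplicate-ID check, which the patent does not describe.

```python
def apply_reassignment(index, request):
    """Return a copy of `index` with the structure IDs from `request` applied.

    Both arguments map tag structure -> structure ID (hypothetical layout).
    """
    updated = dict(index)
    for tag, new_id in request.items():
        if tag in updated:  # change only tag structures this node has registered
            updated[tag] = new_id
    # Added safeguard (not in the patent): IDs must stay unique to be decodable.
    if len(set(updated.values())) != len(updated):
        raise ValueError("reassignment would give two tag structures the same ID")
    return updated

local_index = {"<title>": "0007", "<claim>": "0008"}
request = {"<claim>": "0000", "<title>": "1111"}
new_index = apply_reassignment(local_index, request)
```

After the call, `new_index` is `{"<title>": "1111", "<claim>": "0000"}` and `local_index` is unchanged. A practical system would also need to decide how to handle encoded data that is still in transit with the old IDs, a point the text does not address.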

The vocabulary analysis unit 275 simply performs the same operation as the structure analysis unit 274 described above on the vocabulary IDs (the vocabulary-vocabulary ID correspondence data 33), so its description is omitted.

As described above, according to this embodiment, values that compress more efficiently can be assigned to the structure IDs of frequently appearing tag structures and/or to the vocabulary IDs of frequently appearing vocabulary items. This further reduces the size of a structured document when compressed and shortens the time required to transfer data between nodes.

In this embodiment, structure IDs and vocabulary IDs are reassigned separately. The embodiment is not limited to this: both kinds of ID may be reassigned at the same time, or only one of them may be reassigned.

[Third Embodiment]
Next, the structured document management system 10 of the third embodiment is described. Components that are the same as in the first embodiment described above are given the same reference numerals, and their description is omitted.

FIG. 20 is a block diagram showing the functional configuration of the client device 40 in this embodiment. As shown in the figure, the client device 40 includes a communication processing unit 41, a structure index unit 42, a vocabulary index unit 43, structure-structure ID correspondence data 44, vocabulary-vocabulary ID correspondence data 45, and a structured document restoration unit 46. The communication processing unit 41, the structure index unit 42, the vocabulary index unit 43, and the structured document restoration unit 46 are functional units realized through the cooperation of a CPU (not shown) of the client device 40 and a predetermined program stored in advance in a ROM or storage unit.

The structure-structure ID correspondence data 44 and the vocabulary-vocabulary ID correspondence data 45 are stored in a predetermined area of a storage unit (not shown) of the client device 40. Like the structure-structure ID correspondence data 32 and the vocabulary-vocabulary ID correspondence data 33 described above, they hold the structure index 321 and the vocabulary index 331, respectively.

When the communication processing unit 41 receives, over the network N, a structure index 321 sent from a node of the structured document management system 10, it outputs the structure index 321 to the structure index unit 42, which registers it in the structure-structure ID correspondence data 44.

Likewise, when the communication processing unit 41 receives, over the network N, a vocabulary index 331 sent from the structured document management system 10, it outputs the vocabulary index 331 to the vocabulary index unit 43, which registers it in the vocabulary-vocabulary ID correspondence data 45.

In this embodiment, the structure index 321 and the vocabulary index 331 are assumed to be broadcast from each node. The embodiment is not limited to this: for example, they may be sent by unicast to destinations that include the client device 40, or by multicast.

When the communication processing unit 41 receives a search result for a search request from the structured document management system 10, it outputs the search result to the structured document restoration unit 46. In this embodiment, the search result sent from the structured document management system 10 is assumed to be in the form of the encoded data described above.

The structure index unit 42 registers the structure index 321 input from the communication processing unit 41 in the structure-structure ID correspondence data 44. The vocabulary index unit 43 registers the vocabulary index 331 input from the communication processing unit 41 in the vocabulary-vocabulary ID correspondence data 45.

The structured document restoration unit 46 restores the encoded data input from the communication processing unit 41 as a search result into a structured document, based on the structure index 321 registered in the structure-structure ID correspondence data 44 and the vocabulary index 331 registered in the vocabulary-vocabulary ID correspondence data 45. If the structured document restoration unit 46 determines that the encoded data has been compressed, it decompresses the data before restoring the structured document. The restoration operation is the same as that of the structured document conversion unit 24 described above, so its description is omitted.

In the above configuration, when the client device 40 obtains the search result (structured document) for a search request from the structured document management system 10 in the form of encoded data, the structured document restoration unit 46 restores it to the original structured document, based on the structure index 321 registered in the structure-structure ID correspondence data 44 and the vocabulary index 331 registered in the vocabulary-vocabulary ID correspondence data 45.

As described above, according to this embodiment, a structured document transferred between a node of the structured document management system 10 and the client device can be converted into a format suited to compression and then compressed before transfer. Compared with transferring the original structured document as is, this reduces the amount of data transferred and shortens the transfer time.

In this embodiment, the client device 40 includes the structure index unit 42 and the vocabulary index unit 43. The embodiment is not limited to this, and the client device 40 may omit either or both of these units.

Embodiments of the invention have been described above, but the present invention is not limited to them. Various modifications, substitutions, and additions are possible without departing from the spirit of the invention.

A diagram showing the configuration of the structured document management system.
A diagram showing the hardware configuration of the structured document management apparatus.
A diagram showing an example of a structured document.
A diagram showing an example of the functional configuration of the structured document management apparatus.
A diagram for explaining the operation of the structure index unit.
A diagram for explaining the operation of the structure index unit.
A diagram for explaining the operation of the vocabulary index unit.
A flowchart showing the procedure of the structured document registration process.
A flowchart showing the procedure of the structure-structure ID data update process.
A flowchart showing the procedure of the vocabulary-vocabulary ID data update process.
A flowchart showing the procedure of the other-node-side registration process.
A flowchart showing the procedure of the structured document search process.
A flowchart showing the procedure of the structured document encoding process.
A flowchart showing the procedure of the other-node-side structured document search process.
A flowchart showing the procedure of the structured document restoration process.
A diagram for explaining the operation of the structured document conversion unit.
A diagram for explaining the operation of the structured document conversion unit.
A diagram for explaining the operation of the structured document conversion unit.
A diagram showing an example of the structure index.
A diagram showing an example of the vocabulary index.
A block diagram showing an example of the functional configuration of the structured document management apparatus.
A flowchart showing the procedure of the structure ID reassignment process.
A flowchart showing the procedure of the other-node-side structure ID reassignment process.
A block diagram showing the functional configuration of the client device.

Explanation of symbols

10 Structured document management system
11 Structured document management apparatus
12 Structured document management apparatus
13 Structured document management apparatus
14 Structured document management apparatus
20 Request reception unit
21 Communication processing unit
22 Search plan generation unit
23 Search plan processing unit
24 Structured document conversion unit
25 Dictionary
26 Structured document acquisition unit
27 Storage processing unit
271 Document analysis unit
272 Structure analysis unit
273 Vocabulary analysis unit
274 Structure analysis unit
275 Vocabulary analysis unit
28 Structure index unit
29 Vocabulary index unit
30 Structured document DB management unit
31 Structured document DB
32 Structure-structure ID correspondence data
321 Structure index
33 Vocabulary-vocabulary ID correspondence data
331 Vocabulary index
40 Client device
41 Communication processing unit
42 Structure index unit
43 Vocabulary index unit
44 Structure-structure ID correspondence data
45 Vocabulary-vocabulary ID correspondence data
46 Structured document restoration unit
101 CPU
102 Operation unit
103 Display unit
104 ROM
105 RAM
106 Communication unit
107 Storage unit
108 Bus

Claims

A structured document management apparatus that stores a plurality of structured documents in a distributed manner together with other structured document management apparatuses connected to a network, and that manages the plurality of structured documents based on a structure ID unique to each type of tag structure of the elements constituting the structured documents and a vocabulary ID unique to each vocabulary included in those elements, the structure IDs and the vocabulary IDs being shared with the other structured document management apparatuses, the structured document management apparatus comprising:
structured document storage means for storing the structured document;
structure index storage means for storing a structure index in which the tag structure of each element constituting the structured document stored in the structured document storage means is associated with the structure ID unique to each type of the tag structure;
vocabulary index storage means for storing a vocabulary index in which each vocabulary constituting a character string portion included in each element of the structured document stored in the structured document storage means is associated with the vocabulary ID unique to each type of the vocabulary;
encoding means for generating, based on the structure index and the vocabulary index, encoded data obtained by converting the structured document stored in the structured document storage means into an array of the structure IDs and the vocabulary IDs, and for transmitting the encoded data to the other structured document management apparatus; and
restoring means for restoring the encoded data to a structured document based on the structure index and the vocabulary index when the encoded data is received from the other structured document management apparatus.
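As an illustration only, the encoding and restoring means recited above can be sketched as a round trip through two lookup tables. The sketch assumes that a tag structure is represented by a path string such as "/book/title" and that a vocabulary is a whitespace-separated word; the index contents and the names encode and restore are hypothetical and are not taken from the specification.

```python
# Hypothetical shared indexes (assumed contents, for illustration only).
STRUCTURE_INDEX = {"/book/title": 1, "/book/author": 2}
VOCABULARY_INDEX = {"structured": 1, "document": 2, "Toshiba": 3}


def encode(elements, structure_index, vocabulary_index):
    """Convert (tag path, text) pairs into (structure ID, [vocabulary IDs]) pairs."""
    return [
        (structure_index[path], [vocabulary_index[word] for word in text.split()])
        for path, text in elements
    ]


def restore(encoded, structure_index, vocabulary_index):
    """Invert both indexes and rebuild the (tag path, text) pairs."""
    id_to_path = {sid: path for path, sid in structure_index.items()}
    id_to_word = {vid: word for word, vid in vocabulary_index.items()}
    return [
        (id_to_path[sid], " ".join(id_to_word[vid] for vid in vids))
        for sid, vids in encoded
    ]


document = [("/book/title", "structured document"), ("/book/author", "Toshiba")]
encoded = encode(document, STRUCTURE_INDEX, VOCABULARY_INDEX)
restored = restore(encoded, STRUCTURE_INDEX, VOCABULARY_INDEX)
```

In this sketch the receiving side can rebuild the document only because both sides hold the same two indexes, which is why the claims recite the IDs as shared between apparatuses. Splitting on whitespace is a simplification that does not preserve the original spacing exactly.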

2. The structured document management apparatus according to claim 1, wherein the encoding means compresses the encoded data in a predetermined compression format and transmits the compressed encoded data to the other structured document management apparatus, and
the restoring means decompresses the encoded data compressed in the predetermined compression format and then restores it to a structured document.
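The compression step can be sketched as follows. The claim only says "a predetermined compression format"; the use of zlib and of JSON as the serialization of the ID array are assumptions made for this example, and the function names are hypothetical.

```python
import json
import zlib


def compress_encoded(encoded):
    """Serialize the ID array as JSON and compress it (zlib is an assumed format)."""
    raw = json.dumps(encoded).encode("utf-8")
    return zlib.compress(raw)


def decompress_encoded(payload):
    """Decompress the payload and parse it back into the ID array."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Example ID array: [structure ID, [vocabulary IDs]] per element.
encoded = [[1, [1, 2]], [2, [3]]]
payload = compress_encoded(encoded)      # what the sender would transmit
received = decompress_encoded(payload)   # what the receiver would restore from
```

The decompressed array is then restored to a structured document using the structure index and the vocabulary index, as in claim 1.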

The structured document management apparatus according to claim 1, further comprising:
structure index generating means for generating, when a structured document is newly stored in the structured document storage means, a new structure index in which a new structure ID different from the existing structure IDs is assigned to each type of tag structure of the elements constituting the structured document; and
structure index registration means for registering the new structure index in the structure index storage means and transmitting the new structure index to the other structured document management apparatus so that it is shared with the other structured document management apparatus.

The structured document management apparatus according to claim 3, wherein the structure index generating means determines whether the tag structure is already registered in the structure index storage means, and assigns the new structure ID to the tag structure when it determines that the tag structure is not registered.
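A minimal sketch of this check, assuming the structure index is a dictionary from tag path to integer ID and that new IDs simply continue from the largest existing ID. Both the numbering policy and the name register_structures are assumptions for illustration.

```python
def register_structures(structure_index, tag_structures):
    """Assign IDs only to tag structures not yet registered.

    Updates structure_index in place and returns the newly added entries,
    which are what would be sent to the other apparatuses for sharing.
    """
    new_entries = {}
    next_id = max(structure_index.values(), default=0) + 1
    for tag in tag_structures:
        if tag not in structure_index:  # already registered: keep the existing ID
            structure_index[tag] = next_id
            new_entries[tag] = next_id
            next_id += 1
    return new_entries


index = {"/book/title": 1}
added = register_structures(index, ["/book/title", "/book/author", "/book/author"])
```

Returning only the new entries keeps the amount of index data exchanged between apparatuses small in this sketch.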

5. The structured document management apparatus according to claim 3, wherein the structure index generating means determines the value of the new structure ID according to the appearance frequency of the tag structure.
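One possible reading of frequency-dependent ID assignment is sketched below: more frequent tag structures receive smaller ID values, which would favor them under a variable-length integer encoding. The claim does not state the direction of the mapping, so this ordering, the tie-breaking rule, and the name assign_ids_by_frequency are assumptions.

```python
from collections import Counter


def assign_ids_by_frequency(tag_structures, first_id=1):
    """Give smaller IDs to tag structures that appear more often (assumed policy)."""
    counts = Counter(tag_structures)
    # Sort by descending count, then by name so the result is deterministic.
    ordered = sorted(counts, key=lambda tag: (-counts[tag], tag))
    return {tag: first_id + offset for offset, tag in enumerate(ordered)}


observed = ["/a/b", "/a/c", "/a/b", "/a/d", "/a/b", "/a/c"]
ids = assign_ids_by_frequency(observed)
```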

4. The structured document management apparatus, wherein the structure index registration means, when a new structure index is received from the other structured document registration means, registers the new structure index in the structure index storage means.
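The receiving side can be sketched as a merge of the received entries into the local index. Rejecting an entry that conflicts with an existing ID is an assumption added for this example; the claim itself only recites registration. The name merge_received_index is hypothetical.

```python
def merge_received_index(local_index, received_index):
    """Register received (tag structure -> structure ID) entries locally.

    Raises ValueError if a tag structure is already registered under a
    different ID (conflict handling is an assumption of this sketch).
    """
    for tag, structure_id in received_index.items():
        existing = local_index.get(tag)
        if existing is not None and existing != structure_id:
            raise ValueError(f"conflicting structure ID for {tag!r}")
        local_index[tag] = structure_id
    return local_index


local = {"/book/title": 1}
merged = merge_received_index(local, {"/book/author": 2})
```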

The structured document management apparatus according to claim 1, further comprising:
vocabulary index generating means for generating, when a structured document is newly stored in the structured document storage means, a new vocabulary index in which a new vocabulary ID different from the existing vocabulary IDs is assigned to each type of vocabulary constituting a character string portion included in each element of the structured document; and
vocabulary index registration means for registering the new vocabulary index in the vocabulary index storage means and transmitting the new vocabulary index to the other structured document management apparatus so that it is shared with the other structured document management apparatus.

8. The structured document management apparatus according to claim 7, wherein the vocabulary index generating means determines whether the vocabulary is already registered in the vocabulary index storage means, and assigns the new vocabulary ID to the vocabulary when it determines that the vocabulary is not registered.

9. The structured document management apparatus according to claim 7, wherein the vocabulary index generating means determines the value of the new vocabulary ID according to the appearance frequency of the vocabulary.

8. The structured document management apparatus, wherein the vocabulary index registration means, when a new vocabulary index is received from the other structured document registration means, registers the new vocabulary index in the vocabulary index storage means.

A structured document management method for a structured document management apparatus that stores a plurality of structured documents in a distributed manner together with other structured document management apparatuses connected to a network, and that manages the plurality of structured documents based on a structure ID unique to each type of tag structure of the elements constituting the structured documents and a vocabulary ID unique to each vocabulary included in those elements, the structure IDs and the vocabulary IDs being shared with the other structured document management apparatuses, the method comprising:
an encoding step of generating encoded data by converting a structured document stored in the apparatus itself into an array of the structure IDs and the vocabulary IDs, based on a structure index in which the tag structure of each element constituting the structured document stored in the apparatus itself is associated with the structure ID unique to each type of the tag structure, and on a vocabulary index in which each vocabulary constituting a character string portion included in each element of that structured document is associated with the vocabulary ID unique to each type of the vocabulary, and transmitting the encoded data to the other structured document management apparatus; and
a restoration step of restoring the encoded data to a structured document based on the structure index and the vocabulary index when the encoded data is received from the other structured document management apparatus.

A structured document management system in which a plurality of structured documents are stored in a distributed manner by a plurality of structured document management apparatuses connected on a network, and in which the plurality of structured documents are managed based on a structure ID unique to each type of tag structure of the elements constituting the structured documents and a vocabulary ID unique to each vocabulary included in those elements, the structure IDs and the vocabulary IDs being shared by the plurality of structured document management apparatuses, wherein
each structured document management apparatus comprises:
structured document storage means for storing the structured document;
structure index storage means for storing a structure index in which the tag structure of each element constituting the structured document stored in the structured document storage means is associated with the structure ID unique to each type of the tag structure;
vocabulary index storage means for storing a vocabulary index in which each vocabulary constituting a character string portion included in each element of the structured document stored in the structured document storage means is associated with the vocabulary ID unique to each type of the vocabulary;
encoding means for generating, based on the structure index and the vocabulary index, encoded data obtained by converting the structured document stored in the structured document storage means into an array of the structure IDs and the vocabulary IDs, and for transmitting the encoded data; and
restoring means for restoring the encoded data to a structured document based on the structure index and the vocabulary index when the encoded data is received from another structured document management apparatus.

The structured document management system according to claim 12, further comprising a client device capable of communicating with the plurality of structured document management apparatuses, wherein
the client device comprises:
the structure index storage means;
the vocabulary index storage means; and
structured document restoration means for restoring the encoded data to a structured document based on the structure index and the vocabulary index when the encoded data is received from the structured document management apparatus.