JP3492247B2

JP3492247B2 - XML data search system

Info

Publication number: JP3492247B2
Application number: JP20390899A
Authority: JP
Inventors: 泰彦金政; 和己久保田; 博石川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1999-07-16
Filing date: 1999-07-16
Publication date: 2004-02-03
Anticipated expiration: 2019-07-16
Also published as: JP2001034619A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an XML data search system that stores large amounts of data described in XML in a relational database and searches it. More particularly, it relates to an XML data search system that can store any XML data regardless of the structure of the XML document, and that can execute queries traversing the XML tree structure of the stored XML data at high speed.

[0002]

[Prior Art] The techniques currently used to store XML data can be broadly classified into the following two types.

File storage: XML documents are stored as files. This approach aims to reuse all or part of the original XML file as is, so it keeps each XML document in file form. On its own, however, this makes it hard to find a target file once the number of files grows, so an index for locating target files must also be prepared.

[0003]

Table storage: XML is mapped onto relational database tables and stored there. This approach treats the XML document as structured data and aims at fast search by storing it in a database. Each element is therefore mapped to a column of a relational table. Mapping XML data to a table requires a mapping rule that states how each XML element maps to each table column, and the user must specify this rule in advance.

[0004]

[Problem to Be Solved by the Invention] The biggest problem in storing XML data is that its data structure is not uniquely determined. In particular, for XML data without a DTD (document type definition), it is not known which tags will appear where, so the data structure is entirely unknown. Even for XML data with a DTD, the structure is not uniquely determined, because a DTD allows tags to repeat, allows a choice among tags, and allows tags to be declared recursively. Such data is called semi-structured data. When one tries to store XML data whose structure is not fixed in this way, designing the storage schema becomes a problem. As an example, consider storing the sample XML data shown as [XML data] in FIG. 8, which has the DTD shown as [DTD] in the same figure, in a database using table storage. This sample XML data is a book catalog containing information on two books.

[0005]

FIG. 9 shows the above XML data stored in a table. In the table of FIG. 9, one tuple holds the information for one book, and the columns cover every tag that may appear in the XML data. At first glance the sample data appears to be stored without any problem. However, although the DTD of the sample data places no limit on the number of authors, the table of FIG. 9 provides space for at most two authors. If the XML data contained more authors, that data either could not be stored or would lose part of its information when stored. Thus table storage cannot store tags that the XML DTD declares as repeating. This is because table storage requires the stored elements to be fixed in advance as columns, so it cannot represent repeating elements whose maximum count is undetermined. For the same reason, it cannot store recursively defined tags either. Furthermore, if the XML data has no DTD at all and it is not known which tags will appear, the table structure cannot be determined, and table storage cannot handle the data at all.
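To make the fixed-column limitation concrete, the following Python sketch uses the standard-library sqlite3 module with a hypothetical book table in the spirit of FIG. 9. The column names, titles, and author names are assumptions, not taken from the figure. A book with three authors cannot be stored without dropping one.

```python
import sqlite3

# Hypothetical fixed-column layout: one tuple per book, room for two authors.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, author1 TEXT, author2 TEXT)")

def store_book(title, authors):
    """Insert a book and return the authors that did not fit into a column."""
    fitted = authors[:2]
    slots = fitted + [None] * (2 - len(fitted))  # pad missing authors with NULL
    conn.execute("INSERT INTO book VALUES (?, ?, ?)", (title, *slots))
    return authors[2:]

dropped = store_book("Hypothetical Title", ["Author A", "Author B", "Author C"])
print(dropped)  # the third author has no column to go into
```

Adding an author3 column would only move the limit; the DTD places no bound on the number of authors.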

[0006]

File storage, on the other hand, keeps XML data in file form, so there is no XML data it cannot store, whether the data lacks a DTD or is semi-structured. On its own, however, it cannot retrieve just the information a user wants from a large amount of stored data, so a search index is needed. The index can be designed in various ways depending on the purpose. A simple one uses a pair of a tag name and a character string as the key and retrieves the XML documents in which that string appears enclosed by that tag. Such a simple index, however, cannot support searches that take the hierarchical structure of tags into account. One could design the index to carry information on the tag hierarchy, but the following problems would still remain.
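A minimal sketch of the simple (tag name, string) index described above, using Python's xml.etree.ElementTree and two hypothetical documents. Because the key carries no hierarchy, a book's own title and a title nested one level deeper map to the same key.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical documents: "Databases" is a book title in doc 1,
# but a chapter title (one level deeper) in doc 2.
docs = {
    1: "<bib><book><title>Databases</title></book></bib>",
    2: "<bib><book><chapter><title>Databases</title></chapter></book></bib>",
}

# Simple index: (tag name, text) -> set of document IDs.
index = defaultdict(set)
for doc_id, text in docs.items():
    for elem in ET.fromstring(text).iter():
        if elem.text and elem.text.strip():
            index[(elem.tag, elem.text.strip())].add(doc_id)

# "Books whose own title is Databases" can only be approximated by the key
# ("title", "Databases"), which also matches the chapter title in doc 2.
print(sorted(index[("title", "Databases")]))
```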

[0007]

(1) Because the index does not hold all the information of the XML tree structure, searches cannot use all the information in the XML data.
(2) Because the index is not optimized for traversing a tree structure, searches that do so are slow.

As described above, for XML data whose structure is not uniquely determined, the problems are how to store XML data without a DTD and semi-structured XML data, and how to execute complex queries that traverse the tree structure of the stored XML data at high speed. The present invention was made in view of these circumstances, and its object is to provide an XML data search system that stores XML data whose structure is not uniquely determined in a database and executes complex queries against it at high speed.

[0008]

[Means for Solving the Problem] FIG. 1 shows the basic configuration of the present invention. As shown there, the system of the invention searches data described in XML that is represented as a tree structure in which elements are intermediate nodes, element values and attribute values are leaf nodes, and tags are links. The system has a storage means 1 that stores XML data, and the relational database of storage means 1 contains at least an intermediate node table 2 for storing intermediate node information, a link table 3 for storing link information, and a leaf node table 4 for storing leaf node information. XML data represented as an XML tree is split into individual nodes, and each node is stored in tables 2 to 4 together with its associated link information. In XML, the intermediate nodes that form the tree and the leaf nodes that hold element values have different optimal storage structures, so it is preferable to store them in separate dedicated tables, each optimized accordingly. Storing the leaf nodes, which hold values, and the intermediate nodes, which hold tree-structure information, in separate tables saves the storage space needed for values. The links that hold the connection information between nodes must also be stored, in link table 3. An attribute table 5 for storing attribute information may additionally be provided. Furthermore, recording in intermediate node table 2 the full path of each node from the root as an ID, and keeping a separate correspondence table between path IDs and path strings as a path ID table 6, saves storage space and speeds up search. Likewise, recording the tag names in link table 3 and the attribute names in the attribute table as IDs, and keeping a separate correspondence table between these label IDs and their strings as a label ID table 7, saves storage space and speeds up string search. In addition, adding to link table 3 the order in which each child element appears within its parent element, and adding to the leaf node table the order in which each element value appears within its element, makes it possible to restore the original XML document.
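One possible relational schema for tables 2 to 7, sketched with Python's built-in sqlite3 module. The column names follow the embodiment described later (id, docid, pathid, labelid, child, tord, pord, value, order, attvalue); the table names, SQL types, and the choice of SQLite are assumptions for illustration. The column name order is an SQL keyword, so it is quoted.

```python
import sqlite3

# Hypothetical DDL for the six tables; table names and types are assumptions.
SCHEMA = """
CREATE TABLE intermediate_node (      -- table 2
    id     INTEGER PRIMARY KEY,       -- node ID
    docid  INTEGER NOT NULL,          -- document containing the node
    pathid INTEGER NOT NULL           -- full path from the root, as an ID
);
CREATE TABLE link (                   -- table 3
    id      INTEGER NOT NULL,         -- parent node ID
    labelid INTEGER NOT NULL,         -- tag name, as an ID
    child   INTEGER NOT NULL,         -- child node ID
    tord    INTEGER NOT NULL,         -- position among all siblings
    pord    INTEGER NOT NULL          -- position among same-label siblings
);
CREATE TABLE leaf_node (              -- table 4
    id      INTEGER NOT NULL,         -- intermediate node of the element
    value   TEXT,                     -- element value
    "order" INTEGER NOT NULL          -- position of the value in the element
);
CREATE TABLE attribute_node (         -- table 5
    id       INTEGER NOT NULL,        -- intermediate node the attribute hangs on
    labelid  INTEGER NOT NULL,        -- attribute name, as an ID
    attvalue TEXT,                    -- attribute value
    UNIQUE (id, labelid)              -- one value per attribute name per tag
);
CREATE TABLE path_id  (pathid  INTEGER PRIMARY KEY, path  TEXT UNIQUE);  -- table 6
CREATE TABLE label_id (labelid INTEGER PRIMARY KEY, label TEXT UNIQUE);  -- table 7
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```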

[0009]

Because the invention stores the XML tree structure as is in storage means 1, it can also store XML data without a DTD and semi-structured XML data. And because the entire XML tree structure is stored in the database, all of the tree's information can be used for search. With this alone, however, answering a query takes time to rejoin the tree structure of XML data that was split and stored node by node, so query execution becomes slow. The invention therefore places indexes 8 on tables 2 to 7 above, designed with the query patterns against XML data in mind. This makes it possible to execute complex queries that traverse the XML tree structure at high speed. To search the XML data, a query is issued, for example, in an XML data query language. Query processing means 9 then checks the syntax of the query statement, builds a syntax tree for the query, and generates an optimal execution plan. The execution plan is expressed in terms of a set of functions for tree-structure search. Following this plan, the query that traverses the tree structure is executed using indexes 8, and the requested search result is output.

[0010]

The present invention can also be configured as follows.
(1) XML syntax rules are checked by applying the constraint features of the relational database to the tables.
(2) The link table records, for each element, its position among the sibling elements that have the same label, which makes it possible to execute queries that specify the order of appearance for each label.
(3) The link table holds not only the nodes at both ends of each link but also the tag name, so queries that follow links with a specified tag name execute at high speed.
(4) In the attribute table, each attribute node is attached to an intermediate node rather than to a link. This reduces the number of table lookups needed to execute a query that traverses the tree with an attribute as a condition, enabling fast query execution.
(5) When the index for fast lookup by path ID in the intermediate node table is built as a B⁺-tree, the key is the pair of path ID and node ID, which eliminates duplicate key values.
(6) When the index for fast lookup by document ID in the intermediate node table is built as a B⁺-tree, the key is the pair of document ID and node ID, which eliminates duplicate key values.

[0011]

[Embodiments of the Invention] Embodiments of the present invention are described below.

(1) System configuration

FIG. 2 shows the system configuration of an embodiment of the invention. As shown there, the system of this embodiment broadly consists of an XML data storage unit 11, an XML data insertion module 12 that inserts XML data into XML data storage unit 11, and a query processing engine 13 that processes queries against the stored XML data. XML data is inserted into XML data storage unit 11 by XML data insertion module 12. XML data insertion module 12 consists of an XML parser 12a and a loader 12b. XML parser 12a parses the input XML data and decomposes its tree structure into individual nodes so that it can be stored in XML data storage unit 11. Loader 12b then inserts the tree structure, decomposed into nodes, into the tables of XML data storage unit 11.
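The decomposition performed by XML parser 12a can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration under stated assumptions, not the embodiment's parser: node 1 is a virtual document root, node IDs are assigned in document order, paths and labels are kept as strings (the path and label ID tables are left out), whitespace-only text is skipped, and an element's text leaves and child links share one position counter. The sample fragment is hypothetical; its element order was chosen so that the node IDs line up with the examples quoted in [0021] and [0022].

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample fragment (not FIG. 8 itself).
SAMPLE = ('<bib><book year="1995"><publisher><name>Addison-Wesley</name>'
          '</publisher><title>Hypothetical Title</title></book></bib>')

def shred(xml_text, docid=1):
    """Split an XML document into rows for the intermediate node, link,
    leaf node and attribute node tables (labels and paths kept as strings)."""
    nodes, links, leaves, attrs = [], [], [], []

    def new_node(path):
        node_id = len(nodes) + 1          # node IDs in document order
        nodes.append((node_id, docid, path))
        return node_id

    def visit(elem, parent_id, parent_path, tord, pord):
        path = f"{parent_path}.{elem.tag}" if parent_path else elem.tag
        node_id = new_node(path)
        links.append((parent_id, elem.tag, node_id, tord, pord))
        for name, value in elem.attrib.items():
            attrs.append((node_id, name, value))  # attribute hangs on the node
        # Assumption: text leaves and child links share one position counter.
        pos = 0
        if elem.text and elem.text.strip():
            leaves.append((node_id, elem.text, pos))
            pos += 1
        same_label = defaultdict(int)
        for child in elem:
            visit(child, node_id, path, pos, same_label[child.tag])
            same_label[child.tag] += 1
            pos += 1
            if child.tail and child.tail.strip():
                leaves.append((node_id, child.tail, pos))
                pos += 1

    root_id = new_node("")                # virtual document root
    visit(ET.fromstring(xml_text), root_id, "", 0, 0)
    return nodes, links, leaves, attrs

nodes, links, leaves, attrs = shred(SAMPLE)
for row in links:
    print(row)  # (id, label, child, tord, pord)
```

A loader in the role of 12b would then replace the label and path strings with IDs and insert the four row lists into their tables.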

[0012]

FIG. 3 is a flowchart of the XML data storage process, which in this embodiment proceeds as follows. In step S1, the XML file is read. In step S2, the XML parser parses the input file. If parsing succeeds, the process goes to step S3, where the XML parser outputs the parse result, namely the node information and link information of the XML tree, to a file in an intermediate format. If parsing fails, a syntax analysis failure error is output and the process ends. In step S4, the generated intermediate-format file is read, and in step S5 the loader inserts the read XML data into the tables of the relational database, and the process ends. If the insertion fails, a data insertion failure error is output and the process ends.
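Steps S1 to S5 can be sketched as a single Python function. The JSON intermediate file, the two simplified tables node and link, and the helper names are all hypothetical; the point is the control flow: a parse failure stops at S2, and a failed insert in S5 is rolled back and reported.

```python
import json
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def to_intermediate(root):
    """S3 helper: flatten the tree into node and link records
    (a hypothetical, much-simplified intermediate format)."""
    nodes, links = [], []
    stack = [(root, None)]
    while stack:
        elem, parent = stack.pop()
        node_id = len(nodes) + 1
        nodes.append({"id": node_id, "text": (elem.text or "").strip()})
        if parent is not None:
            links.append({"id": parent, "label": elem.tag, "child": node_id})
        stack.extend((child, node_id) for child in reversed(list(elem)))
    return {"nodes": nodes, "links": links}

def store(xml_path, conn, work_dir):
    try:
        root = ET.parse(xml_path).getroot()            # S1 read, S2 parse
    except ET.ParseError as exc:
        return f"syntax analysis failure: {exc}"
    mid = Path(work_dir) / "intermediate.json"
    mid.write_text(json.dumps(to_intermediate(root)))  # S3 write intermediate file
    data = json.loads(mid.read_text())                 # S4 read it back
    try:
        with conn:                                     # S5 insert in one transaction
            conn.executemany("INSERT INTO node VALUES (:id, :text)", data["nodes"])
            conn.executemany("INSERT INTO link VALUES (:id, :label, :child)",
                             data["links"])
    except sqlite3.Error as exc:                       # rolled back by `with conn`
        return f"data insertion failure: {exc}"
    return "ok"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("CREATE TABLE link (id INTEGER, label TEXT, child INTEGER)")
with tempfile.TemporaryDirectory() as work:
    good = Path(work) / "good.xml"
    good.write_text("<bib><book><title>T</title></book></bib>")
    bad = Path(work) / "bad.xml"
    bad.write_text("<bib><book></bib>")
    r_ok = store(good, conn, work)
    r_bad = store(bad, conn, work)
    r_dup = store(good, conn, work)  # same node IDs again: primary key clash
print(r_ok, r_bad, r_dup, sep="\n")
```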

[0013]

Queries against the stored XML data are written in an XML data query language and processed by query processing engine 13. Query processing engine 13 consists of a query language parser 13a, a query optimization engine 13b, and a tree-structure search API (application programming interface) 13c. Query language parser 13a checks the syntax of the input query statement and builds a syntax tree for the query. Query optimization engine 13b generates an optimal execution plan from that syntax tree. The execution plan is expressed in terms of the function set of tree-structure search API 13c. Tree-structure search API 13c is the interface to XML data storage unit 11: a set of functions that perform basic searches on the XML tree structure.

[0014]

Next, the configuration of each part of the above system is described in more detail.

(1) Table configuration

First, the configuration of the tables stored in XML data storage unit 11 is described. There are several ways to represent XML data as a tree; this embodiment assumes the tree representation shown in FIG. 4, which is the XML data of FIG. 8 expressed as a tree. In this representation, the round intermediate nodes represent elements, and the parent-child relationships between nodes represent the containment relationships between elements.

[0015]

The number inside each node's circle is its node ID. The links (edges) connecting nodes represent tags, and the string written beside each link is the tag name. Triangular leaf nodes represent element values, and square leaf nodes represent attributes attached to tags. Only these two kinds of leaf node hold values. If the nodes were split up and only the node information were stored in database tables, the connections between nodes in the tree, that is, the link information, would be lost. A dedicated table for storing link information is therefore provided. Among the nodes, too, intermediate nodes, element-value leaf nodes, and attribute leaf nodes each have a different optimal storage structure, so they must be stored in separate tables.

[0016]

This embodiment uses the following six tables in total.

Intermediate node table: stores information on intermediate nodes. Besides the node ID (id), its columns are the document ID (docid) of the document containing the node and the ID of the full path from the root to the node (pathid).

Link table: stores the links between nodes. Its columns are the node ID (id), the ID of the link's label, i.e. its tag name (labelid), the node ID of the child node (child), the child's position among all of its sibling nodes (tord: total order), and the child's position among the sibling nodes with the same label (pord: partial order). Including the label (tag name) ID (labelid) in the link table in this way makes it possible to execute queries that follow links with a specified tag name at high speed.
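The difference between tord and pord can be seen in a small Python sketch. It assumes zero-based positions (consistent with the "0" values quoted in [0021]) and uses tag strings in place of label IDs for readability; the helper name, node IDs, and labels are hypothetical.

```python
from collections import Counter

def order_children(parent_id, child_labels, first_child_id):
    """Return link rows (id, label, child, tord, pord) for one node's children,
    given their labels in document order. Child IDs are assigned consecutively
    from first_child_id (a simplification)."""
    seen = Counter()
    rows = []
    for tord, label in enumerate(child_labels):
        rows.append((parent_id, label, first_child_id + tord, tord, seen[label]))
        seen[label] += 1
    return rows

rows = order_children(3, ["title", "author", "author", "publisher"], 4)
# "The second author of node 3" selects on label and pord, not on tord.
second_author = [r for r in rows if r[0] == 3 and r[1] == "author" and r[4] == 1]
print(second_author)
```

With a (id, labelid) index on the link table, such a lookup touches only the author links of node 3 rather than all of its children.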

[0017]

Leaf node table: stores the leaf nodes for element values. Besides the node ID (id) of the intermediate node corresponding to the element, its columns are the element's value (value) and the order in which that value appears within the element (order). Providing this table for holding values separately from the intermediate node table saves the space needed to store values.

[0018]

Attribute node table: stores the attributes attached to tags (for example, year in <book year="1995"> in FIG. 8). Besides the node ID (id) of the intermediate node corresponding to the element that the tag belongs to, its columns are the ID of the attribute name (labelid) and the attribute value (attvalue). By using the relational database's constraint feature to declare the pair (id, labelid) unique on the attribute table, the XML syntax rule for attributes that "the same attribute name must not appear more than once in the same tag" can be checked. Moreover, in the tree representation assumed in this embodiment, an XML tag corresponds to a link in the tree, so the attributes attached to an XML tag would naturally belong to the link. In FIG. 4, however, attributes are attached not to the link but to the node below it. This reduces the number of table lookups during search: it cuts the number of table searches needed to execute a query that traverses the tree with an attribute as a condition, and so speeds up the query.
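A minimal sketch of configuration (1) of [0010], assuming SQLite through Python's sqlite3 module: the UNIQUE (id, labelid) constraint on a hypothetical attribute_node table makes the database reject a second value for the same attribute name on the same tag. The values follow [0022] (labelid 3 is "year").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attribute node table with the (id, labelid) uniqueness constraint.
conn.execute("""CREATE TABLE attribute_node (
    id INTEGER NOT NULL, labelid INTEGER NOT NULL, attvalue TEXT,
    UNIQUE (id, labelid))""")

def add_attribute(node_id, labelid, value):
    """Insert one attribute; return False if the tag already has that name."""
    try:
        with conn:
            conn.execute("INSERT INTO attribute_node VALUES (?, ?, ?)",
                         (node_id, labelid, value))
        return True
    except sqlite3.IntegrityError:  # raised on the UNIQUE violation
        return False

ok1 = add_attribute(3, 3, "1995")  # year="1995" on node 3
ok2 = add_attribute(3, 3, "2000")  # a second year on the same tag is rejected
print(ok1, ok2)
```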

[0019]

Path ID table: a correspondence table between path IDs and path strings. The path strings are kept separately in this way, rather than written directly into the intermediate node table, partly to save space, but also because when a search involves string matching on path names, there is less to search, so the search is faster.

Label ID table: a correspondence table between label IDs and label strings. Recording the tag names in the link table and the attribute names in the attribute node table as IDs, and keeping the correspondence between these label IDs and their strings separately as the label ID table, saves storage space and speeds up search, just as with the path ID table.
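The space and speed argument can be sketched in Python with a hypothetical IdTable class standing in for the path ID table (the same class would serve as the label ID table). A path-name pattern is matched once against the short list of distinct paths, and nodes are then selected by integer pathid; the node IDs are hypothetical.

```python
class IdTable:
    """Hypothetical helper that interns strings (paths or labels) as integer
    IDs, playing the role of the path ID table or the label ID table."""

    def __init__(self):
        self.to_id = {}
        self.to_str = []

    def intern(self, s):
        if s not in self.to_id:
            self.to_id[s] = len(self.to_str) + 1  # IDs start at 1
            self.to_str.append(s)
        return self.to_id[s]

    def lookup(self, i):
        return self.to_str[i - 1]

paths = IdTable()
node_paths = {5: "bib.book.publisher.name", 9: "bib.book.publisher.name",
              7: "bib.book.title"}
# Intermediate node table keeps only the integer pathid per node.
intermediate = {node: paths.intern(p) for node, p in node_paths.items()}

# String matching runs over the distinct paths, not over every node.
hits = {pid for pid, p in enumerate(paths.to_str, start=1) if p.endswith(".name")}
matched = sorted(node for node, pid in intermediate.items() if pid in hits)
print(matched)
```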

[0020]

As described above, adding to the link table each child node's position among all of its siblings (tord: total order), and adding to the leaf node table the order in which each element value appears within its element (order), makes it possible to restore the original XML document from the node-by-node XML data stored in XML data storage unit 11. For example, it even becomes possible to restore text in which tags mark up parts of a sentence, such as "Today it was <weather>sunny</weather>. ○○ went out to the <place>department store</place>." In addition, adding to the link table each element's position among the sibling nodes that have the same label (pord: partial order) makes it possible to execute queries that specify the order of appearance for each label at high speed.
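The restoration of mixed content can be sketched in Python. The sketch assumes, as one possible reading not stated in the source, that within a single element the leaf order values and the child links' tord values share one position sequence, so text and child elements can be interleaved. The rows are hypothetical, and no XML escaping is performed.

```python
# Hypothetical rows for the example sentence. Assumption: within one element,
# the leaf "order" and the child links' tord share one position sequence.
links = [  # (id, label, child, tord, pord)
    (1, "weather", 2, 1, 0),
    (1, "place", 3, 3, 0),
]
leaves = [  # (id, value, order)
    (1, "Today it was ", 0), (1, ". ○○ went out to the ", 2), (1, ".", 4),
    (2, "sunny", 0),
    (3, "department store", 0),
]

def restore(node_id, links, leaves):
    """Rebuild the content of node_id by merging its text values and child
    elements in position order (no escaping of &, <, > for brevity)."""
    items = [(order, value) for nid, value, order in leaves if nid == node_id]
    items += [(tord, (label, child))
              for nid, label, child, tord, _pord in links if nid == node_id]
    parts = []
    for _, item in sorted(items, key=lambda pair: pair[0]):
        if isinstance(item, str):
            parts.append(item)
        else:
            label, child = item
            parts.append(f"<{label}>{restore(child, links, leaves)}</{label}>")
    return "".join(parts)

print(restore(1, links, leaves))
```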

[0021]

As an example, FIGS. 5 and 6 show the sample XML data of FIG. 8 (the tree representation of FIG. 4) stored in the above tables. FIG. 5 shows examples of the intermediate node table and the link table. In the intermediate node table, for example, the id (= 5) in the first row denotes the node marked "5" in FIG. 4, and the document ID (docid) of the document containing that node is 1. The ID of the full path from the root to that node (pathid) is 1, and the path corresponding to this ID is "bib.book.publisher.name". In the link table, for example, the id (= 4) in the first row denotes the node marked "4" in FIG. 4; its labelid is 5, and the label corresponding to this labelid is "name". Its order-of-appearance values tord and pord are "0" and "0", respectively, and its child node is the node marked "5" in FIG. 4.

[0022]

FIG. 6 shows examples of the leaf node table, the attribute node table, the path ID table, and the label ID table. In the leaf node table, for example, the id (= 5) in the first row denotes the node marked "5" in FIG. 4; its order is "0", and the value (value) of that leaf node is "Addison-Wesley". In the attribute node table, for example, the id (= 3) in the first row denotes the node marked "3" in FIG. 4; its labelid is 3 (corresponding to "year"), and its attribute value (attvalue) is "1995". The path ID table and the label ID table store the path strings and label strings corresponding to the pathid and labelid values used in the tables above. For example, the string corresponding to pathid = "1" is "bib.book.publisher.name", as noted above, and the string corresponding to labelid = "1" is "bib".

[0023]

(2) Index configuration

In this embodiment, as described above, the nodes of the tree, which were originally connected, are split one by one and stored in relational database tables. Consequently, when a query traverses the tree, join operations are performed to reconnect the links along the part of the tree that the query follows. Because the speed of these joins strongly affects overall search speed, indexes must be placed effectively so that the joins run fast. In addition, a query specifies search conditions such as element values, attributes, paths, and order of appearance. Those lookups must also be fast, so indexes must be provided for them as well.

[0024] FIG. 7 lists the indexes created on the tables shown in FIGS. 5 and 6. These indexes are B⁺-tree indexes. An index whose key is a tuple of several attributes can also serve searches on a leading prefix of that tuple.

The index on the intermediate node table with key (pathid, id) is used to retrieve all nodes that match a given path. At first glance, pathid alone might seem sufficient as the key. With pathid alone, however, a very large number of entries would share the same key value, and the B⁺-tree index would stop working effectively. Making the key the pair of path ID (pathid) and node ID (id) removes duplicate key values, so the B⁺-tree can be searched quickly.

The same reasoning applies to the index on the intermediate node table with key (docid, id). Pairing the document ID (docid) with the node ID (id) removes duplicate key values and allows fast B⁺-tree search.
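The sketch below shows why the composite key (pathid, id) still supports "all nodes on a path" lookups. It models the index as a sorted list of tuples searched with Python's standard bisect module. This is an in-memory approximation of B⁺-tree leaf ordering under stated assumptions, not the patent's implementation. The class name PathNodeIndex is hypothetical, and so are all sample node IDs except 16.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for the B+-tree on the intermediate node table.
# Keys are (pathid, id) tuples kept in sorted order, like B+-tree leaf entries.


class PathNodeIndex:
    def __init__(self, rows):
        # rows: iterable of (pathid, node_id); each pair is unique because node IDs are unique
        self.keys = sorted(rows)

    def nodes_on_path(self, pathid):
        """Prefix search on the leading attribute: all node IDs whose pathid matches."""
        lo = bisect_left(self.keys, (pathid,))      # (p,) sorts before every (p, id)
        hi = bisect_left(self.keys, (pathid + 1,))  # first key of the next pathid (integer IDs assumed)
        return [node_id for _, node_id in self.keys[lo:hi]]


index = PathNodeIndex([(4, 16), (2, 9), (4, 30), (3, 12), (2, 20)])
print(index.nodes_on_path(4))  # [16, 30]
```

Every key is distinct, so a lookup on the full pair lands on exactly one entry. A search on pathid alone is still one contiguous range scan.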

[0025] (3) Query execution. As described above, queries on the stored XML data are written in, for example, a query language for XML data. XQL is one such search language. A query statement in XQL is explained briefly below with an example.

[0026]
SELECT result: <$book.title> FROM book: bib.book WHERE $book.author.lastname = "Darwen";
This query means: "For each bib.book whose bib.book.author.lastname is Darwen, return bib.book.title as the search result."

[0027] As shown above, a query statement is divided broadly into three parts: SELECT, FROM, and WHERE. The SELECT part specifies the projection, that is, the elements to be returned as the search result. The FROM part specifies the elements to be searched. The WHERE part specifies the selection, that is, the search conditions. Such a query is processed by the query processing engine 13, as described above. The engine checks the syntax of the query statement and builds a syntax tree for the query. From that syntax tree it generates an optimal execution plan, which is expressed using a set of functions for tree-structure search.

[0028] Next, the way query processing on the XML data is carried out is described. As an example, suppose the sample XML data of FIG. 8 has been stored in the XML data storage unit 11 and inserted into the tables shown in FIGS. 5 and 6. Suppose also that the query described above, "find the titles of books whose author is Darwen", is issued. The table search then proceeds as follows. Steps 1 to 10 below are executed by the tree-structure search functions mentioned above.

[0029]
1. Search the leaf node table to obtain the node ID (= 16) of the node whose value is "Darwen".
2. Search the path ID table to obtain the path ID (= 4) of the path "bib.book.author.lastname".
3. Search the intermediate node table with the node ID (= 16) from step 1, and confirm that the path ID found (= 4) matches the path ID (= 4) from step 2.
4. Search the label ID table to obtain the label ID (= 8) of the label "lastname".
5. Search the link table with the node ID (= 16) from step 1 and the label ID (= 8) from step 4 to obtain the node ID (= 15) of the parent node.
6. Search the label ID table to obtain the label ID (= 7) of the label "author".
7. Search the link table with the node ID (= 15) from step 5 and the label ID (= 7) from step 6 to obtain the node ID (= 9) of the parent node.
8. Search the label ID table to obtain the label ID (= 6) of the label "title".
9. Search the link table with the node ID (= 9) from step 7 and the label ID (= 6) from step 8 to obtain the node ID (= 12) of the child node.
10. Search the leaf node table with the node ID (= 12) from step 9 to obtain that node's value ("Foundation for Object/Relational Database").
The search result obtained in this way is output via the query processing engine 13 and presented to the user.

[0030]

EFFECTS OF THE INVENTION As described above, the present invention provides tables in a relational database, including an intermediate node table for intermediate node information, a link table for link information, and a leaf node table for leaf node information. The XML tree structure is decomposed into nodes and links, and each node and its link information are stored in these tables in association with each other. XML data is then searched by executing queries that traverse the tree structure with reference to these tables. As a result, complex queries on XML data whose structure is not fixed in advance can be executed at high speed. In addition, because the XML tree structure is stored in the storage means as is, XML data without a DTD and semi-structured XML data can also be stored. Furthermore, because the entire XML tree structure is stored in the database, all information in the tree structure can be used for searching.

[Brief description of drawings]

FIG. 1 is a basic configuration diagram of the present invention.

FIG. 2 is a diagram showing an example configuration of a system according to an embodiment of the present invention.

FIG. 3 is a diagram showing the storage processing flow in the system of the embodiment of the present invention.

FIG. 4 is a diagram showing an example of a tree-structure representation of XML data.

FIG. 5 is a diagram (1) showing an example of the table configuration in the embodiment of the present invention.

FIG. 6 is a diagram (2) showing an example of the table configuration in the embodiment of the present invention.

FIG. 7 is a diagram showing a list of the indexes in the embodiment of the present invention.

FIG. 8 is a diagram showing an example of XML data.

FIG. 9 is a diagram showing how the XML data of FIG. 8 is stored in the tables.

[Description of Reference Numerals]

1 XML data storage means
2 Intermediate node table
3 Link table
4 Leaf node table
5 Attribute table
6 Path ID table
7 Label ID table
8 Index
9 Query processing means
11 XML data storage unit
12 XML data insertion module
12a XML parser
12b Loader
13 Query processing engine unit
13a Query language parser
13b Query optimization engine
13c For tree structure search

Continuation of the front page

(56) References cited:
Takeyuki Shimura and Masatoshi Yoshikawa, "General-purpose storage and retrieval of XML documents using object-like relations," Proceedings of the 58th National Convention of the Information Processing Society of Japan (first half of 1999), Vol. 3, March 9, 1999, pp. 265-266.
Keishi Tajima, "Data model and manipulation language for semi-structured data," Transactions of the Information Processing Society of Japan, Vol. 40, No. SIG3 (TOD1), February 15, 1999, pp. 152-170.

(58) Fields searched (Int.Cl.⁷, DB name): G06F 17/30; G06F 12/00; JICST file (JOIS)

Claims

(57) [Claims]

1. An XML data search system for searching data described in XML and represented by a tree structure in which elements are intermediate nodes, element values and attribute values are leaf nodes, and tags are links, the system comprising storage means for storing the XML data, wherein:
a relational database of the storage means is provided with at least an intermediate node table for storing intermediate node information, a link table for storing link information, and a leaf node table for storing leaf node information;
the intermediate node table is provided with an index for fast search by node ID, an index for fast search by document ID, and an index for fast search by path;
the link table is provided with an index for fast search from a parent node to its child nodes and an index for fast search from a child node to its parent node;
the leaf node table is provided with an index for obtaining a node's value from its node ID and an index for finding nodes that have a given value;
the XML tree structure is decomposed into nodes and links, and each node and its link information are stored in the tables in association with each other; and
XML data is searched by referring to the tables and executing, using the indexes, a query that traverses the tree structure.

2. The XML data search system according to claim 1, wherein the relational database is further provided with a path ID table, which maps path strings to path IDs, and a label ID table, which maps label strings to label IDs; the path ID table is provided with an index for searching for the path ID corresponding to a path string; the label ID table is provided with an index for searching for the label ID corresponding to a label string; and a query that traverses the tree structure is executed using these indexes.

3. The XML data search system according to claim 1 or 2, wherein the link table further holds information on the order in which each child element appears within its parent element, and the leaf node table further holds information on the order in which each element value appears within its element, so that the original XML document can be restored based on this information.