JP2006228155A

JP2006228155A - XML data processing apparatus, XML data processing method, XML data processing program, and storage medium having XML data processing program recorded therein

Info

Publication number: JP2006228155A
Application number: JP2005044744A
Authority: JP
Inventors: Takeharu Eda (江田 毅晴); Makoto Onizuka (鬼塚 真); Masashi Yamamuro (山室 雅司)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-02-21
Filing date: 2005-02-21
Publication date: 2006-08-31
Anticipated expiration: 2025-02-21
Also published as: JP4562130B2

Abstract

PROBLEM TO BE SOLVED: To provide an apparatus that processes queries against XML data efficiently by eliminating unnecessary traversal of links.

SOLUTION: When XML data to be searched is registered, an XML data loader unit 230 constructs summary information for the XML data, assigns range labels to the summary information, and stores them in a secondary storage device 130. At search time, a query against the XML data and the summary information are input. A query analysis unit 150 converts the query into a tree structure and generates the structure part of the query tree, from which the value part has been removed. A query execution unit 170 matches the intermediate query tree against the summary information to generate a set of intermediate summary trees, converts this set into a set of materialized intermediate trees containing the corresponding XML data, filters the XML data using the value parts corresponding to the materialized intermediate trees, and outputs the query result.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an XML data processing apparatus, an XML data processing method, an XML data processing program, and a storage medium on which the XML data processing program is recorded, all of which process XML (eXtensible Markup Language) data at high speed.

Several techniques for accelerating XML data processing have already been published. For example, Non-Patent Document 1 proposes a fast structural join algorithm for XML data. A structural join is an operation analogous to the join of a relational database: it combines multiple search results (details are given later). Non-Patent Document 2 uses labels to speed up database searches that specify ancestor-descendant and parent-child relationships. Non-Patent Document 3 proposes a method that uses statistics on the XML data to optimize the execution order of structural joins and thereby speed up search.

Non-Patent Document 1: Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas, Jignesh M. Patel, Divesh Srivastava, and Yuqing Wu, "Structural Joins: A Primitive for Efficient XML Query Pattern Matching", In Proc. ICDE (2002).
Non-Patent Document 2: Quanzhong Li and Bongki Moon, "Indexing and Querying XML Data for Regular Path Expressions", In Proc. VLDB (2001).
Non-Patent Document 3: Yuqing Wu, Jignesh M. Patel, and H. V. Jagadish, "Structural Join Order Selection for XML Query Optimization", In Proc. ICDE (2003).

However, these techniques do not fully exploit the structure of XML data, so there is still room for further speed-up. In particular, little has been done to reduce the number of structural join operations, which are costly.

Accordingly, an object of the present invention is to solve the above problem by providing an XML data processing apparatus, an XML data processing method, an XML data processing program, and a storage medium on which the XML data processing program is recorded, which realize high-speed XML data processing.

To achieve the above object, the present invention (Claim 1) is an XML data processing apparatus implemented on a computer having at least an arithmetic means and a storage means. The apparatus comprises means for analyzing XML data and means for storing XML data. When XML data to be searched is registered, the analyzing means constructs summary information containing information on the structure and statistics of the XML data and assigns to the summary information labels from which the ancestor-descendant relationship between nodes of the XML data can be determined; the storing means stores the XML data, the summary information, and the labels.

With this configuration, summary information for the XML data can be constructed.

In the present invention (Claim 2), the arithmetic means takes a SAX event sequence of the XML data as input and determines, for each SAX event, whether the corresponding node information is already contained in the summary information. If it is not, the analyzing means adds that node information to the summary information. The summary information is thus constructed in a predetermined number of scans.

With this configuration, the summary information can be constructed in a predetermined number of scans.
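The single-pass construction of Claim 2 can be sketched in Python as follows. This is an illustrative sketch, not code from the publication: it assumes the SAX events have already been reduced to ("start", name), ("end", name), and ("text", value) tuples, and the names `GuideNode` and `build_dataguide` are hypothetical. For tree-shaped XML, the resulting summary has one node per distinct root-to-node label path, with a count of how many XML nodes share that path as a simple statistic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class GuideNode:
    """One node of the summary: one distinct label path of the XML data."""
    name: str
    parent: Optional["GuideNode"] = field(default=None, repr=False)
    children: dict = field(default_factory=dict)  # child name -> GuideNode
    count: int = 0  # how many XML nodes share this path (a simple statistic)


def build_dataguide(events):
    """Build the summary in one pass over ("start"|"end"|"text", value) events."""
    root = GuideNode("#root")
    current = root
    for kind, value in events:
        if kind == "start":
            child = current.children.get(value)
            if child is None:  # node information not yet in the summary: add it
                child = GuideNode(value, parent=current)
                current.children[value] = child
            child.count += 1
            current = child
        elif kind == "end":
            current = current.parent  # return to the parent node
        # "text" events carry values and do not change the structure summary
    return root
```

Because each event is examined once and the summary only grows when an unseen path appears, a single scan suffices in this sketch.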

In the present invention (Claim 3), a range label, which identifies the nodes to be searched within the tree structure, is used as the label from which the ancestor-descendant relationship can be determined.

With this configuration, range labels enable efficient search.
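One common way to realize such range labels, assumed here for illustration because the publication does not fix the numbering, is to increment a single counter at every start and end tag. A node's (start, end) interval then encloses the intervals of all its descendants, so the ancestor-descendant test becomes two integer comparisons. The helper names below are hypothetical, and the depth field is an added assumption needed only for the parent-child test.

```python
def assign_range_labels(events):
    """Assign (name, start, end, depth) range labels from a start/end event stream.

    One counter is incremented on every start and end event, so each element's
    interval strictly encloses the intervals of all its descendants.
    """
    counter = 0
    stack = []   # indices (into labels) of elements still open
    labels = []
    for kind, value in events:
        if kind == "start":
            counter += 1
            labels.append([value, counter, None, len(stack)])
            stack.append(len(labels) - 1)
        elif kind == "end":
            counter += 1
            labels[stack.pop()][2] = counter
        # "text" events do not receive labels in this sketch
    return [tuple(label) for label in labels]


def is_ancestor(a, d):
    """True if the node labelled a is a proper ancestor of the node labelled d."""
    return a[1] < d[1] and d[2] < a[2]


def is_parent(a, d):
    """Parent-child additionally requires the depths to differ by exactly one."""
    return is_ancestor(a, d) and d[3] == a[3] + 1
```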

In the present invention (Claim 4), there is provided an XML data processing apparatus implemented on a computer having at least an arithmetic means and a storage means, the apparatus comprising means for analyzing a query against XML data and means for executing the query. Given as input the query and summary information containing information on the structure and statistics of the XML data, the analyzing means converts the query into a tree structure and generates the structure part of the query tree, from which the attribute values and node values have been removed. The executing means matches the intermediate query tree, which is the structure part of the query tree, against the summary information to generate a set of intermediate summary trees, each containing part of the summary information. It then converts this set into a set of materialized intermediate trees, which contain the XML data instances corresponding to the intermediate summary trees, and finally filters the XML data using the value parts corresponding to the materialized intermediate trees to obtain the query result.

With this configuration, the desired search result can be obtained from the XML data by inputting the summary information and a query.
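The structural matching step of Claim 4 can be illustrated for the simplest case: a linear path with only "/" and "//" steps and no predicates. In this simplified sketch, which is an assumption and not the claimed procedure, the summary is a nested dict of element names, and each returned label path identifies one summary node, that is, a linear intermediate summary tree. Branching query trees are not handled, and the function names are hypothetical.

```python
import re


def parse_linear_path(xpath):
    """Split a predicate-free path such as '/Booklist//person' into (axis, name) steps."""
    return [("desc" if sep == "//" else "child", name)
            for sep, name in re.findall(r"(//?)([^/]+)", xpath)]


def match_paths(guide, steps):
    """Return every label path in the summary (a nested dict) that matches the steps."""
    results = []

    def descendants(node, path):
        # every proper descendant of node, as (label path, subtree)
        for name, sub in node.items():
            yield path + (name,), sub
            yield from descendants(sub, path + (name,))

    def walk(node, path, i):
        if i == len(steps):
            results.append(path)
            return
        axis, name = steps[i]
        if axis == "child":
            candidates = [(path + (n,), sub) for n, sub in node.items()]
        else:
            candidates = descendants(node, path)
        for p, sub in candidates:
            if name in ("*", p[-1]):
                walk(sub, p, i + 1)

    walk(guide, (), 0)
    return list(dict.fromkeys(results))  # "//" steps can reach the same path twice
```

Because the summary collapses repeated subtrees, this matching touches each distinct path once instead of every instance in the XML data.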

In the present invention (Claim 5), XPath is used as the query language for describing the query.

With this configuration, queries can be written in XPath.

In the present invention (Claim 6), the query analyzing means creates an event sequence from the summary information.

With this configuration, the summary information can be converted into an event sequence.
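Converting the summary information into an event sequence lets a streaming, event-driven XPath evaluator run over the much smaller summary instead of over the XML data itself. A minimal sketch, assuming the summary is stored as a nested dict of element names (a hypothetical representation) and using the same ("start", name) / ("end", name) tuples as a SAX-style stream:

```python
def guide_to_events(guide):
    """Serialize a nested-dict summary into a SAX-like list of start/end events."""
    events = []

    def visit(name, children):
        events.append(("start", name))
        for child_name, grandchildren in children.items():
            visit(child_name, grandchildren)
        events.append(("end", name))

    for name, children in guide.items():
        visit(name, children)
    return events
```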

In the present invention (Claim 7), the XML data processing apparatus further comprises means for performing structural joins, which combine multiple search results, and this means converts the intermediate summary trees into the set of materialized intermediate trees by applying structural joins to them.

With this configuration, the intermediate processing stage of a query against XML data can be made more efficient.

In the present invention (Claim 8), the XML data processing apparatus further comprises means for optimizing a query. This means calculates the cost of execution plans, each combining a materialization method (the process of converting the intermediate summary trees into the set of materialized intermediate trees containing the corresponding data) with a filtering method (using the value parts corresponding to the set of materialized intermediate trees), and selects the optimal execution plan.

With this configuration, query processing can be made even more efficient.
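The publication does not specify its cost model. Purely as an illustration, the sketch below assumes each step of a plan (for example, a structural join used for materialization, or a value filter) has a hypothetical per-row cost and a selectivity, charges each step for the rows entering it, and enumerates all orderings. Exhaustive enumeration is only practical for a handful of steps; the operation names and numbers are made up.

```python
from itertools import permutations


def plan_cost(plan, base_rows):
    """Estimated cost of running the steps in order; each step is (name, unit_cost, selectivity)."""
    rows, cost = base_rows, 0.0
    for _name, unit_cost, selectivity in plan:
        cost += rows * unit_cost  # work grows with the rows entering this step
        rows *= selectivity       # rows surviving this step feed the next one
    return cost


def best_plan(steps, base_rows):
    """Pick the cheapest ordering of the steps under the assumed cost model."""
    return min(permutations(steps), key=lambda plan: plan_cost(plan, base_rows))


steps = [("materialize_join", 2.0, 0.5), ("value_filter", 1.0, 0.01)]
plan = best_plan(steps, 1000)
```

With these made-up numbers, filtering first costs 1000 × 1 + 10 × 2 = 1020, while joining first costs 1000 × 2 + 500 × 1 = 2500, so the filter is scheduled first.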

In the present invention (Claim 9), the XML data processing apparatus further comprises database management means, which manages the XML data.

With this configuration, XML data stored in the secondary storage device can be referenced efficiently.

In the present invention (Claim 10), the XML data processing apparatus further comprises means for analyzing XML data and means for storing XML data. When XML data to be searched is registered, the analyzing means constructs summary information containing information on the structure and statistics of the XML data and assigns to the summary information labels from which the ancestor-descendant relationship between nodes of the XML data can be determined; the storing means stores the XML data, the summary information, and the labels.

With this configuration, summary information is obtained by preprocessing the XML data, and supplying this summary information together with a query allows the query to be processed efficiently.

In the present invention (Claim 11), the arithmetic means takes a SAX event sequence of the XML data as input and determines, for each SAX event, whether the corresponding node information is already contained in the summary information. If it is not, the analyzing means adds that node information to the summary information. The summary information is thus constructed in a predetermined number of scans.

With this configuration, the summary information can be constructed efficiently.

In the present invention (Claim 12), a range label, which identifies the nodes to be searched within the tree structure, is used as the label from which the ancestor-descendant relationship can be determined.

The XML data processing apparatus, XML data processing method, XML data processing program, and storage medium described above exploit the structure of the XML data to be searched, enabling high-speed XML data processing.

Next, embodiments of the present invention will be described in detail with reference to the drawings as appropriate.

<<First Embodiment>>
[System Configuration]
FIG. 1 illustrates the configuration of an XML data processing apparatus 100 according to the first embodiment of the present invention. The XML data processing apparatus 100 is implemented by loading a computer program for processing XML data into a computer that has a main storage device 110, a CPU (Central Processing Unit) 120, and a secondary storage device 130.

The query analysis unit 150 performs lexical and syntax analysis on a query against the XML data (see FIG. 6) and generates a query tree (see FIG. 7), the internal representation of the query. The query analysis unit 150 also separates the query tree into a structure part and a value-processing part and generates an intermediate query tree.

When query optimization is performed, the query optimization unit 160 generates query execution plans (described later) from the intermediate query tree, calculates the cost of each plan from information on the structure of the XML data, statistical information, and the like, and selects the optimal plan among them.

The query execution unit 170 executes either the intermediate query tree passed from the query analysis unit 150 or the execution plan passed from the query optimization unit 160. In doing so, it queries the XML data stored in the secondary storage device 130 and the data guide tree 240 (a strong data guide tree, corresponding to the summary information in the claims), which summarizes the tree structure of the XML data. The secondary storage device 130 is accessed through the disk manager unit 190 described later, and the data guide tree (strong data guide tree) 240 is accessed through the event sequence generation unit 210.

The structural join unit 180 performs structural joins, an existing technique, accessing the secondary storage device 130 through the disk manager unit 190.

A structural join is an operation analogous to the join of a relational database. In a relational join, only the tuples of the two tables whose values for the specified join attribute match are considered, and an output tuple is generated for every matching combination. In a structural join over XML data, by contrast, a parent-child or ancestor-descendant query is answered by searching for the two tag names separately and then combining the results according to the parent-child or ancestor-descendant relationship. For example, for the query "//Book//person", the search for "Book" and the search for "person" are performed separately, and their results are combined to produce the result for "//Book//person". Following parent-child or ancestor-descendant links in the usual way would require visiting all candidate nodes, whereas restricting the candidates with the range labels described later lets the structural join obtain the desired data efficiently.
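The following sketch shows a stack-based ancestor-descendant structural join over (start, end) range labels, in the style of the Stack-Tree-Desc algorithm of Non-Patent Document 1. It is an illustrative reimplementation, not the code of the structural join unit 180. It assumes both inputs are sorted by start position and that the labels come from a numbering in which a descendant's interval lies strictly inside its ancestor's; a parent-child join would additionally compare node depths.

```python
def structural_join(ancestors, descendants):
    """Ancestor-descendant join over (start, end) labels, both sorted by start.

    Returns (ancestor, descendant) pairs ordered by descendant start.
    """
    result, stack, i = [], [], 0
    for d in descendants:
        # push every ancestor candidate that starts before d
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            while stack and stack[-1][1] < a[0]:  # drop intervals closed before a
                stack.pop()
            stack.append(a)
            i += 1
        while stack and stack[-1][1] < d[0]:      # drop intervals closed before d
            stack.pop()
        # the stack is a chain of nested intervals, so each entry contains d
        for a in stack:
            result.append((a, d))
    return result
```

For "//Book//person", `ancestors` would hold the labels of all Book nodes and `descendants` those of all person nodes; each input list is scanned once.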

The disk manager unit 190 provides access to the XML data stored in the secondary storage device 130.

The result generation unit 200 generates the query result using an existing technique. Because the result of an XPath query is the subtree designated by each result node, the result generation unit 200 produces such subtrees.

The event sequence generation unit 210 generates an event sequence (see FIG. 4; described later) from the data guide tree (strong data guide tree) 240. The data guide tree (strong data guide tree) 240 is described later with reference to FIGS. 8 and 9.

The data guide tree loader unit 220 loads the data guide tree (strong data guide tree) 240 from either the XML data 20 before it is stored in the secondary storage device 130 or the XML data 140 after it is stored.

The XML data loader unit 230 loads the XML data 20 into the secondary storage device 130 and can run in parallel with the data guide tree loader unit 220. Depending on the capacity of the main storage device 110, the XML data loader unit 230 stores the range labels (described in detail later) attached to the data guide tree 240 in one of the following three ways:
(1) If the main storage device 110 is sufficiently large, all range labels are stored in the main storage device 110.
(2) If the main storage device 110 is of medium capacity, some range labels are stored in the main storage device 110 and the rest in the secondary storage device 130.
(3) If the main storage device 110 is small, all range labels are stored in the secondary storage device 130.

The user can select the storage scheme. In case (1), all range labels are held in the main storage device 110, so the required memory is large; however, every structural join completes within the main storage device 110, so processing is very fast.

In case (2), the range labels of nodes that appear frequently in user queries are kept in the main storage device 110. This reduces the required main-memory capacity compared with (1) while still processing frequent queries efficiently.

In case (3), range labels must be fetched into the main storage device 110 whenever a structural join is executed; however, because the number of structural joins is reduced, processing remains efficient even when the main storage device 110 has little capacity.
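The text leaves the choice of scheme to the user. The sketch below shows one hypothetical way to automate it under assumed inputs: per-node label counts, per-node query frequencies (for example, from a query log), and a main-memory budget measured in labels. Filling case (2) greedily with the most frequently queried nodes is an assumption, not a rule stated in the publication.

```python
def plan_label_placement(label_counts, query_freq, budget_labels):
    """Choose storage case (1), (2) or (3) and the nodes whose labels stay in memory.

    label_counts:  summary node -> number of range labels in its queue
    query_freq:    summary node -> how often user queries mention it (hypothetical log)
    budget_labels: number of labels that fit in main memory
    """
    if sum(label_counts.values()) <= budget_labels:
        return 1, set(label_counts)  # case (1): every label fits in main memory
    in_memory, used = set(), 0
    # case (2): keep the most frequently queried nodes that still fit
    by_freq = sorted(label_counts, key=lambda n: query_freq.get(n, 0), reverse=True)
    for node in by_freq:
        if query_freq.get(node, 0) > 0 and used + label_counts[node] <= budget_labels:
            in_memory.add(node)
            used += label_counts[node]
    if not in_memory:
        return 3, set()  # case (3): all labels stay on secondary storage
    return 2, in_memory
```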

In this embodiment, the user 10 queries the database with XPath expressions through a programming API, an interactive interface program, or the like. The query analysis unit 150 parses the XPath expression entered by the user 10 into a query tree, which is decomposed into a structure part 300 and a value-processing part 310 as shown in FIG. 10 and, combined with the strong data guide tree, converted into intermediate summary trees. The query optimization unit 160 determines the execution order (plan) of the intermediate-tree materialization and of the filtering performed by the value-processing part, using information on the structure of the XML data stored in the secondary storage device 130, statistics on that data, and the like. The query execution unit 170 then executes the search according to the chosen execution order (plan).

[Query Processing]
FIG. 2(a) outlines the XPath preprocessing performed before XPath processing, and FIG. 2(b) outlines the XPath processing itself.

In the XPath preprocessing shown in FIG. 2(a), which precedes XPath processing, a strong data guide tree corresponding to the given XML data is generated (S100). The XPath processing described below then uses this strong data guide tree.

In the XPath processing shown in FIG. 2(b), an event sequence is first generated from the strong data guide tree (S110). Step S110 may instead be performed as part of the XPath preprocessing.

Next, XPath processing is performed on the event sequence generated in step S110 from the strong data guide tree (S120). Step S120 covers the processing up to obtaining the intermediate summary trees described later. In this embodiment, the strong data guide tree is converted into an event sequence in step S110, and that event sequence is processed with XPath in step S120 to obtain the intermediate summary trees, but the XPath processing method is not limited to this; any existing XPath processing method may be used instead. For example, a known technique such as an XPath algorithm that runs in polynomial time by evaluating queries bottom-up (see Georg Gottlob, Christof Koch, and Reinhard Pichler, "Efficient Algorithms for Processing XPath Queries", In Proc. VLDB (2002)) can be applied by treating the strong data guide tree as an XML tree. In that case, no event sequence needs to be created.

It is then determined whether to optimize the remaining XPath processing, that is, the processing not yet performed (S130). If it is not to be optimized (No in S130), the remaining XPath processing is performed in a predetermined order using the XML data 140 (see FIG. 1) in the secondary storage device 130; in other words, the XML data is searched by a fixed procedure, and the process ends.

If the remaining XPath processing is to be optimized (Yes in S130), its cost is estimated and the processing order is determined (S150); that is, the optimal processing order is chosen. The remaining XPath processing is then performed in that order (S160), and the process ends.

FIG. 3 shows an example of XML data. As shown in FIG. 3, XML data is structured with tags. For example, the following line in FIG. 3 is one tagged element:
<title>XML 1</title>
This title element is in turn contained within the Booklist element. XML data describes data through such tag-based structuring.

FIG. 4 shows an example of a SAX (Simple API for XML) event sequence. SAX is a technique that parses an XML document by scanning it from top to bottom in real time, using little memory and without building a tree. The event sequence in FIG. 4 was produced by applying SAX to the XML data of FIG. 3 and extracting its events.
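Python's standard `xml.sax` module can produce an event sequence of this kind. In this sketch, the tuple format and the choice to report each attribute as an "@name" start/text/end triple (mirroring the attribute nodes of the XML tree) are assumptions, and the class and function names are hypothetical. Character data is buffered because a SAX parser may deliver one text value in several chunks, and surrounding whitespace is stripped.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Collect ("start"|"end"|"text", value) events in document order."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []  # character data may arrive in several chunks

    def _flush_text(self):
        text = "".join(self._text).strip()
        if text:  # drop indentation-only whitespace between tags
            self.events.append(("text", text))
        self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(("start", name))
        for attr in attrs.getNames():  # assumption: attributes become "@name" nodes
            self.events += [("start", "@" + attr), ("text", attrs.getValue(attr)),
                            ("end", "@" + attr)]

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.events.append(("end", name))


def sax_events(xml_text):
    """Parse an XML string and return its event sequence."""
    handler = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```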

FIG. 5 shows an XML tree built from the XML data of FIG. 3. As shown in FIG. 5, the XML tree has one node per tag. Leaf nodes holding text values or attribute values hang below the lowest nodes corresponding to tags or attributes, which are shown boxed.

In the XML tree of FIG. 5, Booklist is the root node, three Book nodes appear below it, and a title node appears below each Book node. Thus, in an XML tree obtained by directly converting the original XML data, the same node pattern may appear many times. The strong data guide tree collapses such repeated patterns, so it can be used to search without redundant work.

In the XML tree, nodes whose names begin with "@" represent attributes, and the attribute value is shown below each such node. In this example, "@isbn" is an attribute node.

FIG. 6 shows four example XPath queries, (1) to (4). A query can specify its target using ancestor-descendant as well as parent-child relationships: "/" denotes a parent-child relationship between nodes, and "//" denotes an ancestor-descendant relationship.

Query (1) in FIG. 6 uses only parent-child relationships. It retrieves data in which a Booklist node is a child of the root node, a Book node is a child of Booklist, an authors node is a child of Book, and a person node is a child of authors.

Queries (2) to (4) use ancestor-descendant as well as parent-child relationships. For example, query (4) states that Book is a descendant of the root node. Queries (2) to (4) also specify values or attribute values.

FIG. 7 shows the query trees corresponding to queries (1) to (4) of FIG. 6. As these examples show, an XPath query can be represented as a tree.

FIG. 8 shows the strong data guide tree corresponding to the XML tree of FIG. 5. The strong data guide tree summarizes the tree structure of the XML tree in FIG. 5; that is, it extracts and condenses the relationships between nodes. How the strong data guide tree is generated is described later.

FIG. 9 shows an example of a strong data guide tree with range labels. As shown in FIG. 9, each node of the strong data guide tree has one queue, and labels consisting of a pair of numbers (a left value and a right value) are stored in that queue. How to construct a strong data guide tree with range labels is described later.

FIG. 10 illustrates the structure part and the value processing part of a query tree. The query tree in FIG. 10 is the same as the one in FIG. 7, but, as FIG. 10 shows, it can be divided into a structure part 300 and a value processing part 310. A query tree with no values or attribute values consists only of the structure part 300 and has no value processing part 310. This division into the structure part 300 and the value processing part 310 is used in the process of obtaining intermediate summary trees, described later.
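
As an illustration of FIGS. 7 and 10, the Python sketch below turns a small XPath subset into a query tree and then splits it into a structure part (the tree with all literals removed) and a value part (the removed conditions, each with the path to the node it constrains). The grammar (steps of the form `/name` or `//name`, each with at most one `[name='literal']` predicate) and the names `QNode`, `parse`, and `split` are hypothetical choices for this sketch, not the patent's representation.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# One location step: axis, name, optional [name='literal'] predicate (hypothetical subset).
STEP = re.compile(r"(//|/)([@\w]+)(?:\[([@\w]+)='([^']*)'\])?")


@dataclass
class QNode:
    axis: str                     # "/" = parent-child, "//" = ancestor-descendant
    name: str
    children: List["QNode"] = field(default_factory=list)
    value: Optional[str] = None   # literal to compare against; None = purely structural


def parse(xpath: str) -> QNode:
    """Turn a simple XPath string into a query tree under a virtual root."""
    root = QNode("", "")
    cur, pos = root, 0
    while pos < len(xpath):
        m = STEP.match(xpath, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at offset {pos}: {xpath[pos:]!r}")
        axis, name, pred_name, pred_value = m.groups()
        node = QNode(axis, name)
        if pred_name is not None:
            # A predicate becomes a side branch whose leaf carries the value condition.
            node.children.append(QNode("/", pred_name, value=pred_value))
        cur.children.append(node)
        cur, pos = node, m.end()
    return root


Path = Tuple[Tuple[str, str], ...]


def split(node: QNode, path: Path = ()) -> Tuple[QNode, List[Tuple[Path, str]]]:
    """Return (structure part, value part) for the subtree rooted at node."""
    here = path + ((node.axis, node.name),) if node.name else path
    structure = QNode(node.axis, node.name)
    values = [(here, node.value)] if node.value is not None else []
    for child in node.children:
        sub, vals = split(child, here)
        structure.children.append(sub)
        values.extend(vals)
    return structure, values


structure, values = split(parse("/Booklist/Book[title='XML']/authors/person"))
print(values)  # [((('/', 'Booklist'), ('/', 'Book'), ('/', 'title')), 'XML')]
```

A query without literals, such as `/Booklist/Book`, yields an empty value part, matching the case in which only the structure part 300 exists.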

FIG. 11 illustrates the process of generating a strong data guide tree without range labels. This process takes SAX events as input and outputs a strong data guide tree without labels.

First, the first (or next) SAX event is input (S200). It is then checked whether this SAX event is a start event (S210). A start event is an event of the form "start()", such as "start(Booklist)" in the example of FIG. 4.

If it is not a start event (No in S210), it is checked whether the input SAX event is an end event (S220). An end event is an event of the form "end()", such as "end(Booklist)" in the example of FIG. 4. If it is an end event (Yes in S220), the current node, that is, the node being processed at that point in the strong data guide tree, is moved to its parent node (the node one level closer to the root) (S230), and the process proceeds to step S290, described below. If it is not an end event (No in S220), the process proceeds directly to step S290.

If the SAX event is a start event in step S210 (Yes in S210), it is checked whether the tag name of the input SAX event already exists in the strong data guide tree as a child of the current node (S240). If it does not (No in S240), a child node with that tag name is added to the strong data guide tree (S250), and the process proceeds to step S260. If it does (Yes in S240), step S250 is skipped and the process proceeds directly to step S260.

In step S260, the current node is moved to that node, that is, to the existing node with the same tag name or to the newly added child node. It is then checked whether the input SAX event has attributes (S270). If it has none (No in S270), the process proceeds directly to step S290; if it has attributes (Yes in S270), child nodes corresponding to them are added (S280), and the process proceeds to step S290. The process of adding child nodes for attributes is detailed later.

In step S290, it is checked whether all SAX events have been input. If any SAX events remain (No in S290), the process repeats from step S200. If all SAX events have been input (Yes in S290), the strong data guide tree built so far is output (S300), and the process ends.

FIG. 12 illustrates the process of adding child nodes corresponding to attributes to a strong data guide tree without range labels; it shows step S280 of FIG. 11 in detail.

First, an attribute name A to be processed is input (S400). It is then checked whether the current node has a child node named "@A" for attribute name A (S410). For example, if attribute name A is "isbn", it is checked whether there is a child node named "@isbn". If there is none (No in S410), a child node named "@A" is added to the current node (S420), and the process proceeds to step S430. If there is one (Yes in S410), the process proceeds directly to step S430.

In step S430, it is checked whether all attributes have been processed. If so (Yes in S430), the process ends; if unprocessed attributes remain (No in S430), the process returns to step S400 and repeats.
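
The flow of FIGS. 11 and 12 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: SAX events are modeled as hypothetical tuples (`("start", tag, [attribute names])`, `("end", tag)`, `("text", value)`) instead of real SAX callbacks, and the names `DGNode` and `build_dataguide` are illustrative only. The step numbers in the comments map each line back to the flowcharts.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class DGNode:
    """One node of an unlabeled strong data guide tree."""
    name: str
    parent: Optional["DGNode"] = field(default=None, repr=False)
    children: Dict[str, "DGNode"] = field(default_factory=dict)

    def child(self, name: str) -> "DGNode":
        """Return the child called name, adding it first if it is missing
        (S240/S250 for tag names, S410/S420 for "@attr" names)."""
        if name not in self.children:
            self.children[name] = DGNode(name, self)
        return self.children[name]


def build_dataguide(events) -> DGNode:
    """Build the tree of FIG. 11 from hypothetical (kind, ...) event tuples."""
    root = current = DGNode("/")
    for event in events:                   # S200: next SAX event
        kind = event[0]
        if kind == "start":                # S210: Yes
            _, tag, attrs = event
            current = current.child(tag)   # S240-S260
            for attr in attrs:             # S270/S280 -> FIG. 12
                current.child("@" + attr)  # S400-S430
        elif kind == "end":                # S220: Yes
            current = current.parent       # S230
        # Text events carry no structure and are ignored here.
    return root                            # S300


# Hypothetical event stream: two Book elements with the same structure.
events = [
    ("start", "Booklist", []),
    ("start", "Book", ["isbn"]), ("start", "title", []), ("text", "XML"),
    ("end", "title"), ("end", "Book"),
    ("start", "Book", ["isbn"]), ("start", "title", []), ("text", "DB"),
    ("end", "title"), ("end", "Book"),
    ("end", "Booklist"),
]
guide = build_dataguide(events)
book = guide.children["Booklist"].children["Book"]
print(sorted(book.children))  # ['@isbn', 'title']
```

Both Book elements collapse into a single data guide node, which is what makes the tree a summary of the structure rather than a copy of the data.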

FIG. 13 illustrates the process of generating a strong data guide tree with range labels.
As shown in FIG. 9, a range label is a pair of two values, a left value and a right value. First, the counters used to manage these labels, the left value left and the right value right, are initialized (S500); for example, both may be set to 1. A root node is then created and set as the current node (S510). This corresponds to initializing the strong data guide tree.

Next, the first (or next) SAX event is input (S520), and the strong data guide tree with range labels is extended according to that SAX event (S530). Step S530 is detailed later.

It is then checked whether all SAX events have been input (S540). If any remain (No in S540), the process repeats from step S520. If all SAX events have been input (Yes in S540), the strong data guide tree with range labels built so far is output (S550), and the process ends.

FIG. 14 illustrates the process of extending the strong data guide tree with range labels in response to one SAX event. This process corresponds to step S530 of FIG. 13 and acts as a subroutine that inherits the inputs and variables of FIG. 13.

First, it is checked whether the input SAX event is a start event (S600). If it is (Yes in S600), the value of left described for FIG. 13 is saved in a variable l, and left is incremented (S610). The increment may be any predetermined value, for example 1 or 10.

Next, it is checked whether the current node has a child node named after the start tag name TAG of the SAX event (S620). For example, if the start tag name is "title", it is checked whether there is a child node named "title". If there is no such child node (No in S620), a child node named TAG is added to the current node (S630); for the start tag "title", a child node named "title" is added. The process then proceeds to step S640. If a child node with that name already exists (Yes in S620), the process proceeds directly to step S640.

In step S640, the current node is moved to that child node, that is, to the newly added child node or to the existing child node with the same name. It is then checked whether the SAX event has attributes (S650). If it does (Yes in S650), attribute child nodes are added and range labels are assigned to them (S660), and the process proceeds to step S670; step S660 is detailed later. If it has no attributes (No in S650), the process proceeds directly to step S670.

In step S670, a label (l, NULL), built from the variable l saved in step S610, is appended to the end of the current node's queue, and the process ends. NULL is a special value indicating that no valid label value has been set yet.

If the SAX event is not a start event in step S600 (No in S600), it is checked whether it is an end event (S680). If it is not (No in S680), it is checked whether it is a text node event (S690). A text node event is an event of the form "text()", such as "text("XML1")" in the event sequence of FIG. 4. If it is not a text node event (No in S690), the process ends. If it is (Yes in S690), a label whose left value is left and whose right value is right is created for the text node and placed in the current node's queue (S700). Then left and right are both incremented (S710), and the process ends.

If the SAX event is an end event in step S680 (Yes in S680), the last label in the current node's queue whose right value is NULL is completed by setting its right value to right, and the label is kept in the queue (S720). Then right is incremented (S730), the current node is moved to its parent node (S740), and the process ends.

FIG. 15 illustrates the process of adding attribute child nodes and assigning range labels. This process corresponds to step S660 of FIG. 14 and acts as a subroutine that inherits the inputs and variables of its caller.

First, the current values of left and right are taken over (S800), and the process starts. An attribute name A is input (S810), and it is checked whether the current node has a child node named "@A" for attribute name A (S820). If there is no such child node (No in S820), a child node named "@A" is added to the current node (S830), and the process proceeds to step S840. If a child node with that name already exists (Yes in S820), the process proceeds directly to step S840.

In step S840, a label (left, right) is placed in the queue of the child node named "@A" (S840). Then left and right are both incremented (S850). It is then checked whether all attributes have been processed (S860). If so (Yes in S860), the process ends; if unprocessed attributes remain (No in S860), the process returns to step S810 and repeats.
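
FIGS. 13 to 15 can be combined into one Python sketch. As before, events are hypothetical tuples and the names `LNode`, `build_labeled_dataguide`, and `is_ancestor` are illustrative. Tracing the steps shows that left advances on every start, attribute, and text event, while right advances on every end, attribute, and text event, so left behaves like a pre-order rank and right like a post-order rank. That is why a label a encloses a label d (a.left < d.left and a.right > d.right) exactly when a is an ancestor of d; `is_ancestor` states this test as our reading of the scheme, not as a step named in the flowcharts. Following S700, text labels share the queue of their element's node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

STEP = 1  # increment for left/right; any fixed value (e.g. 1 or 10) is allowed


@dataclass(eq=False)
class LNode:
    """Strong data guide node whose queue holds [left, right] range labels."""
    name: str
    parent: Optional["LNode"] = field(default=None, repr=False)
    children: Dict[str, "LNode"] = field(default_factory=dict)
    queue: List[list] = field(default_factory=list)  # right is None while open

    def child(self, name: str) -> "LNode":
        if name not in self.children:
            self.children[name] = LNode(name, self)
        return self.children[name]


def build_labeled_dataguide(events) -> LNode:
    left = right = 1                                  # S500
    root = current = LNode("/")                       # S510
    for event in events:                              # S520/S540 loop
        kind = event[0]
        if kind == "start":                           # S600: Yes
            _, tag, attrs = event
            saved_left = left                         # S610: l = left
            left += STEP
            current = current.child(tag)              # S620-S640
            for attr in attrs:                        # S650/S660 -> FIG. 15
                current.child("@" + attr).queue.append([left, right])  # S840
                left += STEP                          # S850
                right += STEP
            current.queue.append([saved_left, None])  # S670: (l, NULL)
        elif kind == "end":                           # S680: Yes
            # S720: complete the last label whose right value is still NULL
            label = next(lab for lab in reversed(current.queue) if lab[1] is None)
            label[1] = right
            right += STEP                             # S730
            current = current.parent                  # S740
        elif kind == "text":                          # S690: Yes
            current.queue.append([left, right])       # S700
            left += STEP                              # S710
            right += STEP
    return root                                       # S550


def is_ancestor(a, d) -> bool:
    """True if label a encloses label d."""
    return a[0] < d[0] and a[1] > d[1]


events = [
    ("start", "Booklist", []),
    ("start", "Book", ["isbn"]), ("start", "title", []), ("text", "XML"),
    ("end", "title"), ("end", "Book"),
    ("start", "Book", ["isbn"]), ("start", "title", []), ("text", "DB"),
    ("end", "title"), ("end", "Book"),
    ("end", "Booklist"),
]
guide = build_labeled_dataguide(events)
book = guide.children["Booklist"].children["Book"]
print(book.queue)                    # [[2, 4], [6, 8]]
print(book.children["@isbn"].queue)  # [[3, 1], [7, 5]]
print(book.children["title"].queue)  # [[4, 3], [5, 2], [8, 7], [9, 6]]
```

In the title queue, [4, 3] and [8, 7] belong to the two title elements and [5, 2] and [9, 6] to their text nodes. The second isbn attribute [7, 5] is enclosed by the second Book [6, 8] but not by the first Book [2, 4].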

FIG. 16 illustrates the process of constructing an event sequence from a strong data guide tree, and FIG. 17 shows an example of how an iterator traverses the strong data guide tree. The process is described below with reference to FIG. 16 and, where appropriate, FIG. 17.

First, a strong data guide tree is input and its root node is made the current node (S900). It is then checked whether the iterator of the current node moves downward (S910). Moving downward means moving away from the root node. The iterator here is a processing module for handling tree-structured data such as the strong data guide tree. It starts at the root node and, as shown in the example of FIG. 17, traverses the whole tree by moving down and up through its structure.

If the iterator moves downward (Yes in S910), "start(N)" is output, where N is the node name of the current node (S940), and the process proceeds to step S970. For example, if node name N is "authors", "start(authors)" is output.

If the iterator does not move downward (No in S910), it is checked whether the iterator of the current node moves upward (S920). If it does (Yes in S920), "end(N)" is output using the node name N of the current node (S950), and the process proceeds to step S970.

If the iterator does not move upward either (No in S920), it is checked whether the iterator of the current node recognizes an attribute (S930). If it does (Yes in S930), "attr@(N)" is output using the node name N of the current node (S960), and the process proceeds to step S970. Otherwise (No in S930), the process proceeds directly to step S970.

In step S970, the iterator of the current node is advanced to the next event (S970), and it is checked whether all nodes have been processed and the iterator has returned to the root node (S980). The iterator returning NULL indicates that all nodes have been processed and it has returned to the root node. If processing is complete (Yes in S980), the process ends; otherwise (No in S980), the process returns to step S910 and repeats.
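
The iterator of FIGS. 16 and 17 amounts to a depth-first traversal that emits a start event on the way down, an end event on the way up, and an attribute event for attribute nodes. The Python sketch below assumes, for illustration only, that the strong data guide tree is stored as nested dicts with attribute nodes named "@x"; a recursive generator stands in for the explicit iterator module, and the function name `dataguide_events` is hypothetical.

```python
from typing import Dict, Iterator, Tuple

Tree = Dict[str, "Tree"]  # strong data guide as nested dicts; "@x" = attribute node


def dataguide_events(name: str, children: Tree) -> Iterator[Tuple[str, str]]:
    """Depth-first traversal playing the role of the FIG. 16/17 iterator."""
    if name.startswith("@"):
        yield ("attr", name)              # S930/S960: attribute recognized
        return
    yield ("start", name)                 # S910/S940: moving down
    for child, grandchildren in children.items():
        yield from dataguide_events(child, grandchildren)
    yield ("end", name)                   # S920/S950: moving up


guide = {"Booklist": {"Book": {"@isbn": {}, "title": {},
                               "authors": {"person": {}}}}}
events = [e for top, kids in guide.items() for e in dataguide_events(top, kids)]
print(events[:3])  # [('start', 'Booklist'), ('start', 'Book'), ('attr', '@isbn')]
```

Because the output has the same shape as a SAX event stream, the data guide can be fed to the same XPath machinery as ordinary XML data.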

FIG. 18 illustrates the process of querying the strong data guide tree to obtain intermediate summary trees.
First, a strong data guide tree and an XPath query are input (S1000). The structure part of the XPath query is then extracted (S1010); this is the same kind of structure part as the structure part 300 shown in FIG. 10.

Next, the strong data guide tree is treated as XML data, XPath processing for the structure part is performed on it, and a sequence of internal match trees is obtained (S1020). From the internal match trees, a sequence of intermediate summary trees, each consisting of branching nodes and edge nodes, is extracted and output (S1030), and the process ends.
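
For the special case of a linear structure part (no branching predicates), the matching of FIG. 18 can be sketched as binding each query step to data guide nodes. In the Python sketch below, each result tuple plays the role of one intermediate summary tree, here reduced to the data guide nodes bound to the query steps. The nested-dict data guide (including the editor branch) and the names `descendants` and `match` are hypothetical. An empty result means that no instance can exist in the data, so no labels need to be read at all.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Tree = Dict[str, "Tree"]  # strong data guide as nested dicts (assumption)
Path = Tuple[str, ...]    # a data guide node, identified by its label path
Step = Tuple[str, str]    # (axis, name); axis is "/" or "//"


def descendants(path: Path, children: Tree) -> Iterator[Tuple[Path, Tree]]:
    """Yield every node strictly below path, in document order."""
    for name, kids in children.items():
        yield path + (name,), kids
        yield from descendants(path + (name,), kids)


def match(steps: Sequence[Step], path: Path, children: Tree) -> List[Tuple[Path, ...]]:
    """Evaluate a linear structure part against the data guide.
    Each result binds every query step to one data guide node."""
    if not steps:
        return [()]
    (axis, name), rest = steps[0], steps[1:]
    if axis == "/":
        candidates = [(path + (n,), kids) for n, kids in children.items()]
    else:  # "//": any descendant
        candidates = list(descendants(path, children))
    return [
        (cpath,) + tail
        for cpath, kids in candidates
        if cpath[-1] == name
        for tail in match(rest, cpath, kids)
    ]


guide = {"Booklist": {"Book": {"title": {}, "authors": {"person": {}},
                               "editor": {"person": {}}}}}
for binding in match([("//", "Book"), ("//", "person")], (), guide):
    print(binding)
# (('Booklist', 'Book'), ('Booklist', 'Book', 'authors', 'person'))
# (('Booklist', 'Book'), ('Booklist', 'Book', 'editor', 'person'))
print(match([("/", "Book")], (), guide))  # [] : no Book directly under the root
```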

FIG. 19 illustrates the process of generating actual intermediate trees from intermediate summary trees.
First, an intermediate summary tree is input (S1100), a structural join is performed between all of its nodes, and an actual intermediate tree is generated (S1110).

It is then checked whether all intermediate summary trees have been processed (S1120). If not (No in S1120), the process returns to step S1100 and repeats. If all have been processed (Yes in S1120), the sequence of actual intermediate trees obtained so far is output (S1130), and the process ends.
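
The structural join of FIG. 19 can be illustrated with the range labels computed earlier. This Python sketch is a naive nested-loop join along one intermediate summary tree that happens to be a path; the names `contains` and `materialize` are hypothetical, and a practical implementation would more likely use a sort-merge style structural join over label lists kept in left order.

```python
from typing import List, Sequence, Tuple

Label = Tuple[int, int]  # (left, right) range label


def contains(a: Label, d: Label) -> bool:
    """a is an ancestor of d: it starts earlier and ends later."""
    return a[0] < d[0] and a[1] > d[1]


def materialize(label_lists: Sequence[List[Label]]) -> List[Tuple[Label, ...]]:
    """Nested-loop structural join along one intermediate summary path.
    label_lists[i] holds the labels of the data guide node bound to query
    step i; each output tuple is one actual intermediate tree."""
    partial = [(label,) for label in label_lists[0]]
    for labels in label_lists[1:]:
        partial = [t + (d,) for t in partial for d in labels if contains(t[-1], d)]
    return partial


booklist = [(1, 9)]
book = [(2, 4), (6, 8)]
title = [(4, 3), (8, 7)]  # element labels only
print(materialize([booklist, book, title]))
# [((1, 9), (2, 4), (4, 3)), ((1, 9), (6, 8), (8, 7))]
```

Because each label list comes from a data guide node whose path already satisfies the query's axes, the containment test is enough here even for a "/" step: an enclosed instance of the child path can only have an instance of the parent path as its direct parent.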

FIG. 20 illustrates the process of applying filter conditions to a sequence of actual intermediate trees to obtain the XPath query result.
First, a sequence of actual intermediate trees is input, and a queue QO is prepared to hold the output nodes produced during the process (S1200). It is then checked whether the sequence of actual intermediate trees has a value processing part (S1210): the sequence is considered to have one if at least one actual intermediate tree in it has a value processing part, and not to have one otherwise.

If there is no value processing part (No in S1210), the process proceeds directly to step S1250. If there is one (Yes in S1210), the filter conditions of the query tree's value processing part are retrieved from the secondary storage device 130 and applied to the leaf nodes (the outermost nodes) of the actual intermediate tree (S1220). The resulting output nodes are added to queue QO (S1230). It is then checked whether all actual intermediate trees have been processed (S1240). If not (No in S1240), the process repeats from step S1220; if so, it proceeds to step S1250.

In step S1250, duplicate labels are removed from queue QO, to which nodes have been added so far (S1250). Because the output nodes obtained by this process may contain the same label more than once, only one copy of each label is kept and all other duplicates are deleted. Finally, the search results are retrieved from the secondary storage device 130 using the labels in QO and output (S1260), and the process ends.
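
Steps S1200 to S1250 can be sketched in Python as follows. The dictionary `leaf_value` is a hypothetical stand-in for reading leaf values from the secondary storage device 130, and `output_index` (which position of each tree is the query's output node) is likewise an assumption of this sketch. When no value condition exists, the sketch takes every tree's output node, which is our interpretation of the direct jump from S1210 to S1250. Duplicates are removed while keeping the first occurrence of each label.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Label = Tuple[int, int]


def apply_filters(trees: Sequence[Tuple[Label, ...]],
                  leaf_value: Dict[Label, str],
                  condition: Optional[Callable[[str], bool]],
                  output_index: int) -> List[Label]:
    """Filter actual intermediate trees on their leaf value, collect each
    surviving tree's output node in QO, then drop duplicate labels."""
    qo: List[Label] = []                                        # S1200
    for tree in trees:                                          # S1220-S1240
        if condition is None or condition(leaf_value[tree[-1]]):
            qo.append(tree[output_index])                       # S1230
    return list(dict.fromkeys(qo))                              # S1250


trees = [((1, 9), (2, 4), (4, 3)), ((1, 9), (6, 8), (8, 7))]
leaf_value = {(4, 3): "XML", (8, 7): "DB"}  # hypothetical stored values
print(apply_filters(trees, leaf_value, lambda v: v == "XML", 1))  # [(2, 4)]
print(apply_filters(trees, leaf_value, None, 0))                  # [(1, 9)]
```

The second call shows why S1250 is needed: both trees share the same Booklist node, so its label would otherwise appear twice in QO.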

FIG. 21 illustrates the process of generating a group of execution plans and finding the one with minimum cost. This process determines an execution plan that can replace the predetermined execution plan described with reference to FIG. 19 (materialization plan) and FIG. 20 (application of filter conditions). Various combinations of materialization plans and filter conditions are examined, and the one with minimum cost is selected as the execution plan.

First, a pair consisting of a materialization plan and filter conditions is generated (S1300). This pair is treated as one execution plan, and its cost is computed (S1310). The cost may be, for example, the number of accesses to the secondary storage device 130, but any index that reflects the search cost can be used.

It is then checked whether all pairs of materialization plans and filter conditions (execution plans) have been input (S1320). If any remain (No in S1320), the process returns to step S1300 and repeats. Once all pairs have been input (Yes in S1320), the execution plan with the smallest computed cost is output (S1330), and the process ends.
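
The enumeration of FIG. 21 is an exhaustive search over all (materialization plan, filter condition) pairs. In the Python sketch below the plan identifiers "M1", "M2", "F1", "F2" and the access counts are made up for illustration; the cost function is passed in, since the text allows any index that reflects the search cost.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Plan:
    materialization: str   # hypothetical identifier of a materialization plan
    filter_condition: str  # hypothetical identifier of a filter-condition choice


def cheapest_plan(materializations: Iterable[str], filters: Iterable[str],
                  cost: Callable[[Plan], float]) -> Tuple[Plan, float]:
    """S1300-S1330: cost every (materialization, filter) pair, keep the minimum."""
    plans = [Plan(m, f) for m, f in product(materializations, filters)]
    costs = {plan: cost(plan) for plan in plans}   # S1310
    best = min(plans, key=costs.__getitem__)       # S1330
    return best, costs[best]


# Made-up estimates of secondary-storage accesses, for illustration only.
ACCESSES = {("M1", "F1"): 120, ("M1", "F2"): 90, ("M2", "F1"): 75, ("M2", "F2"): 60}


def access_cost(plan: Plan) -> float:
    return ACCESSES[(plan.materialization, plan.filter_condition)]


best, best_cost = cheapest_plan(["M1", "M2"], ["F1", "F2"], access_cost)
print(best, best_cost)  # Plan(materialization='M2', filter_condition='F2') 60
```

Exhaustive enumeration is simple and exact, but the number of plans grows with the product of the candidate sets, so it suits queries with few alternatives.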

In an actual search, this process is followed by processing corresponding to the processes described in FIGS. 19 and 20 (the processing that actually obtains the search results), carried out according to the chosen execution plan. The processes described in FIGS. 19 and 20 are themselves one example of such processing, provided they have the minimum cost. A description of the specific processing for the minimum-cost execution plan is omitted.

This concludes the description of the first embodiment. According to this embodiment, the structure of the XML data is aggregated in advance and held as summary information (the strong data guide tree), so that searches answering a query can be performed efficiently over the XML tree structure without wasted work. Moreover, efficient query processing is possible by choosing a data layout suited to the configuration of the XML data processing apparatus 100, in particular to the capacity of its main storage device 110.

<<Second Embodiment>>
FIG. 22 illustrates the configuration of the XML data processing apparatus according to the second embodiment.
In the first embodiment, the XML data 140 is recorded in the secondary storage device 130 and managed by the disk manager unit 190. To manage the XML data 140 in the secondary storage device 130 more efficiently, an embodiment is also possible in which a relational database management system is introduced and the XML data 140 is recorded and managed as a relational database. Apart from the parts relating to the relational database management system, the configuration of the second embodiment is the same as that of the first embodiment, so only the components that differ are described here.

In the second embodiment, the XML data 140A in the secondary storage device 130 is recorded as a relational database (RDB). In place of the disk manager unit 190, a programming API 250 of the relational database management system is introduced to manage access to the XML data 140A recorded as a relational database.
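
The patent does not specify the relational schema, so the following Python sketch is only one plausible layout, using the standard-library sqlite3 module as a stand-in for the relational database management system and its programming API 250. Each labeled node instance becomes a row keyed by its data guide path, and a structural join plus a value filter become one SQL self-join. The table and column names are hypothetical; lft and rgt are used because LEFT and RIGHT are SQL keywords.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE node (path TEXT, lft INTEGER, rgt INTEGER, value TEXT)")
con.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?)",
    [
        ("/Booklist", 1, 9, None),
        ("/Booklist/Book", 2, 4, None),
        ("/Booklist/Book", 6, 8, None),
        ("/Booklist/Book/title", 4, 3, "XML"),
        ("/Booklist/Book/title", 8, 7, "DB"),
    ],
)
con.execute("CREATE INDEX node_path ON node (path, lft)")

# Structural join (label containment) plus value filter: Books titled 'XML'.
rows = con.execute(
    """
    SELECT b.lft, b.rgt
    FROM node AS b JOIN node AS t
      ON t.lft > b.lft AND t.rgt < b.rgt
    WHERE b.path = '/Booklist/Book'
      AND t.path = '/Booklist/Book/title'
      AND t.value = 'XML'
    """
).fetchall()
print(rows)  # [(2, 4)]
```

Indexing on (path, lft) lets the database restrict each side of the join to one data guide node before comparing labels, which mirrors how the data guide narrows the search in the first embodiment.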

Query processing in the second embodiment is the same as in the first embodiment, so its description is omitted; as noted above, only the method of accessing the XML data 140A in the secondary storage device 130 differs.

According to the second embodiment, as in the first, the structure of the XML data is aggregated in advance and held as summary information (the strong data guide tree), so searches answering a query can be performed efficiently over the XML tree structure without wasted work. In addition, access to the XML data recorded in the secondary storage device becomes more efficient, enabling even more efficient query processing.

Embodiments of the present invention have been described above, but they can be modified without departing from the gist of the invention. For example, instead of recording the XML data in the secondary storage device, the main storage device may be enlarged so that all of the XML data is recorded there, or the secondary storage device may be replaced with a semiconductor-based disk device. The XML data processing apparatus of this embodiment is implemented as a program executed by arithmetic means, and it becomes operational when the predetermined program is loaded into a computer with the required functions. The apparatus may also be realized by recording the program on a storage medium and having a computer read it.

Brief Description of Drawings
FIG. 1 is a diagram illustrating the configuration of the XML data processing apparatus according to the first embodiment.
FIG. 2(a) is a diagram illustrating an overview of the XPath pre-processing performed before XPath processing, and FIG. 2(b) is a diagram illustrating an overview of the XPath processing.
FIG. 3 is a diagram illustrating an example of XML data.
FIG. 4 is a diagram showing an example of a SAX event sequence.
FIG. 5 is a diagram showing an example of an XML tree constructed for the XML data of FIG. 3.
FIG. 6 is a diagram showing examples of XPath queries.
FIG. 7 is a diagram showing examples of query trees corresponding to the queries of FIG. 6.
FIG. 8 is a diagram showing an example of a strong data guide tree corresponding to the XML tree shown in FIG. 5.
FIG. 9 is a diagram showing an example of a strong data guide tree with range labels.
FIG. 10 is a diagram illustrating the structure part and the value processing part of a query tree.
FIG. 11 is a diagram illustrating the process of generating a strong data guide tree without range labels.
FIG. 12 is a diagram illustrating the process of adding child nodes corresponding to attributes to a strong data guide tree without range labels.
FIG. 13 is a diagram illustrating the process of generating a strong data guide tree with range labels.
FIG. 14 is a diagram illustrating the process of constructing a strong data guide tree with range labels in response to one SAX event.
FIG. 15 is a diagram illustrating the process of adding attribute child nodes and assigning range labels.
FIG. 16 is a diagram illustrating the process of constructing an event sequence from a strong data guide tree.
FIG. 17 is a diagram illustrating an example of an iterator traversing a strong data guide tree.
FIG. 18 is a diagram illustrating the process of querying a strong data guide tree to obtain intermediate summary trees.
FIG. 19 is a diagram illustrating the process of generating actual intermediate trees from intermediate summary trees.
FIG. 20 is a diagram illustrating the process of applying filter conditions to a sequence of actual intermediate trees to obtain an XPath query result.
FIG. 21 is a diagram illustrating the process of generating a group of execution plans and finding the minimum cost.
FIG. 22 is a diagram illustrating the configuration of the XML data processing apparatus according to the second embodiment.

Explanation of symbols

10 User
20 XML data (before recording)
100 XML data processing apparatus
110 Main storage device
120 CPU
130 Secondary storage device
140, 140A XML data (after recording)
150 Query analysis unit
160 Query optimization unit
170 Query execution unit
180 Structure join operation unit
190 Disk manager unit
200 Result generation unit
210 Event sequence generation unit
220 Data guide tree loader unit
230 XML data loader unit
240 Data guide tree (strong data guide tree)
250 Programming API

Claims

1. An XML data processing apparatus realized by a computer having at least an arithmetic means and a storage means, the XML data processing apparatus comprising:
means for analyzing XML data; and
means for storing the XML data;
wherein, when XML data to be searched is registered, the means for analyzing the XML data constructs summary information including information on the structure and statistics of the XML data, and attaches to the summary information a label from which an ancestor-descendant relationship between nodes of the XML data can be determined; and
the means for storing the XML data stores the XML data, the summary information, and the label.

2. The XML data processing apparatus according to claim 1, wherein the arithmetic means receives a SAX event sequence of the XML data as input and determines whether node information corresponding to each SAX event is included in the summary information,
and, when the node information is not included in the summary information, the means for analyzing the XML data adds the node information to the summary information, so that the summary information is constructed in a predetermined number of scans.

3. The XML data processing apparatus according to claim 1 or 2, wherein the label from which the ancestor-descendant relationship can be determined is a range label that identifies the nodes to be searched in a tree structure.

4. An XML data processing apparatus realized by a computer having at least an arithmetic means and a storage means, the XML data processing apparatus comprising:
means for analyzing a query on XML data; and
means for executing the query;
wherein the query and summary information including information on the structure and statistics of the XML data are received as input;
the means for analyzing the query converts the query into a tree structure and deletes the attribute-value or node-value portion to generate the structure part of a query tree; and
the means for executing the query
collates an intermediate query tree, which is a part of the structure part of the query tree, with the summary information to generate a set of intermediate summary trees including a part of the summary information,
converts the set of intermediate summary trees into a set of entity intermediate trees containing the data obtained by materializing the intermediate summary trees against the XML data, and
filters the XML data using the value portion corresponding to the set of entity intermediate trees,
thereby obtaining a query result.

5. The XML data processing apparatus according to claim 4, wherein the query language used to describe the query is XPath.

6. The XML data processing apparatus according to claim 4, wherein the means for analyzing the query generates an event sequence for the summary information.

7. The XML data processing apparatus according to claim 4, further comprising means for performing a structure join operation that combines a plurality of search results,
wherein the means for performing the structure join operation converts the intermediate summary trees into the set of entity intermediate trees by using the structure join operation.

8. The XML data processing apparatus according to claim 4, further comprising means for optimizing the query,
wherein the means for optimizing the query determines an optimal execution plan by calculating the cost of execution plans that each combine a materialization method, namely a method of converting the intermediate summary trees into the set of entity intermediate trees containing the materialized data, with a method of filtering using the value portion corresponding to the set of entity intermediate trees.

9. The XML data processing apparatus according to claim 4, further comprising means for managing a database,
wherein the means for managing the database manages the XML data.

10. The XML data processing apparatus according to claim 4, further comprising:
means for analyzing the XML data; and
means for storing the XML data;
wherein, when XML data to be searched is registered, the means for analyzing the XML data constructs summary information including information on the structure and statistics of the XML data, and attaches to the summary information a label from which an ancestor-descendant relationship between nodes of the XML data can be determined; and
the means for storing the XML data stores the XML data, the summary information, and the label.

11. The XML data processing apparatus according to claim 10, wherein the arithmetic means receives a SAX event sequence of the XML data as input and determines whether node information corresponding to each SAX event is included in the summary information,
and, when the node information is not included in the summary information, the means for analyzing the XML data adds the node information to the summary information, so that the summary information is constructed in a predetermined number of scans.

12. The XML data processing apparatus according to claim 10, wherein the label from which the ancestor-descendant relationship can be determined is a range label that specifies the nodes to be searched in a tree structure.

13. An XML data processing method executed by a computer having at least an arithmetic means and a storage means,
the method realizing the XML data processing apparatus according to any one of claims 1 to 12.

14. An XML data processing program executed by a computer having at least an arithmetic means and a storage means,
the program realizing the XML data processing apparatus according to any one of claims 1 to 12.

15. A storage medium on which is recorded an XML data processing program that is executed by a computer having at least an arithmetic means and a storage means and that realizes the XML data processing apparatus according to any one of claims 1 to 12.
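The query-processing claim separates a query into a structure part, checked against the summary only, and a value part, applied after materialization. The Python sketch below illustrates that three-stage flow under simplifying assumptions: the summary is a dict from each distinct label path to its extent (the elements on that path), intermediate summary trees are reduced to matching label paths, materialization simply reads the stored extents, and a query is a hypothetical list of `("/" or "//", label)` steps plus an optional `(child, value)` predicate rather than parsed XPath. `build_summary`, `matches`, and `evaluate` are illustrative names.

```python
import xml.etree.ElementTree as ET


def build_summary(root):
    """Map each distinct label path to its extent (the elements on that path)."""
    summary = {}

    def walk(elem, path):
        path = path + (elem.tag,)
        summary.setdefault(path, []).append(elem)
        for child in elem:
            walk(child, path)

    walk(root, ())
    return summary


def matches(steps, path):
    """True if a label path satisfies steps such as [("/", "a"), ("//", "c")]."""
    if not steps:
        return not path          # the pattern must consume the whole path
    (axis, label), rest = steps[0], steps[1:]
    if axis == "/":              # child step: the next label must match
        return bool(path) and path[0] == label and matches(rest, path[1:])
    # "//" step: the label may occur at any depth below the current point
    return any(path[i] == label and matches(rest, path[i + 1:])
               for i in range(len(path)))


def evaluate(summary, steps, predicate=None):
    # 1. Structure part: match against the summary only, not the data.
    hit_paths = [p for p in summary if matches(steps, p)]
    # 2. Materialize: expand the matching summary paths into data nodes.
    candidates = [e for p in hit_paths for e in summary[p]]
    # 3. Value part: filter the materialized nodes by the predicate.
    if predicate is not None:
        child, value = predicate
        candidates = [e for e in candidates
                      if any((c.text or "") == value for c in e.findall(child))]
    return candidates
```

For example, `/site//person[name='B']` would be written as `evaluate(summary, [("/", "site"), ("//", "person")], ("name", "B"))`. Because `//` is resolved against the summary, data nodes on paths that cannot match are never visited; results come back grouped by summary path rather than in document order.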
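Where materialization uses a structure join, range labels reduce the ancestor-descendant test to two integer comparisons. The sketch below is the well-known stack-based containment join over (start, end) interval labels, swapped in here to illustrate the operation; the claims do not state which join algorithm is used. It assumes both input lists are sorted by `start` and that all intervals come from one tree, so any two are either nested or disjoint. `Node` and `structural_join` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    start: int
    end: int
    tag: str


def structural_join(ancestors: List[Node],
                    descendants: List[Node]) -> List[Tuple[Node, Node]]:
    """Return all (a, d) pairs with a.start < d.start and d.end < a.end.

    Both inputs must be sorted by start. The stack holds a chain of nested
    ancestor candidates, outermost at the bottom.
    """
    result = []
    stack = []
    i = 0
    for d in descendants:
        # Push every ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            # Entries that close before a opens cannot contain a; drop them.
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        # Drop candidates that closed before d opened.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # With nested-or-disjoint intervals, every remaining entry contains d.
        result.extend((a, d) for a in stack)
    return result
```

Each input list is read once in order, so the join avoids comparing every ancestor with every descendant; output pairs are grouped by descendant, outermost ancestor first.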