JP2006053724A

JP2006053724A - XML data management method

Info

Publication number: JP2006053724A
Application number: JP2004234344A
Authority: JP
Inventors: Tsuneyuki Imaki; 常之今木; Itaru Nishizawa; 格西澤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2004-08-11
Filing date: 2004-08-11
Publication date: 2006-02-23

Abstract

PROBLEM TO BE SOLVED: To provide a structure search function that is transparent to the user while optimizing the mapping definition between an XML document schema and a relational table schema.
SOLUTION: A mapping definition tuning module 104 refers to the issuance history of structure search expressions and changes the inter-schema mapping definition 109. The goal is to divide XML documents appropriately between a relational database 105 and a structure search engine 106 so that frequently issued searches are processed more efficiently. A structure search expression conversion module 102 converts structure search expressions based on the inter-schema mapping definition 109. A query execution control module 103 issues queries to the relational database 105 and to the structure search engine 106, and reconstructs the result of the original structure search expression from their individual results.
COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a database management system. In particular, it relates to a method of managing XML (eXtensible Markup Language) documents using a relational database (hereinafter, RDB).

More specifically, it concerns managing a single XML document by decomposing it between an XML structure search engine and an RDB. The decomposition method is improved as appropriate based on the search history for the document. Meanwhile, the user is given a structure search interface that is transparent to that decomposition method.

Several products, called native XML databases (NXDBs), currently specialize in managing XML documents. However, all NXDBs are still maturing. They generally lack the performance needed to manage large volumes of data and to perform aggregation, so they are considered unsuitable for mission-critical business systems.

Mission-critical applications of XML are expected to grow with the emergence of business-oriented XML specifications such as XBRL (eXtensible Business Reporting Language). Technology that can process large numbers of XML documents with sufficient performance is therefore needed.

Meanwhile, the major RDB products also provide XML document management functions. RDBs have been refined over many years and offer performance that can withstand the processing of large volumes of data. They can therefore be considered suitable for mission-critical XML applications as well.

Non-Patent Documents 1 and 2 describe representative XML document management methods used in major RDB products.

In the method of Non-Patent Document 1, the tagged data in an XML document is structurally decomposed according to the correspondence between two schemas: the document schema of the managed XML document and the relational table schema of the RDB. The data is then stored value by value across a plurality of relational tables. This storage method is hereinafter called the XML document schema to relational table schema mapping method.

Starting from an XML document schema definition, the method of Non-Patent Document 1 automatically creates two definitions:
- a relational table schema definition for storing XML documents that are valid against that schema definition, and
- a definition of the correspondence between the XML document schema and the relational table schema (hereinafter, the schema mapping definition).

It also automatically converts structure search expressions written in XPath, the standard XML query language, into relational table search expressions (hereinafter, SQL expressions) according to the schema mapping definition.
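The following Python sketch illustrates what such automatic derivation can produce. It turns a simplified structure definition into CREATE TABLE statements and mapping pairs.

The dictionary-based structure notation is an assumption modelled on the example of FIG. 2. So are the derivation rules:
- one table per repeating element,
- an id/pid column pair for the parent link, and
- one TEXT column per attribute.

These are not the rules used in Non-Patent Document 1.

```python
# Hypothetical, simplified structure definition (element -> parent, attributes),
# modelled on the example of FIG. 2; not the notation of Non-Patent Document 1.
ROOT = "x"
STRUCTURE = {
    "i": {"parent": None, "attrs": ["a", "b"]},
    "j": {"parent": "i", "attrs": ["a", "b", "c"]},
    "k": {"parent": "j", "attrs": ["a", "b"]},
}


def full_path(elem):
    """Absolute path of an element type, e.g. 'j' -> '/x/i/j'."""
    steps = []
    while elem is not None:
        steps.append(elem)
        elem = STRUCTURE[elem]["parent"]
    return "/" + "/".join([ROOT] + steps[::-1])


def derive_schema():
    """Return (CREATE TABLE statements, mapping pairs), one table per element type."""
    ddl, mapping = [], []
    for elem, spec in STRUCTURE.items():
        path = full_path(elem)
        cols = ["id INTEGER PRIMARY KEY"]
        parent = spec["parent"]
        if parent is not None:
            # The child row records its parent's id in pid.
            cols.append(f"pid INTEGER REFERENCES {parent}(id)")
            mapping.append((f"{path}/..", f"{elem}.pid = {parent}.id"))
        for attr in spec["attrs"]:
            cols.append(f"{attr} TEXT")
            mapping.append((f"{path}/@{attr}", f"{elem}.{attr}"))
        ddl.append(f"CREATE TABLE {elem} ({', '.join(cols)})")
    return ddl, mapping
```

The generated pairs use the same "path ⇔ target" shape as the mapping definition described later for FIG. 2.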

The method of Non-Patent Document 2 is also basically a schema mapping method. However, the user writes the schema mapping definition manually, and it is expressed in the direction of constructing XML documents from data stored in the RDB. Structure search expressions written in XQuery, the standard XML query language, are likewise converted automatically into SQL expressions according to the schema mapping definition.

Non-Patent Document 1: R. Murthy, S. Banerjee, "XML Schemas in Oracle XML DB", VLDB 2003.
Non-Patent Document 2: J. Shanmugasundaram, et al., "Querying XML Views of Relational Data", VLDB 2001.

In general, XML document management in an RDB follows a common scheme. The tagged data in an XML document is structurally decomposed and stored value by value across a plurality of relational tables. This schema mapping method has the following drawbacks for managing XML documents:

(a) Mapping definition that accounts for search efficiency.
Search performance generally varies with the mapping method. Obtaining optimal search performance requires tuning the mapping definition, which places a heavy burden on the user.

(b) Management of semi-structured data.
XML allows a document to contain semi-structured partial data that does not follow a strict schema definition. The resulting flexibility of data representation is a major reason for the spread of XML. In an RDB, however, such data is managed as one-dimensional character string data called a LOB (large object), so advanced searches cannot be performed on that part.

(c) Management of documents with complex structures.
XML can express complex data structures through a tree-based data model. A relational table, in contrast, manages data as collections of one-dimensional values. Complex data such as a tree must therefore be expressed through foreign-key references among multiple relational tables. When the XML document schema has a deep hierarchy, the document is split across many relational tables, which hurts both search efficiency and storage efficiency. Because of this difference between the RDB and XML data models, some XML documents are managed inefficiently in relational tables.

(d) Search specification method.
A search over an XML document that is schema-mapped onto relational tables must be written according to the mapping definition. The user must therefore write relational table search expressions (SQL expressions) with the mapping definition in mind. Furthermore, when the mapping definition is changed for search efficiency as described in (a), the SQL expressions must also be rewritten. Ideally, a user should be able to specify structure searches with only the XML document schema in mind. These requirements do not arise in XML document management itself, and they impose a heavy burden on the user.

To overcome drawbacks (a) to (d) of the schema mapping method between the XML document schema and the relational table schema, the present invention aims to solve the following problems, respectively.

First, to provide an automatic tuning function for schema mapping definitions.

Second, to provide a structure search function even for semi-structured partial data that has conventionally been managed as LOBs.

Third, to separate out data that is inefficient to manage in relational tables and to manage it by more efficient means.

Fourth, to provide a structure search function for XML documents that is transparent with respect to how the documents are stored in relational tables.

To solve the second and third problems, the invention works together with an XML structure search engine. The engine exists either as a database external to the RDB or as an RDB plug-in.

The second problem is solved by storing semi-structured partial data in the structure search engine instead of in LOB columns of relational tables, where it was conventionally kept. The third problem is solved by managing data that is inefficient to manage in relational tables with the structure search engine instead.

To solve the first problem, a mapping definition tuning module is introduced. It refers to the query issuance history and derives an appropriate schema mapping definition, using the search efficiency of frequent queries as the criterion. As part of solving the third problem, this module also separates out the XML partial data that is unsuitable for management in relational tables.

To solve the fourth problem, a query rewrite function is provided. It automatically converts structure search expressions over XML documents into SQL expressions based on the schema mapping definition.

When the search target extends to partial data managed by the structure search engine, the search expression for that engine is embedded as a UDF (user-defined function) in the SQL expression. With this query rewrite, the user can specify structure searches transparently. The user does not need to know how the solutions to the first through third problems divide XML documents between relational tables and the structure search engine.
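Below is a minimal Python sketch of such a query rewrite, assuming a dictionary-based mapping modelled on FIG. 2. It handles only conjunctions of "path = value" attribute tests.

The names MAPPING, ANY_COLUMNS, rewrite_condition and rewrite_query are hypothetical. XMLHAS and XMLVAL are the UDF names used later in this description.

```python
# Hypothetical mapping in the spirit of FIG. 2: XML attribute path -> column.
MAPPING = {
    "/x/i/@a": "i.a", "/x/i/@b": "i.b",
    "/x/i/j/@a": "j.a", "/x/i/j/@b": "j.b", "/x/i/j/@c": "j.c",
}
# Content below /x/i/j that the structure definition leaves open ({ANY}) is kept
# in the structure search engine; column j.w holds the engine's file ID.
ANY_COLUMNS = {"/x/i/j": "j.w"}


def rewrite_condition(path, value):
    """Return (SQL predicate, bind parameter) for one 'path = value' test."""
    if path in MAPPING:
        return f"{MAPPING[path]} = ?", value
    for prefix, column in ANY_COLUMNS.items():
        if path.startswith(prefix + "/") and "/@" in path[len(prefix):]:
            steps, attr = path[len(prefix) + 1:].rsplit("/@", 1)
            wrapper = prefix.rsplit("/", 1)[1]  # stored images are wrapped in this tag
            # Sketch limitation: values containing double quotes are not handled.
            xpath = f'/{wrapper}/{steps}[@{attr}="{value}"]'
            return f"XMLHAS({column}, ?)", xpath
    raise KeyError(f"no mapping for {path}")


def rewrite_query(conditions, tables=("i", "j"), joins=("j.pid = i.id",)):
    """Rewrite a conjunction of path conditions into one SQL string plus params."""
    predicates, params = [], []
    for path, value in conditions:
        predicate, param = rewrite_condition(path, value)
        predicates.append(predicate)
        params.append(param)
    predicates.extend(joins)
    sql = (f"SELECT DISTINCT XMLVAL(i.id) FROM {', '.join(tables)} "
           f"WHERE {' AND '.join(predicates)}")
    return sql, params
```

Values are passed as bind parameters rather than spliced into the SQL text. DISTINCT removes duplicates that arise when several j rows of one i row satisfy the conditions.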

The invention provides the following effects for the schema mapping method between the XML document schema and the relational table schema:
(1) The schema mapping definition can be improved automatically, based on the query issuance history, so as to reduce search processing cost.
(2) Semi-structured partial data is managed with the structure search engine, which makes structure searches over that data possible.
(3) Inefficient search processing can be avoided by separating out data that is inefficient to manage in relational tables and managing it with the structure search engine.
(4) The query rewrite function lets the user specify structure searches over XML documents transparently, regardless of how the documents are stored across relational tables and the structure search engine.

An embodiment of the present invention is described below with reference to the drawings. For simplicity, this specification refers to it simply as "this embodiment".

The schematic configuration of this embodiment is described with reference to FIG. 1.

The system of this embodiment has the following four modules as its basic components:
- Tagged structured document to relational table data conversion module 101
- Structure search expression conversion module 102
- Query execution control module 103
- Mapping definition tuning module 104
Each module is outlined below.

The tagged structured document to relational table data conversion module 101 works on a tagged structured document 107 (hereinafter, XML document). It structurally decomposes the document, removes the tags, and stores the resulting attribute values in the corresponding columns of relational tables in the relational database 105 (hereinafter, RDB). Some parts of the XML document, however, are stored with their tags, as subtrees, in a structure search engine 106 dedicated to XML documents.

Three definitions govern this division: the tagged structured document schema definition 108 (hereinafter, XML document structure definition), the inter-schema mapping definition 109, and the relational table schema definition 110. Together they determine:
- which parts of the XML document are stored in the RDB 105,
- which relational table columns hold them, and
- which parts are stored with their tags in the structure search engine 106.

The XML document structure definition 108 describes the structure of XML documents, and the relational table schema definition 110 describes the schema of the relational tables. The XML document 107 must be valid against the XML document structure definition 108. The relational tables 201 and 202 stored in the RDB 105 are organized according to the relational table schema definition 110. The inter-schema mapping definition 109 maps each node value of an XML document (an element value or attribute value qualified by a tag) to the relational table column that stores it.

The structure search expression conversion module 102 takes a structure search expression 111 that the user writes according to a standard XML query specification such as XQuery or XPath. It converts this into a search expression in SQL, the query language of the RDB 105 (hereinafter, SQL expression 112). The conversion follows the inter-schema mapping definition 109. If the search scope extends to partial XML documents stored in the structure search engine, the result is an SQL expression 112 with embedded extension functions (UDFs) for the structure search engine.

The query execution control module 103 splits an SQL expression 112 containing UDFs into an SQL part and a UDF part. It issues the SQL part to the RDB 105 and the UDF part to the structure search engine 106. It then integrates the results to construct the result 113 for the original structure search expression 111. If the RDB 105 has a plug-in processing mechanism, this module is realized naturally on top of that mechanism (this case is described later with reference to FIG. 3).
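When the RDB has no plug-in mechanism, the split can be done outside it. The Python sketch below makes these assumptions:
- the rewrite has already been split into an SQL part and a single XMLHAS condition;
- sqlite3 stands in for the RDB 105;
- a dictionary mapping file IDs to XML text stands in for the structure search engine 106;
- ElementTree's limited XPath support stands in for the engine's matcher.

All function names and sample rows are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_has(image_xml, xpath):
    """Stand-in for the engine's structure test: '/wrapper/rest' must match."""
    root = ET.fromstring(image_xml)
    head, _, rest = xpath.lstrip("/").partition("/")
    if head != root.tag:  # the first step must name the stored image's root
        return False
    return not rest or root.find("./" + rest) is not None


def execute_split(conn, engine, sql_part, params, w_index, xpath):
    """Issue the SQL part to the RDB, test the UDF part in the engine, merge."""
    rows = conn.execute(sql_part, params).fetchall()  # RDB side
    return [row for row in rows                        # structure-engine side
            if row[w_index] is not None and xml_has(engine[row[w_index]], xpath)]


def build_demo():
    """Hypothetical sample data: two i rows, each with one j child."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE i (a TEXT, b TEXT, id INTEGER PRIMARY KEY);
    CREATE TABLE j (pid INTEGER, a TEXT, b TEXT, c TEXT, w TEXT, id INTEGER PRIMARY KEY);
    INSERT INTO i VALUES ('xx19', 'p', 1), ('xx19', 'q', 2);
    INSERT INTO j VALUES (1, 'r', 'xx22', 's', 'xi-a', 10), (2, 't', 'xx22', 'u', 'xi-b', 11);
    """)
    engine = {"xi-a": '<j><o u="xx24"/></j>', "xi-b": '<j><o u="xx99"/></j>'}
    return conn, engine


# SQL part of the rewritten query, with the XMLHAS condition removed.
SQL_PART = "SELECT i.id, j.w FROM i, j WHERE i.a = ? AND j.b = ? AND j.pid = i.id"
```

Filtering after the SQL part runs is the simplest merge strategy. A real implementation might instead push the engine's matching file IDs back into the SQL as an IN list, depending on which side is more selective.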

The mapping definition tuning module 104 refers to the issuance history of the user's structure search expressions 111. Using the processing efficiency of frequently issued search expressions as the criterion, and consulting the XML document structure definition 108, it updates the inter-schema mapping definition 109 and the relational table schema definition 110 as appropriate. Any changes to the relational tables that follow from an update of the relational table schema definition 110 are left to the RDB 105.
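The description does not state the tuning algorithm. The Python sketch below assumes one plausible frequency-based heuristic:
- Paths tested by at least min_hits queries in the history, but not currently mapped to relational columns, are proposed for promotion.
- Relational paths that no query tests are proposed for demotion to the structure search engine.

Actually applying the proposals, by rewriting the mapping definition 109 and the schema definition 110, is not shown.

```python
from collections import Counter


def tune_mapping(query_history, relational_paths, min_hits=2):
    """Suggest paths to promote to, or demote from, relational columns.

    query_history: list of queries, each given as the list of XML paths it tests.
    relational_paths: paths currently mapped to relational columns; every other
    path is assumed to live in the structure search engine.
    """
    # Count each path at most once per query.
    hits = Counter(path for query in query_history for path in set(query))
    promote = sorted(path for path, n in hits.items()
                     if n >= min_hits and path not in relational_paths)
    demote = sorted(path for path in relational_paths if hits[path] == 0)
    return promote, demote
```

A cost model based on the RDB's optimizer estimates would be a more faithful criterion than raw counts. Plain counts keep the sketch self-contained.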

The hardware configuration for realizing the system of FIG. 1 is described below. In hardware terms, the system consists of one or more computers, each with a CPU, memory, external storage, input devices, display devices, and so on.

Data:
- The XML document 107, XML document structure definition 108, inter-schema mapping definition 109 and relational table schema definition 110 are stored as files on a storage device.
- The structure search expression 111 is either entered from an input device through a text editor or generated by an application program (not shown), and is stored in memory.
- The result 113 is stored in memory and is output to a display device or printer, or passed to an application for further processing.

Software:
- The structure search engine 106 is a database management system that manages tree-structure files of XML documents stored on a storage device.
- The data conversion module 101, structure search expression conversion module 102, query execution control module 103 and mapping definition tuning module 104 are programs stored in the computer's memory and executed by its CPU.
- The RDB 105 is a database management system that manages a relational database stored on a storage device.

Some or all of the four modules may be built into the RDB 105. The modules, the RDB 105 and the structure search engine 106 may all run on the same computer, or some or all of them may run on different computers connected via a network. The system may also be implemented as a client-server system.

This concludes the outline of this embodiment. The operation of each module is described next:
- the data conversion module 101, with reference to FIG. 2;
- the structure search expression conversion module 102, with reference to FIG. 3;
- the query execution control module 103, for each implementation variation, with reference to FIGS. 3, 4 and 5.

The operation of the data conversion module 101 is described with reference to FIG. 2. The example stores in the RDB 105 an XML document 107 that is valid against the XML document structure definition 108.

The XML document structure definition 108 states the following:
- Multiple i elements appear under the root element x, each with attributes a and b.
- Multiple j elements appear under each i element, each with attributes a, b and c.
- Multiple k elements appear under each j element, each with attributes a and b.
- Arbitrary elements other than k may also appear under each j element; this is indicated by "{ANY}".

In the notation, "str" denotes a character string, and "×0…n" indicates that 0 to n of the corresponding element may appear.

The notation used for the XML document structure definition 108 in FIG. 2 does not limit the embodiment. Any notation for defining XML document structures that can express the same meaning is applicable. For example, a DTD (Document Type Definition), the standard specification for defining XML document structure, expresses an equivalent structure definition as follows:
<!ELEMENT x (i*)>
<!ELEMENT i (j*)>
<!ATTLIST i
  a CDATA #REQUIRED
  b CDATA #REQUIRED>
<!ELEMENT j ANY>
<!ATTLIST j
  a CDATA #REQUIRED
  b CDATA #REQUIRED
  c CDATA #REQUIRED>
<!ELEMENT k EMPTY>
<!ATTLIST k
  a CDATA #REQUIRED
  b CDATA #REQUIRED>
The schemas of the relational tables 201, 202 and 203 that store the XML document 107 are given by the relational table schema definition 110. In this example:
- relational table i has three columns: a, b and id;
- relational table j has six columns: pid, a, b, c, w and id;
- relational table k has four columns: pid, a, b and id.

The id and pid columns link parents and children. The id column identifies the row itself and is the value its children refer to. The pid column, stored in a child row, holds the id of the parent row it belongs to. A parent-child link exists wherever a child's pid equals a parent's id.

The notation used for the relational table schema definition 110 in FIG. 2 does not limit the embodiment. Any notation that can express the same relational table schema is applicable. For example, a relational table schema is generally defined with an SQL statement when the table is created, so that SQL statement can serve as the relational table schema definition 110.
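The relational table schema definition 110 can take the form of SQL table-creation statements. The sketch below writes the three tables of FIG. 2 as SQLite DDL and shows the id/pid link as a join. The column types, the REFERENCES clauses and the sample values are assumptions for illustration.

```python
import sqlite3

# Relational table schema definition 110 of FIG. 2, written as SQL DDL.
# Column types and REFERENCES clauses are assumptions.
DDL = """
CREATE TABLE i (a TEXT, b TEXT, id INTEGER PRIMARY KEY);
CREATE TABLE j (pid INTEGER REFERENCES i(id), a TEXT, b TEXT, c TEXT, w TEXT,
                id INTEGER PRIMARY KEY);
CREATE TABLE k (pid INTEGER REFERENCES j(id), a TEXT, b TEXT,
                id INTEGER PRIMARY KEY);
"""


def build(conn):
    conn.executescript(DDL)
    # Hypothetical sample rows: one i, one child j, one grandchild k.
    conn.execute("INSERT INTO i VALUES ('xx19', 'xx20', 1)")
    conn.execute("INSERT INTO j VALUES (1, 'xx21', 'xx22', 'xx23', NULL, 2)")
    conn.execute("INSERT INTO k VALUES (2, 'xx25', 'xx26', 3)")


def children(conn, parent_table, child_table, parent_id):
    """A child row belongs to a parent row exactly when child.pid = parent.id."""
    sql = (f"SELECT c.id FROM {parent_table} AS p JOIN {child_table} AS c "
           f"ON c.pid = p.id WHERE p.id = ?")
    return [row[0] for row in conn.execute(sql, (parent_id,))]


conn = sqlite3.connect(":memory:")
build(conn)
```

Each level of the XML hierarchy costs one such join. This is the reason drawback (c) singles out schemas with deep hierarchies.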

Based on the XML document structure definition 108 and the relational table schema definition 110 above, the inter-schema mapping definition 109 defines how values correspond between the two definitions. Its lines read as follows:
- Line 1, "/x/i/@a ⇔ i.a", states that the value of attribute a of an i element in the XML document 107 is stored in column a of relational table i (201). Lines 2, 4, 5, 6, 8 and 9 work the same way.
- Line 3, "/x/i/j/.. ⇔ j.pid = i.id", states that the parent of a j element is recorded in column pid of relational table j (202), which references column id of relational table i (201).
- Line 7 likewise expresses the reference between relational table k (203) and relational table j (202).
- Line 10 states that the parts of a j element's content that are not defined in the XML document structure definition 108 are stored in column w of relational table j (202).

In practice, that undefined partial structure is stored, with its tags, in the structure search engine 106. Only the file ID that the structure search engine 106 assigns to the stored image 204 (xi-a in this example) is stored in column w of relational table j (202).
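The mapping lines above can be parsed into a simple rule form, as in the Python sketch below. The sample lines have different levels of certainty:
- Lines 1 and 3 are taken from the text.
- Line 7 is inferred from the description of the k-to-j reference.
- The {ANY} notation standing in for line 10 is hypothetical, since the exact text of that line is not given here.

The Rule class and parse_mapping function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    kind: str    # "column", "join" or "engine"
    path: str    # location in the XML document
    target: str  # column, join condition, or engine-reference column


# Lines 1 and 3 follow the text; line 7 is inferred; the {ANY} form of the
# last line is a hypothetical stand-in for line 10.
SAMPLE = """
/x/i/@a ⇔ i.a
/x/i/j/.. ⇔ j.pid = i.id
/x/i/j/k/.. ⇔ k.pid = j.id
/x/i/j/{ANY} ⇔ j.w
"""


def parse_mapping(text):
    """Split each 'path ⇔ target' line and classify the rule."""
    rules = []
    for line in text.strip().splitlines():
        path, target = (part.strip() for part in line.split("⇔"))
        if "=" in target:
            kind = "join"     # child.pid = parent.id
        elif path.endswith("{ANY}"):
            kind = "engine"   # column holds a structure-engine file ID
        else:
            kind = "column"   # node value stored directly in the column
        rules.append(Rule(kind, path, target))
    return rules
```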

The data conversion module 101 proceeds as follows.

1. Through the RDB 105, it allocates storage areas for relational tables i, j and k based on the relational table schema definition 110.
2. It takes one i, j, k or o element from the XML document 107 and checks, against the XML document structure definition 108, whether the element's form matches the schema definition.
3. If it matches, the module consults the inter-schema mapping definition 109 and stores the value of each attribute of the element as one record in the corresponding relational table of the RDB.
4. It sets the id and pid of that record. The id is generated according to the record's position in the relational table. If the id of the parent element has been saved in memory, that id is stored in pid.
5. It temporarily saves the element name and its id in memory.
6. When the element taken is an o element, the module recognizes, from the XML document structure definition 108, that the element corresponds to ANY. It sets the file ID designated by the structure search engine 106 in the corresponding column of the matching record of relational table j. It then sends the structure search engine 106 a partial structure consisting of the extracted o element wrapped in a j tag, which the engine stores in its storage area as a stored image 204.
7. The module returns to step 2 and repeats the above until all elements of the XML document 107 have been processed.
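The steps above can be sketched in Python with ElementTree and sqlite3. The sketch makes several simplifying assumptions:
- A single id sequence is used across all tables, instead of per-table record positions.
- The validity check only tests for required attributes.
- All non-k children of a j element go into one stored image.
- A dictionary with hypothetical "file-N" IDs stands in for the structure search engine 106.

The sketch walks the tree recursively instead of saving parent ids in memory. For this structure the effect is the same.

```python
import sqlite3
import xml.etree.ElementTree as ET

DDL = """
CREATE TABLE i (a TEXT, b TEXT, id INTEGER PRIMARY KEY);
CREATE TABLE j (pid INTEGER, a TEXT, b TEXT, c TEXT, w TEXT, id INTEGER PRIMARY KEY);
CREATE TABLE k (pid INTEGER, a TEXT, b TEXT, id INTEGER PRIMARY KEY);
"""
ATTRS = {"i": ("a", "b"), "j": ("a", "b", "c"), "k": ("a", "b")}  # required attributes
CHILD = {"i": "j", "j": "k"}  # element type stored in the child table


def shred(xml_text, conn, engine):
    """Decompose one document into tables i, j, k and structure-engine images."""
    root = ET.fromstring(xml_text)  # the root element x itself is not stored
    next_id = 0

    def store(elem, pid):
        nonlocal next_id
        tag = elem.tag
        missing = [name for name in ATTRS[tag] if elem.get(name) is None]
        if missing:  # simplified validity check against the structure definition
            raise ValueError(f"<{tag}> lacks required attributes {missing}")
        next_id += 1  # simplification: one id sequence across all tables
        row = {name: elem.get(name) for name in ATTRS[tag]}
        row["id"] = next_id
        if pid is not None:
            row["pid"] = pid
        if tag == "j":
            extra = [child for child in elem if child.tag != "k"]  # {ANY} content
            if extra:
                wrapper = ET.Element("j")  # wrap the free-form part in a j tag
                wrapper.extend(extra)
                file_id = f"file-{len(engine) + 1}"  # hypothetical file-ID scheme
                engine[file_id] = ET.tostring(wrapper, encoding="unicode")
                row["w"] = file_id
        columns = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        conn.execute(f"INSERT INTO {tag} ({columns}) VALUES ({marks})",
                     tuple(row.values()))
        if tag in CHILD:
            for child in elem.findall(CHILD[tag]):
                store(child, row["id"])

    for i_elem in root.findall("i"):
        store(i_elem, None)


def demo():
    """Shred one hypothetical sample document into a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    engine = {}
    shred('<x><i a="xx19" b="xx20"><j a="xx21" b="xx22" c="xx23">'
          '<k a="xx25" b="xx26"/><o u="xx24"/></j></i></x>', conn, engine)
    return conn, engine
```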

The operation of the structure search expression conversion module 102 is described with reference to FIG. 3. The example processes a structure search expression 111, written according to the XQuery standard, against an XML document 107. The data conversion module 101 has already stored that document according to the XML document structure definition 108, inter-schema mapping definition 109 and relational table schema definition 110 of FIG. 2. It resides in relational tables 201, 202 and 203 of the RDB 105 and as stored image 204 in the structure search engine 106.

The structure search expression 111 binds each i element to the variable $i in its FOR clause. Its WHERE clause then requires three things:
- attribute a of the i element is "xx19";
- the i element has a child j element whose attribute b is "xx22";
- that same j element has a child o element whose attribute u is "xx24".

The expression requests, as its result, each i element that satisfies these conditions, in its entirety.
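For reference, the conditions of the structure search expression 111 can be evaluated directly on an undecomposed document. This gives the result that the rewritten SQL must reproduce. The XQuery text in the comment is one plausible formulation, not the expression in FIG. 3, and the Python function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# One plausible XQuery form of the conditions described above (an assumption;
# the exact expression of FIG. 3 is not reproduced in the text):
#   for $i in /x/i
#   where $i/@a = "xx19" and $i/j[@b = "xx22"]/o/@u = "xx24"
#   return $i


def matching_i_elements(doc_xml):
    """Evaluate the conditions directly on an undecomposed document."""
    root = ET.fromstring(doc_xml)
    hits = []
    for i_elem in root.findall('i[@a="xx19"]'):
        # The same j child must have b = "xx22" and an o child with u = "xx24".
        if any(j.find('o[@u="xx24"]') is not None
               for j in i_elem.findall('j[@b="xx22"]')):
            hits.append(i_elem)
    return hits
```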

The structure search expression conversion module 102 consults the inter-schema mapping definition 109 and converts the structure search expression 111 described above into an SQL expression 112 containing a structure-specifying UDF.

Under the mapping definition 109, the search expression is equivalent to specifying conditions on three columns: column a of relational table i (201), column b of relational table j (202) and column w of relational table j (202). Because the o element is not defined as a child element of j, line 10 of the mapping definition 109 applies, and the condition on o becomes a condition on column w of relational table j (202). Column w is a reference to the image 204 stored in the structure search engine 106, so this condition is expressed with a structure-specifying UDF.

Accordingly, query 112 is generated from structure search expression 111 as an SQL expression that specifies three conditions:

- records of relation table i (201) whose column a is "xx19";
- records of relation table j (202) whose column b is "xx22" and whose column w references a stored image (204) in the structure search engine 106 that contains an o element with attribute u = "xx24";
- a foreign-key relationship between the two records.

The structure condition on column w of relation table j (202) is expressed by a UDF named XMLHAS. XMLHAS is a Boolean function. It takes a column value as its first argument and an XPath expression as its second argument. It determines whether the stored image in the structure search engine 106 that the column value points to contains partial data matching the XPath expression.
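The following is a minimal sketch of how a Boolean structure-specifying UDF such as XMLHAS could be registered with an RDB and used in a query shaped like SQL expression 112. It assumes SQLite (through Python's `sqlite3`) as the RDB, a Python dictionary as a stand-in for the structure search engine 106, and the limited XPath subset of ElementTree. The table contents, image IDs, and the helper name `xmlhas` are hypothetical and are not taken from the figures.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical contents of structure search engine 106: stored-image ID -> XML fragment.
STORED_IMAGES = {
    "xi-a": "<o u='xx24' v='xx25'/>",
    "xi-b": "<o u='xx99' v='xx25'/>",
}


def xmlhas(image_id, xpath):
    """Boolean UDF: 1 if the stored image referenced by image_id contains a match for xpath."""
    fragment = STORED_IMAGES.get(image_id)
    if fragment is None:
        return 0
    # Wrap the fragment so that a match at the fragment's top level is also found.
    root = ET.fromstring(f"<root>{fragment}</root>")
    # ElementTree implements only a subset of XPath; './/o[@u="xx24"]' is within it.
    return 1 if root.find(xpath) is not None else 0


conn = sqlite3.connect(":memory:")
conn.create_function("XMLHAS", 2, xmlhas)
conn.executescript("""
CREATE TABLE i (id TEXT PRIMARY KEY, a TEXT);
CREATE TABLE j (id TEXT PRIMARY KEY, pid TEXT REFERENCES i(id), b TEXT, w TEXT);
INSERT INTO i VALUES ('r01', 'xx19'), ('r02', 'xx19');
INSERT INTO j VALUES ('s01', 'r01', 'xx22', 'xi-b'), ('s02', 'r02', 'xx22', 'xi-a');
""")

# Analogue of the WHERE clause of SQL expression 112 (the XMLVAL projection is omitted).
rows = conn.execute("""
    SELECT i.id FROM i JOIN j ON i.id = j.pid
    WHERE i.a = 'xx19' AND j.b = 'xx22'
      AND XMLHAS(j.w, './/o[@u="xx24"]') = 1
""").fetchall()
print(rows)  # [('r02',)]
```

Both i records satisfy the relational conditions, so only the UDF separates them. The record whose stored image lacks a matching o element is filtered out.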

Structure search expression 111 requests as its result the entire i elements that satisfy the above conditions. For this reason, the SELECT clause of SQL expression 112 specifies XMLVAL, an XML document construction UDF. XMLVAL is a scalar function. For the element whose ID is given as its argument, it extracts all descendant elements from the RDB 105 and the structure search engine 106, then reconstructs and returns the XML document.

The operation of the query execution control module 103 will be described with reference to FIGS. 3 to 5.

FIG. 3 shows the case where the RDB 105 has a plug-in processing mechanism 301. In this case, the functions that the query execution control module 103 must provide are built into the RDB 105. For explanation, the program that provides these functions on its own is called the query execution control module here, to distinguish it from the RDB 105 itself.

The query execution control module 103 splits SQL expression 112, which contains UDFs, into a native SQL part and a UDF part. The RDB 105 processes the SQL part, and the plug-in processing mechanism 301 processes the UDF part. XMLHAS, the structure-specifying UDF, is actually processed by the structure search engine 106. It is therefore issued to that engine as a query 302 in XPath, the structure search specification that most structure search engines support. If XMLHAS is implemented as a plug-in built into the RDB 105, however, the RDB 105 executes this UDF directly. XMLVAL, the XML document construction UDF, also runs on the plug-in processing mechanism 301.
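The split into a native SQL part and a UDF part can be shown with a deliberately naive sketch. It treats the WHERE clause as a flat list of AND-ed conjuncts and routes any conjunct that calls a structure-specifying UDF to the UDF part. The conjunct list, the `STRUCTURE_UDFS` set, and the function name `separate_udfs` are assumptions for illustration. A real implementation would operate on a parsed query tree rather than on strings.

```python
import re

# Hypothetical WHERE clause of SQL expression 112, modeled as AND-ed conjuncts.
CONJUNCTS = [
    "i.a = 'xx19'",
    "j.b = 'xx22'",
    "i.id = j.pid",
    "XMLHAS(j.w, './/o[@u=\"xx24\"]')",
]

# Structure-specifying UDFs that the RDB cannot evaluate natively (assumption).
STRUCTURE_UDFS = {"XMLHAS"}
UDF_CALL = re.compile(r"^\s*([A-Z_]+)\s*\(")


def separate_udfs(conjuncts):
    """Return (native SQL conjuncts, UDF conjuncts)."""
    native, udf = [], []
    for conjunct in conjuncts:
        match = UDF_CALL.match(conjunct)
        if match and match.group(1) in STRUCTURE_UDFS:
            udf.append(conjunct)
        else:
            native.append(conjunct)
    return native, udf


native, udf = separate_udfs(CONJUNCTS)
print(" AND ".join(native))  # i.a = 'xx19' AND j.b = 'xx22' AND i.id = j.pid
print(udf)
```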

When executing a query, the query execution control module 103 must choose between two orders. It can first narrow down the data with the condition on the structure search engine 106, or first with the conditions on the relation tables. It makes this choice using query processing efficiency as the criterion.

Take SQL expression 112 as an example. In the former case, the module first extracts the stored images 204 in the structure search engine 106 that satisfy the XMLHAS condition. It then extracts data from relation tables 201 and 202, with the additional condition that column w of relation table j (202) equals the ID of such an image (xi-a in this example).

In the latter case, the module first narrows down the data using the conditions on column a of relation table i (201) and column b of relation table j (202). It also uses the foreign-key relationship between column id of relation table i (201) and column pid of relation table j (202). It then evaluates the XMLHAS condition against the stored image 204 that column w of each extracted record of relation table j (202) points to.
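The source says only that processing efficiency decides which condition is applied first. The sketch below illustrates one possible rule under a hypothetical cost model. Every parameter in `Estimates` and the function name `choose_plan` are assumptions, not values or names from the embodiment. The idea is that the structure-first plan pays for one engine query plus a cheap join per matching image ID. The relational-first plan pays for one SQL query plus one engine probe per surviving row.

```python
from dataclasses import dataclass


@dataclass
class Estimates:
    """Hypothetical cost inputs; the source only says query efficiency decides the order."""
    engine_query_cost: float  # one XPath query to the structure search engine
    engine_matches: int       # stored images expected to satisfy the XMLHAS condition
    sql_cost: float           # one native SQL query over relation tables i and j
    sql_rows: int             # rows expected to survive the relational conditions
    join_cost: float          # cost of matching one image ID against column j.w
    probe_cost: float         # cost of one XMLHAS probe against a single stored image


def choose_plan(e: Estimates) -> str:
    # Former case: query the engine first, then join its matches against j.w.
    structure_first = e.engine_query_cost + e.sql_cost + e.engine_matches * e.join_cost
    # Latter case: filter relationally first, then probe the engine once per surviving row.
    relational_first = e.sql_cost + e.sql_rows * e.probe_cost
    return "structure-first" if structure_first < relational_first else "relational-first"


print(choose_plan(Estimates(50.0, 10, 5.0, 2, 0.1, 3.0)))       # relational-first
print(choose_plan(Estimates(50.0, 10, 5.0, 10_000, 0.1, 3.0)))  # structure-first
```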

A typical RDB 105 already optimizes this choice of execution procedure through its execution plan selection. Therefore, when the RDB 105 has the plug-in processing mechanism 301, no separate query execution control module 103 needs to be provided.

FIGS. 4 and 5 outline the operation when the RDB 105 has no plug-in processing mechanism 301, so the query execution control module 103 must be provided outside the RDB 105. In that case, the query execution control module 103 must also determine the query execution procedure described above on its own.

FIG. 4 shows the case where the data is first narrowed down with the condition on the structure search engine 106. When the query execution control module 103 receives SQL expression 112 containing UDFs, its UDF separation process 401 splits the expression into a native SQL expression 403 and a UDF part. The module then issues the UDF part to the structure search engine 106 as XPath search expression 302 (①). It obtains the IDs of the stored images 204 containing data that matches this expression (xi-a in this example) and stores them in the RDB 105 as temporary table x (404). Next, the module extracts data from relation tables 201 and 202 with SQL expression 403, adding the condition that the ID stored in column w of relation table j (202) is contained in temporary table x (②). In SQL expression 403, x denotes temporary table x and x.id denotes its id column. As the result of the search with SQL expression 403, the query execution control module 103 receives the value "r02" as i.id.
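The structure-first procedure of FIG. 4 can be sketched as follows, again assuming SQLite as the RDB and a dictionary as a hypothetical stand-in for the structure search engine. Step ① collects the matching stored-image IDs into temporary table x. Step ② replaces the XMLHAS call with a membership test on x, as native SQL expression 403 does. The helper `engine_xpath_search` and the sample rows are illustrative assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical contents of structure search engine 106: stored-image ID -> XML fragment.
STORED_IMAGES = {"xi-a": "<o u='xx24'/>", "xi-b": "<o u='xx99'/>"}


def engine_xpath_search(xpath):
    """Stand-in for the engine: IDs of stored images containing a match for xpath."""
    return [
        image_id
        for image_id, fragment in STORED_IMAGES.items()
        if ET.fromstring(f"<root>{fragment}</root>").find(xpath) is not None
    ]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE i (id TEXT PRIMARY KEY, a TEXT);
CREATE TABLE j (id TEXT PRIMARY KEY, pid TEXT, b TEXT, w TEXT);
INSERT INTO i VALUES ('r01', 'xx19'), ('r02', 'xx19');
INSERT INTO j VALUES ('s01', 'r01', 'xx22', 'xi-b'), ('s02', 'r02', 'xx22', 'xi-a');
""")

# Step 1: issue the XPath part to the engine and keep the matching IDs in temporary table x.
conn.execute("CREATE TEMP TABLE x (id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO x VALUES (?)",
    [(image_id,) for image_id in engine_xpath_search(".//o[@u='xx24']")],
)

# Step 2: analogue of native SQL expression 403; membership in x replaces XMLHAS.
rows = conn.execute("""
    SELECT i.id FROM i JOIN j ON i.id = j.pid
    WHERE i.a = 'xx19' AND j.b = 'xx22' AND j.w IN (SELECT id FROM x)
""").fetchall()
print(rows)  # [('r02',)]
```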

The tagged structured document reconstruction process 402 then reconstructs the i element having the extracted ID. It uses the data in relation tables 201 to 203 and the stored image 204 in the structure search engine 106. To do so, process 402 refers to the inter-schema mapping definition 109, creates SQL expressions 405 to 407 that extract data from the relation tables, and issues them to the RDB 105 (③). These expressions follow the foreign-key relationships between the relation tables to extract all descendant elements of the i element with the extracted ID "r02". Here, "r02" is substituted for iid in SQL expression 405. Nothing is substituted for jid, so SQL expression 407 returns no result; in this example, SQL expression 407 could be omitted. The i element also includes data from the stored image 204 held in the structure search engine 106. Process 402 therefore issues a query 408 to the structure search engine 106 to retrieve it (③') and obtains the stored image 204. Finally, process 402 refers to the XML document structure definition 108, the inter-schema mapping definition 109, and the relation table schema definition 110. It reconstructs the XML document from the extracted data and the stored image 204 and obtains result 113 (④).
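A minimal sketch of this reconstruction step, which is also roughly what an XMLVAL-style function would do, is shown below. It assumes SQLite tables i, j, and k and a dictionary of stored images; the schemas, rows, and the function name `reconstruct` are hypothetical. The three `SELECT` statements play the roles of SQL expressions 405 to 407, and the dictionary lookup plays the role of query 408. As in the text, the query on k returns nothing for this j element.

```python
import sqlite3
import xml.etree.ElementTree as ET

STORED_IMAGES = {"xi-a": "<o u='xx24' v='xx25'/>"}  # hypothetical engine contents

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE i (id TEXT PRIMARY KEY, a TEXT);
CREATE TABLE j (id TEXT PRIMARY KEY, pid TEXT, b TEXT, w TEXT);
CREATE TABLE k (id TEXT PRIMARY KEY, pid TEXT, a TEXT);
INSERT INTO i VALUES ('r02', 'xx19');
INSERT INTO j VALUES ('s02', 'r02', 'xx22', 'xi-a');
INSERT INTO k VALUES ('t01', 's99', 'xx30');
""")


def reconstruct(conn, iid):
    """Rebuild the i element with ID iid from the relation tables and stored images."""
    (a,) = conn.execute("SELECT a FROM i WHERE id = ?", (iid,)).fetchone()  # role of 405
    i_elem = ET.Element("i", a=a)
    j_rows = conn.execute(
        "SELECT id, b, w FROM j WHERE pid = ? ORDER BY id", (iid,)
    ).fetchall()  # role of 406
    for jid, b, w in j_rows:
        j_elem = ET.SubElement(i_elem, "j", b=b)
        k_rows = conn.execute(
            "SELECT a FROM k WHERE pid = ? ORDER BY id", (jid,)
        ).fetchall()  # role of 407 (empty here)
        for (ka,) in k_rows:
            ET.SubElement(j_elem, "k", a=ka)
        if w is not None:
            # Role of query 408: fetch the stored image and graft it under the j element.
            j_elem.append(ET.fromstring(STORED_IMAGES[w]))
    return ET.tostring(i_elem, encoding="unicode")


print(reconstruct(conn, "r02"))
```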

FIG. 5 shows the case where the data is first narrowed down with the conditions on relation tables 201 and 202. In this case, the UDF separation process 401 turns SQL expression 112, which contains UDFs, into a query such as native SQL expression 502. The query execution control module 103 issues this SQL expression to the RDB 105 (①) and narrows down the data with the conditions on relation tables 201 and 202. It extracts the value of column w of relation table j (202) at the same time, receiving "r02" as i.id and "xi-a" as j.w. The structure determination process 501 receives the result of SQL expression 502 and sends the stored image ID "xi-a" and XPath expression 302 to the structure search engine 106 (②). It then determines whether a stored image satisfying these conditions is registered in the structure search engine 106. If matching data exists there, the stored image ID "xi-a" is passed to the tagged structured document reconstruction process 402. The subsequent processing is the same as in FIG. 4.
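For comparison with the structure-first procedure, the relational-first procedure of FIG. 5 can be sketched as follows. Step ① runs an analogue of native SQL expression 502 that also returns j.w. Step ② probes the engine once per candidate image ID, as structure determination process 501 does. SQLite, the dictionary-based engine stand-in, the helper `engine_probe`, and the sample rows are assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

STORED_IMAGES = {"xi-a": "<o u='xx24'/>", "xi-b": "<o u='xx99'/>"}  # hypothetical


def engine_probe(image_id, xpath):
    """Stand-in for structure determination 501: does this one stored image match xpath?"""
    fragment = STORED_IMAGES.get(image_id)
    return fragment is not None and (
        ET.fromstring(f"<root>{fragment}</root>").find(xpath) is not None
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE i (id TEXT PRIMARY KEY, a TEXT);
CREATE TABLE j (id TEXT PRIMARY KEY, pid TEXT, b TEXT, w TEXT);
INSERT INTO i VALUES ('r01', 'xx19'), ('r02', 'xx19');
INSERT INTO j VALUES ('s01', 'r01', 'xx22', 'xi-b'), ('s02', 'r02', 'xx22', 'xi-a');
""")

# Step 1: analogue of native SQL expression 502; it also returns j.w for the later probe.
candidates = conn.execute("""
    SELECT i.id, j.w FROM i JOIN j ON i.id = j.pid
    WHERE i.a = 'xx19' AND j.b = 'xx22'
    ORDER BY i.id
""").fetchall()

# Step 2: probe the engine once per candidate stored-image ID.
matches = [(iid, w) for iid, w in candidates if engine_probe(w, ".//o[@u='xx24']")]
print(candidates)  # [('r01', 'xi-b'), ('r02', 'xi-a')]
print(matches)     # [('r02', 'xi-a')]
```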

The above description treats the structure search expression conversion module 102 and the query execution control module 103 separately, but they may be implemented as a single module. In that case, an embodiment is clearly possible that converts structure search expression 111 directly into SQL expressions 403 and 502 without generating SQL expression 112 containing UDFs.

The specific mapping definition improvement procedures performed by the mapping definition tuning module 104 in this embodiment are described below with reference to FIGS. 6, 7(a), 7(b), and 8.

FIG. 6 illustrates the mapping definition improvement process applied when the search frequency for partial data not explicitly defined in the XML document structure definition exceeds a predetermined threshold.

The system records the history of structure search expressions issued against tagged structured documents in a search history database (not shown). The mapping definition tuning module 104 refers to this database and counts how often structure search expressions involving the same UDF are issued. Consider a part that does not appear in the XML document structure definition and therefore has no explicitly defined inter-schema mapping, such as the o element constrained in structure search expression 111 in the examples of FIGS. 2 to 5. When searches frequently target such a part, the mapping definition tuning module 104 automatically generates a relation table to store that part and a corresponding mapping definition.
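The frequency check can be sketched with a simple counter over the search history. The sketch assumes that each history entry has already been normalized to a (UDF name, column argument, XPath argument) triple with literal values stripped. This normalization, the sample history, the threshold value, and the function name `tuning_candidates` are all hypothetical. Because literals are stripped, queries that differ only in the compared value count as hits on the same part of the document.

```python
from collections import Counter

# Hypothetical search history: (UDF name, column argument, normalized XPath) per query.
HISTORY = (
    [("XMLHAS", "j.w", ".//o[@u]")] * 5
    + [("XMLHAS", "j.w", ".//p[@q]")] * 2
)
THRESHOLD = 3  # hypothetical value for the "predetermined number"


def tuning_candidates(history, threshold):
    """Return the UDF call patterns whose issue count exceeds the threshold."""
    counts = Counter(history)
    return sorted(pattern for pattern, n in counts.items() if n > threshold)


print(tuning_candidates(HISTORY, THRESHOLD))  # [('XMLHAS', 'j.w', './/o[@u]')]
```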

After many years of refinement, RDB search processing is faster than that of typical structure search engines. Moreover, when conditions must be evaluated together with conditions on other relation table data, it is more efficient to manage as much data as possible in relation tables rather than in a structure search engine outside the RDB. Such a mapping change therefore improves performance.

In this example, the mapping definition tuning module 104 creates a new relation table o (601) in the RDB 105 and defines a foreign-key relationship between it and relation table j (202). The relation table schema definition 110 is accordingly changed to relation table schema definition 603, in which relation table o has four columns: pid, u, v, and id. At the same time, the inter-schema mapping definition 109 is changed to inter-schema mapping definition 604. The mapping definition tuning module 104 deletes the entry on line 10 of mapping definition 109 that maps the undefined part to the structure search engine. It then adds a new line 10 describing the foreign-key relationship between relation tables j and o, and lines 11 and 12 describing how each attribute of the o element corresponds to a column of relation table o. The mapping definition tuning module 104 also changes {ANY} in the XML document structure definition 108 to <o u="str" b="str"/> × 0…n.

The mapping definition tuning module 104 executes the following procedure. It traverses each definition record of the inter-schema mapping definition 109 and finds partial content of elements not defined in the XML document structure definition 108. It refers to the relation table schema definition 110 and obtains the identifiers of the relation table and column defined for that partial content. It then sends an SQL search expression to the RDB 105 and obtains the attribute values at that relation table and column position. These values are file IDs in the structure search engine 106, so the module retrieves the corresponding stored image 204 from the structure search engine 106. Next, it allocates space for relation table o in the storage area via the RDB 105. Following the processing procedure of the data conversion module 101 described above, it extracts the o elements from the stored image 204 and creates relation table o via the RDB 105. It adds the definition of relation table o to the relation table schema definition 110, updating it to relation table schema definition 603. It also adds the mapping definition for relation table o to the inter-schema mapping definition 109, updating it to inter-schema mapping definition 604. Next, it traverses the definitions in the XML document structure definition 108, finds the undefined element, and replaces it with the definition of the o element. Finally, it deletes the stored image 204 from the structure search engine 106.
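The data-moving part of this procedure can be sketched as follows, assuming SQLite as the RDB and a dictionary as the structure search engine. The sketch creates relation table o with columns pid, u, v, and id, copies the o elements out of each referenced stored image, and deletes the image. The updates to the schema, mapping, and structure definitions are omitted. The ID scheme for o rows, the choice to keep column w but set it to NULL, and the function name `promote_o_elements` are assumptions, not details from the source. The final query corresponds in spirit to SQL expression 602, which needs no structure-specifying UDF.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stored image referenced by j.w: the o elements kept outside the RDB.
STORED_IMAGES = {"xi-a": "<o u='xx24' v='xx25'/><o u='xx26' v='xx27'/>"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE j (id TEXT PRIMARY KEY, pid TEXT, b TEXT, w TEXT);
INSERT INTO j VALUES ('s02', 'r02', 'xx22', 'xi-a');
""")


def promote_o_elements(conn, images):
    """Move o elements from stored images into a new relation table o."""
    # Relation table o with columns pid, u, v, id and a foreign key to j.
    conn.execute(
        "CREATE TABLE o (pid TEXT REFERENCES j(id), u TEXT, v TEXT, id TEXT PRIMARY KEY)"
    )
    refs = conn.execute("SELECT id, w FROM j WHERE w IS NOT NULL").fetchall()
    for jid, image_id in refs:
        root = ET.fromstring(f"<root>{images[image_id]}</root>")
        for n, o in enumerate(root.findall("o"), start=1):
            # The o-row ID scheme below is hypothetical.
            conn.execute(
                "INSERT INTO o VALUES (?, ?, ?, ?)",
                (jid, o.get("u"), o.get("v"), f"{jid}-o{n}"),
            )
        # Clear the reference and delete the stored image (assumption: w is kept but emptied).
        conn.execute("UPDATE j SET w = NULL WHERE id = ?", (jid,))
        del images[image_id]
    conn.commit()


promote_o_elements(conn, STORED_IMAGES)

# In the spirit of SQL expression 602: a plain join with no structure-specifying UDF.
rows = conn.execute("""
    SELECT j.id, o.v FROM j JOIN o ON o.pid = j.id
    WHERE j.b = 'xx22' AND o.u = 'xx24'
""").fetchall()
print(rows)           # [('s02', 'xx25')]
print(STORED_IMAGES)  # {}
```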

With the mapping definitions changed in this way, the structure search expression conversion module 102 converts structure search expression 111 into SQL expression 602. This SQL expression contains no structure-specifying UDF, so it is in a form well suited to processing by the RDB 105.

FIGS. 7(a) and 7(b) illustrate a procedure that improves the management of XML data with a recursive structure. As shown in FIG. 7(a), XML document 701 is valid against XML document structure definition 702 and has a self-recursive structure: a j element can contain multiple j elements as its own children. The storage of such a document in relation tables is defined by inter-schema mapping definition 703 and relation table schema definition 704. Line 3 of mapping definition 703 states that the parent of a j element is either an i element or a j element. It also states that the value ("i" or "j") of column prl in relation table j records which one it is. XML document 701 is stored in two relation tables, relation table i (705) and relation table j (706). A foreign-key relationship links relation tables i and j, and a self-referencing relationship exists within relation table j.

Structure search expression 707 requests the j elements whose attribute a is "xx18" and that appear at any depth below an i element whose attribute a is "xx01". Expressing this in SQL requires a recursive query. The structure search expression conversion module 102 converts structure search expression 707 into SQL expression 708. This recursive query repeatedly follows the self-referencing relationship of relation table j (706) and collects all descendants of the i element into temporary table tmp.
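SQL expression 708 itself appears only in the figure, so the following is a hedged reconstruction of the same idea using SQLite's `WITH RECURSIVE` syntax. The table layout follows the description (column prl distinguishes an i parent from a j parent), but the sample rows and the exact query shape are assumptions. The anchor member selects the j children of the qualifying i element, and the recursive member repeatedly descends through the self-reference on table j.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE i (id TEXT PRIMARY KEY, a TEXT);
-- prl records whether pid points at an i row ('i') or a j row ('j').
CREATE TABLE j (id TEXT PRIMARY KEY, pid TEXT, prl TEXT, a TEXT);
INSERT INTO i VALUES ('r01', 'xx01'), ('r02', 'xx02');
INSERT INTO j VALUES
  ('j1', 'r01', 'i', 'xx10'),
  ('j2', 'j1',  'j', 'xx18'),
  ('j3', 'j2',  'j', 'xx18'),
  ('j4', 'r02', 'i', 'xx18'),
  ('j5', 'j4',  'j', 'xx18');
""")

# In the spirit of SQL expression 708: collect every j descendant of the i element into tmp.
rows = conn.execute("""
WITH RECURSIVE tmp(id, a) AS (
    SELECT j.id, j.a FROM i JOIN j ON j.pid = i.id AND j.prl = 'i'
    WHERE i.a = 'xx01'
    UNION ALL
    SELECT j.id, j.a FROM tmp JOIN j ON j.pid = tmp.id AND j.prl = 'j'
)
SELECT id FROM tmp WHERE a = 'xx18' ORDER BY id
""").fetchall()
print(rows)  # [('j2',), ('j3',)]
```

Each recursion level adds another join against table j. This is the source of the inefficiency described next.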

Recursive queries are generally inefficient in an RDB, however. The mapping definition above is therefore undesirable when such structure search expressions are issued frequently.

To address this, the XML partial data with the recursive structure is deliberately stored in the structure search engine 106 instead. Structure search engines are generally designed to search even deeply nested data with reasonable performance, so in such cases they can be more efficient than relation tables.

Inter-schema mapping definition 709, shown in FIG. 7(b), deletes lines 3 to 6 of inter-schema mapping definition 703, which associate the j element with relation table j (706). In their place, a new line 3 stores all descendants of the i element in the structure search engine 106. Relation table schema definition 710 adds a column w to relation table i (705) that stores the ID of the corresponding stored image in the structure search engine 106.

Under this mapping definition, XML document 701 is decomposed into relation table 705 and stored images 711 and 712 in the structure search engine 106. The structure search expression conversion module 102 converts structure search expression 707 into SQL expression 713, which contains a UDF. Compared with SQL expression 708, SQL expression 713 is a simple query without recursion, and the RDB 105 and the structure search engine 106 are each used for the work they handle well.
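Under the revised mapping, the same search needs no recursion in SQL. The sketch below assumes SQLite, a dictionary of stored images in which each image holds all descendants of one i element, and ElementTree's `.//` descendant axis as the stand-in for the engine's search. The image contents mirror the hypothetical data of the recursive example, and the helper names are illustrative. The relational side filters on i.a and calls the UDF once per candidate row, and the engine finds the j elements at any depth.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stored images (in the spirit of 711 and 712): all descendants of one i element.
STORED_IMAGES = {
    "img-1": "<j a='xx10'><j a='xx18'><j a='xx18'/></j></j>",
    "img-2": "<j a='xx18'/>",
}


def parse(image_id):
    # Wrap the fragment so that top-level elements can also match './/' paths.
    return ET.fromstring(f"<root>{STORED_IMAGES[image_id]}</root>")


def xmlhas(image_id, xpath):
    return 1 if parse(image_id).find(xpath) is not None else 0


conn = sqlite3.connect(":memory:")
conn.create_function("XMLHAS", 2, xmlhas)
conn.executescript("""
CREATE TABLE i (id TEXT PRIMARY KEY, a TEXT, w TEXT);
INSERT INTO i VALUES ('r01', 'xx01', 'img-1'), ('r02', 'xx02', 'img-2');
""")

XPATH = ".//j[@a='xx18']"
# In the spirit of SQL expression 713: no recursion, one UDF call per candidate i row.
hits = conn.execute(
    "SELECT w FROM i WHERE a = 'xx01' AND XMLHAS(w, ?) = 1", (XPATH,)
).fetchall()
# The engine side then returns the matching j elements at any depth.
matches = [j.get("a") for (w,) in hits for j in parse(w).findall(XPATH)]
print(hits, len(matches))  # [('img-1',)] 2
```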

This improvement technique deliberately moves XML data that the RDB 105 manages inefficiently into the structure search engine. It also applies to XML documents without a recursive structure. For example, storing a deeply nested XML document in relation tables requires defining many relation tables and the foreign-key relationships between them. A structure search on such tables must then add all of these foreign-key conditions to the SQL expression. The RDB processes these conditions as costly join operations, which is inefficient. In such cases as well, applying a mapping definition tuning technique like that of FIG. 7(b) can improve search efficiency.

Another technique for improving the mapping definition of deeply nested XML documents does not use a structure search engine at all. It is described with reference to FIG. 8. Structure search expression 801 is a query that specifies conditions on an i element, its child j element, and that element's child k element. With inter-schema mapping definition 109, the structure search expression conversion module 102 converts this expression into SQL expression 803, which contains two join conditions: "i.id = j.pid" and "j.id = k.pid". Suppose instead that relation table k (203) is updated, as in relation table k (802), so that each record also contains the values of column a of relation table i (201) and column c of relation table j (202). The same structure search expression can then be executed as a query on relation table k (802) alone.

Relation table schema definition 110 and inter-schema mapping definition 109 are updated to relation table schema definition 805 and inter-schema mapping definition 806, respectively. Line 1 of inter-schema mapping definition 806 states that the value of attribute a of the i element is also stored in column ia of relation table k (802); line 6 works the same way. Under this mapping definition, the structure search expression conversion module 102 converts structure search expression 801 into SQL expression 804. This expression contains no join operations, so its search cost is low.
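The effect of this denormalization can be sketched in SQLite. The normalized query needs the two joins named in the text, while the widened table answers the same conditions from a single table. The table name `k_flat`, the column name `jc`, and all rows are hypothetical. The source names only column ia, and SQL expressions 803 and 804 appear only in the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE i (id TEXT PRIMARY KEY, a TEXT);
CREATE TABLE j (id TEXT PRIMARY KEY, pid TEXT, c TEXT);
CREATE TABLE k (id TEXT PRIMARY KEY, pid TEXT, b TEXT);
INSERT INTO i VALUES ('r01', 'xx01'), ('r02', 'xx02');
INSERT INTO j VALUES ('s01', 'r01', 'c1'), ('s02', 'r02', 'c1');
INSERT INTO k VALUES ('t01', 's01', 'b1'), ('t02', 's02', 'b1'), ('t03', 's01', 'b2');

-- Denormalized copy in the spirit of relation table k (802): k also carries i.a and j.c.
CREATE TABLE k_flat (id TEXT PRIMARY KEY, pid TEXT, b TEXT, ia TEXT, jc TEXT);
INSERT INTO k_flat
SELECT k.id, k.pid, k.b, i.a, j.c
FROM k JOIN j ON j.id = k.pid JOIN i ON i.id = j.pid;
""")

# In the spirit of SQL expression 803: two joins, i.id = j.pid and j.id = k.pid.
joined = conn.execute("""
    SELECT k.id FROM i JOIN j ON i.id = j.pid JOIN k ON j.id = k.pid
    WHERE i.a = 'xx01' AND j.c = 'c1' AND k.b = 'b1'
""").fetchall()

# In the spirit of SQL expression 804: the same conditions on one table, no joins.
flat = conn.execute(
    "SELECT id FROM k_flat WHERE ia = 'xx01' AND jc = 'c1' AND b = 'b1'"
).fetchall()
print(joined, flat)  # [('t01',)] [('t01',)]
```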

When the goal is to make several structure search expressions efficient, the same mapping definition improvement technique can be applied to the union of the paths of all of those expressions. For example, to make all of the following structure search expressions efficient:
・/x/i[@a=".."]//k[@a=".."]
・//j[@c=".."]/k[@b=".."]
・//i[@a=".." and @b=".."]/j[@c=".."]/k
relation table k is updated to include the values of attributes a and b of the i element and attribute c of the j element.
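Deriving the extra columns from the union of the paths can be sketched with a small parser. It handles only the simple path shape used in the example: element names with optional predicates that contain no nested brackets or slashes. For each element other than the target k, it collects the attributes tested in any of the paths. The regular expressions, the `ia`/`jc` column naming (extrapolated from the column ia mentioned above), and the function name `ancestor_columns` are illustrative assumptions.

```python
import re

PATHS = [
    '/x/i[@a=".."]//k[@a=".."]',
    '//j[@c=".."]/k[@b=".."]',
    '//i[@a=".." and @b=".."]/j[@c=".."]/k',
]

# One location step: an element name followed by an optional [...] predicate.
STEP = re.compile(r"([A-Za-z_]\w*)(\[[^\]]*\])?")


def ancestor_columns(paths, target="k"):
    """Columns to add to the target's table: attributes tested on other elements."""
    needed = set()
    for path in paths:
        for name, predicate in STEP.findall(path):
            if name != target and predicate:
                for attr in re.findall(r"@(\w+)", predicate):
                    needed.add(name + attr)  # e.g. attribute a of i -> column "ia"
    return sorted(needed)


print(ancestor_columns(PATHS))  # ['ia', 'ib', 'jc']
```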

Such a mapping change breaks the normalization of the relation tables. A single value is then managed in multiple columns, which adds overhead whenever the data is updated. The mapping definition tuning module 104 therefore also refers to the issue history of update queries. It decides whether to apply this improvement technique according to the balance between the issue frequencies of read queries and update queries.
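The source does not give the rule used to weigh read queries against update queries. The sketch below shows one plausible, deliberately simple rule under a hypothetical linear cost model. The benefit is the join work saved per read, and the overhead is the extra work of keeping duplicated columns consistent per update. The cost parameters, history format, and function name `should_denormalize` are assumptions.

```python
from collections import Counter


def should_denormalize(history, join_cost_saved, extra_update_cost):
    """Hypothetical rule: denormalize only if join work saved on reads exceeds
    the extra work of keeping the duplicated columns consistent on updates."""
    counts = Counter(kind for kind, _query in history)
    benefit = counts["read"] * join_cost_saved
    overhead = counts["update"] * extra_update_cost
    return benefit > overhead


read_heavy = [("read", "q801")] * 90 + [("update", "u1")] * 10
update_heavy = [("read", "q801")] * 10 + [("update", "u1")] * 90
print(should_denormalize(read_heavy, join_cost_saved=2.0, extra_update_cost=3.0))    # True
print(should_denormalize(update_heavy, join_cost_saved=2.0, extra_update_cost=3.0))  # False
```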

The notation for inter-schema mapping definitions used above does not limit the embodiment; any notation that can express the same meaning may be used. Also, although the method has been described for managing XML documents, it clearly also applies as a general method for managing tagged structured documents, as typified by SGML and HTML.

FIG. 1 is an overall configuration diagram of the embodiment.
FIG. 2 is a configuration diagram of the part of the embodiment relating to the XML document data conversion function.
FIG. 3 is a configuration diagram of the part of the embodiment relating to the query rewrite function.
FIG. 4 is a configuration diagram of the part of the embodiment relating to the query execution function.
FIG. 5 is a configuration diagram of the part of the embodiment relating to the query execution function (continued).
FIG. 6 is a diagram explaining an example of schema mapping improvement in the embodiment.
FIG. 7(a) is a diagram explaining an example of schema mapping improvement in the embodiment (continued).
FIG. 7(b) is a diagram explaining an example of schema mapping improvement in the embodiment (continued).
FIG. 8 is a diagram explaining an example of schema mapping improvement in the embodiment (continued).

Explanation of symbols

101 Tagged structured document/relation table data conversion module
102 Structure search expression conversion module
103 Query execution control module
104 Mapping definition tuning module
105 Relational database
106 Structure search engine
107/701 Tagged structured document
108/702 Tagged structured document schema definition
109/604/703/709/806 Inter-schema mapping definition
110/603/704/710/805 Relation table schema definition
111/707/801 Structure search expression
112/602/708/713/803/804 Query resulting from the rewrite
113 Result (of the structure search expression)
201-203/601/705/706/802 Relation table
204/711/712 Stored image (of a partial XML document in the structure search engine)

Claims

1. An XML data management method for managing a tree-structured tagged structured document using a relational database and a database dedicated to structure search, the method comprising:
decomposing, in accordance with a structured document storage definition, a single structured document into a first structure part to be stored in the relational database and a second structure part to be stored in the structure-search-dedicated database;
for the first structure part, extracting the data itself with the tags removed and storing it in the columns of the relation tables associated with it by the storage definition;
storing the second structure part, with its tags included, in the structure-search-dedicated database;
converting, according to the storage definition, a structure search expression for the original structured document into a first search expression for the relational database and a second search expression for the structure-search-dedicated database;
issuing the first search expression to the relational database and receiving its result, and issuing the second search expression to the structure-search-dedicated database and receiving its result; and
constructing, from both results, a structured document equivalent to the result of the structure search expression.
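The flow of claim 1 can be sketched in Python under several simplifying assumptions: SQLite stands in for the relational database, a plain dict of XML strings stands in for the structure search engine, and the storage definition (`STORAGE_DEF`), the `<book>` document shape, and the `(path, op, value)` form of the structure search expression are all hypothetical. This is not the patent's implementation; it only illustrates decomposition, query splitting, and rebuilding one XML result from the two partial answers.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical storage definition: child tag of <book> -> (store, column).
STORAGE_DEF = {
    "title": ("rdb", "title"),
    "price": ("rdb", "price"),
    "chapters": ("engine", None),
}


def decompose(xml_text, conn, engine):
    """Split one <book> document between relation table columns and tagged fragments."""
    root = ET.fromstring(xml_text)
    doc_id = int(root.get("id"))
    row = {"id": doc_id, "title": None, "price": None}
    for child in root:
        store, column = STORAGE_DEF[child.tag]
        if store == "rdb":
            row[column] = child.text  # first structure part: tags removed, data only
        else:
            # second structure part: kept as tagged XML in the "engine" (a dict here)
            engine.setdefault(doc_id, []).append(ET.tostring(child, encoding="unicode"))
    conn.execute("INSERT INTO book (id, title, price) VALUES (:id, :title, :price)", row)


def split_query(predicates):
    """Route each (path, op, value) predicate to SQL or to an ElementTree path."""
    where, args, engine_paths = [], [], []
    for path, op, value in predicates:
        steps = path.split("/")
        store, column = STORAGE_DEF[steps[0]]
        if store == "rdb":
            assert op in ("=", "<", ">"), "sketch supports simple comparisons only"
            where.append(f"{column} {op} ?")
            args.append(value)
        else:
            # chapters/chapter/name = Intro  ->  chapter[name='Intro'] evaluated on <chapters>
            assert op == "=" and len(steps) >= 3, "sketch supports equality on leaf children only"
            engine_paths.append("/".join(steps[1:-1]) + f"[{steps[-1]}='{value}']")
    return where, args, engine_paths


def run_query(conn, engine, predicates, return_column):
    """Issue both sub-queries and rebuild one XML result from the two answers."""
    where, args, engine_paths = split_query(predicates)
    sql = f"SELECT id, {return_column} FROM book"
    if where:
        sql += " WHERE " + " AND ".join(where)
    rdb_hits = dict(conn.execute(sql, args).fetchall())
    keep = [
        doc_id for doc_id in sorted(rdb_hits)
        if all(
            any(ET.fromstring(frag).findall(p) for frag in engine.get(doc_id, []))
            for p in engine_paths
        )
    ]
    result = ET.Element("result")
    for doc_id in keep:
        ET.SubElement(result, return_column).text = str(rdb_hits[doc_id])
    return ET.tostring(result, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, price REAL)")
engine = {}
for i, title, price, chapter in [(1, "XML DB", 30, "Intro"), (2, "SQL", 50, "Intro"), (3, "Trees", 20, "Graphs")]:
    decompose(
        f'<book id="{i}"><title>{title}</title><price>{price}</price>'
        f"<chapters><chapter><name>{chapter}</name></chapter></chapters></book>",
        conn, engine,
    )
hits = run_query(conn, engine, [("price", "<", 40), ("chapters/chapter/name", "=", "Intro")], "title")
print(hits)  # <result><title>XML DB</title></result>
```

A document survives only if both stores accept it, which mirrors the "construct from both results" step. A real conversion module would have to handle full XPath or XQuery syntax and escape the quoted values that this sketch splices into the ElementTree path.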

2. An XML data management method for managing a tree-structured tagged structured document using a relational database and a database dedicated to structure search, the method comprising:
decomposing, in accordance with a structured document storage definition, a single structured document into a first structure part to be stored in the relational database and a second structure part to be stored in the structure-search-dedicated database;
for the first structure part, extracting the data itself with the tags removed and storing it in the columns of the relation tables associated with it by the storage definition;
storing the second structure part, with its tags included, in the structure-search-dedicated database;
converting, according to the storage definition, a structure search expression for the original structured document into a first search expression for the relational database and a second search expression for the structure-search-dedicated database;
expressing the second search expression as an extension function for relational database search expressions that can be embedded in the first search expression and executes structure search processing equivalent to the second search expression;
generating an extended relational database search expression in which the extension function is embedded in the first search expression; and
issuing the extended relational database search expression to the relational database and obtaining a structured document equivalent to the result of the structure search expression.
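Claim 2 folds the two sub-queries into one SQL statement that calls an extension function. SQLite's `Connection.create_function` is a convenient stand-in for that mechanism in a short sketch; the function name `engine_match`, the `ENGINE` dict that plays the structure search side, and the table layout are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tagged fragments held on the structure search side, keyed by document id (assumed layout).
ENGINE = {
    1: "<chapters><chapter><name>Intro</name></chapter></chapters>",
    2: "<chapters><chapter><name>Intro</name></chapter></chapters>",
    3: "<chapters><chapter><name>Graphs</name></chapter></chapters>",
}


def engine_match(doc_id, path):
    """Hypothetical extension function: 1 if `path` matches the document's fragment, else 0."""
    fragment = ENGINE.get(doc_id)
    return int(fragment is not None and bool(ET.fromstring(fragment).findall(path)))


conn = sqlite3.connect(":memory:")
# Register the evaluator of the second search expression as a 2-argument SQL function.
conn.create_function("engine_match", 2, engine_match)
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [(1, "XML DB", 30), (2, "SQL", 50), (3, "Trees", 20)],
)

# First search expression (SQL) with the second one embedded as a function call.
sql = "SELECT title FROM book WHERE price < ? AND engine_match(id, ?) = 1 ORDER BY id"
result = ET.Element("result")
for (title,) in conn.execute(sql, (40, "chapter[name='Intro']")):
    ET.SubElement(result, "title").text = title
hits = ET.tostring(result, encoding="unicode")
print(hits)  # <result><title>XML DB</title></result>
```

Because the structure search runs inside SQL evaluation, the relational engine drives the combination of the two parts and the caller receives a single result set.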

3. The XML data management method according to claim 1, further comprising:
recording the issuance history of structure search expressions for the tagged structured document; and
updating the structured document storage definition and the schema definition of the relation tables, using the search processing efficiency of frequently issued search expressions as the index.
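A minimal sketch of the history-driven tuning trigger in claim 3 follows. The claim does not say how "search processing efficiency" is measured; this sketch assumes total elapsed time per expression, and `IssueHistory`, `min_count`, and `top_n` are illustrative names only.

```python
from collections import Counter, defaultdict


class IssueHistory:
    """Hypothetical issuance log of structure search expressions."""

    def __init__(self):
        self.count = Counter()                # expression -> times issued
        self.total_time = defaultdict(float)  # expression -> summed elapsed seconds

    def record(self, expression, elapsed):
        self.count[expression] += 1
        self.total_time[expression] += elapsed

    def tuning_targets(self, min_count, top_n):
        """Frequently issued expressions, costliest in aggregate first."""
        frequent = [e for e, c in self.count.items() if c >= min_count]
        frequent.sort(key=lambda e: self.total_time[e], reverse=True)
        return frequent[:top_n]


history = IssueHistory()
for _ in range(5):
    history.record("/book[price<40]/title", 0.01)
for _ in range(3):
    history.record("/book/chapters/chapter/name", 0.2)
history.record("/book/@id", 5.0)  # slow but rare, so not a tuning target
targets = history.tuning_targets(min_count=3, top_n=2)
print(targets)  # ['/book/chapters/chapter/name', '/book[price<40]/title']
```

The returned expressions are the candidates whose storage mapping the tuning module would reconsider; the single slow query is ignored because the claim targets frequently issued expressions.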

4. The XML data management method according to claim 3, wherein, when the number of times the same structure search expression has been issued against a partial structured document of the tagged structured document that is stored in the structure-search-dedicated database exceeds a predetermined number:
a new relation table for storing the partial structured document stored in the structure-search-dedicated database is created;
the correspondence between the partial structured document and the newly created relation table is added to the structured document storage definition; and
the partial structured document stored in the structure-search-dedicated database is re-stored in the newly created relation table.
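The re-storage step of claim 4 might look like the following, assuming the engine holds one fragment per document for the tag in question and that each repeated `row_tag` element becomes one row. `THRESHOLD`, `promote_fragment`, and the `tag_rowtag` table naming are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

THRESHOLD = 3  # hypothetical value for the "predetermined number"


def promote_fragment(conn, engine, storage_def, tag, row_tag, fields, issue_count):
    """Re-store engine fragments under `tag` in a new relation table once queried often."""
    if issue_count <= THRESHOLD:
        return None
    table = f"{tag}_{row_tag}"  # identifiers are assumed trusted (not user input)
    columns = ", ".join(f"{f} TEXT" for f in fields)
    conn.execute(f"CREATE TABLE {table} (doc_id INTEGER, {columns})")
    storage_def[tag] = ("rdb", table)  # record the new fragment-to-table correspondence
    marks = ", ".join("?" * (len(fields) + 1))
    for doc_id, fragment in list(engine.items()):
        for elem in ET.fromstring(fragment).iter(row_tag):
            conn.execute(f"INSERT INTO {table} VALUES ({marks})",
                         [doc_id] + [elem.findtext(f) for f in fields])
        del engine[doc_id]  # the tagged copy is no longer needed
    return table


conn = sqlite3.connect(":memory:")
engine = {1: "<chapters><chapter><name>Intro</name><pages>12</pages></chapter>"
             "<chapter><name>Joins</name><pages>30</pages></chapter></chapters>"}
storage_def = {"chapters": ("engine", None)}
table = promote_fragment(conn, engine, storage_def, "chapters", "chapter",
                         ["name", "pages"], issue_count=4)
rows = conn.execute(f"SELECT * FROM {table} ORDER BY rowid").fetchall()
print(storage_def, rows)
# {'chapters': ('rdb', 'chapters_chapter')} [(1, 'Intro', '12'), (1, 'Joins', '30')]
```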

5. The XML data management method according to claim 3, wherein, when, in the tagged structured document, the element indicated by a tag has a self-recursive structure in which an element of the same type is a child element,
the data of that tag is stored in the relational database,
and the same structure search expression appears for that tag regardless of the depth of the hierarchy,
the self-recursive data is re-stored in the structure-search-dedicated database.
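Claim 5 hinges on two tests: whether a tag is self-recursive, and whether queries address it independently of depth. A rough sketch follows, assuming that depth independence is signalled by the `//` descendant axis in the logged expressions; this is a crude textual check rather than a real XPath parser, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def is_self_recursive(root, tag):
    """True if some <tag> element has a direct <tag> child."""
    return any(elem.find(tag) is not None for elem in root.iter(tag))


def depth_independent(expressions, tag):
    """Crude check (assumed): every logged expression reaches `tag` via the // axis."""
    pattern = re.compile(r"//" + re.escape(tag) + r"\b")
    return bool(expressions) and all(pattern.search(e) for e in expressions)


def choose_store(root, tag, current_store, expressions):
    """Move a recursive, relationally stored tag to the structure search side."""
    if current_store == "rdb" and is_self_recursive(root, tag) and depth_independent(expressions, tag):
        return "engine"
    return current_store


bom = ET.fromstring(
    "<bom><part><name>car</name>"
    "<part><name>engine</name><part><name>piston</name></part></part>"
    "</part></bom>"
)
log = ["//part[name='piston']", "//part[name='engine']"]
print(choose_store(bom, "part", "rdb", log))         # engine
print(choose_store(bom, "name", "rdb", ["//name"]))  # rdb (no <name> inside <name>)
```

Recursive data of unbounded depth has no fixed relational shape, so a depth-independent query over it would need recursive joins; the structure search engine can answer `//part` directly.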

6. The XML data management method according to claim 3, wherein, when the same structure search expression specifying a multi-level hierarchy over data of the tagged structured document stored in the relational database appears repeatedly,
the data is re-stored in the structure-search-dedicated database.
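The rule in claim 6 can be read as follows: if an expression spanning several relationally stored levels keeps recurring, each execution pays for a multi-way join, so those levels are better served by the structure search engine. The sketch assumes one relation table per element level and a repeat threshold of three; both, like the names used, are illustrative.

```python
from collections import Counter

# Hypothetical storage definition: element tag -> relation table (one table per level).
STORAGE_DEF = {"dept": "dept", "section": "section", "employee": "employee", "project": "project"}


def levels_in(expression):
    """Relationally stored levels named by a path such as /dept/section/employee/name."""
    steps = [s.split("[")[0] for s in expression.split("/") if s]
    return [s for s in steps if s in STORAGE_DEF]


def remap_to_engine(history, min_repeats=3):
    """Send the levels of recurring multi-level expressions to the structure search side."""
    new_def = dict(STORAGE_DEF)
    for expression, n in Counter(history).items():
        levels = levels_in(expression)
        if n >= min_repeats and len(levels) >= 2:  # two or more levels implies a join
            for tag in levels:
                new_def[tag] = "engine"
    return new_def


history = ["/dept/section/employee/name"] * 3 + ["/project/name"] * 10
print(remap_to_engine(history))
# {'dept': 'engine', 'section': 'engine', 'employee': 'engine', 'project': 'project'}
```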

7. The XML data management method according to claim 3, wherein, when the same structure search expression specifying a multi-level hierarchy over data of the tagged structured document stored in the relational database appears repeatedly,
a new single relation table is created whose columns list the data of all levels appearing in the structure search expression,
and the data is re-stored in the newly created relation table.
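Claim 7 applies the opposite remedy to claim 6: instead of moving the data out of the relational database, it denormalises the hierarchy into one wide table. A sketch follows, assuming each level's data value is the text of its `<name>` child and that the table is named after the path steps (both hypothetical).

```python
import sqlite3
import xml.etree.ElementTree as ET


def flatten(root, steps, value_child="name"):
    """Yield one tuple per instance of the path, taking each level's <name> text (assumed)."""
    def walk(elem, depth, prefix):
        for child in elem.findall(steps[depth]):
            row = prefix + (child.findtext(value_child),)
            if depth + 1 == len(steps):
                yield row
            else:
                yield from walk(child, depth + 1, row)
    yield from walk(root, 0, ())


def create_flat_table(conn, root, expression):
    """One relation table with a column per level named in the expression."""
    steps = [s for s in expression.split("/") if s]
    table = "_".join(steps)  # hypothetical naming scheme
    conn.execute(f"CREATE TABLE {table} ({', '.join(s + ' TEXT' for s in steps)})")
    marks = ", ".join("?" * len(steps))
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", flatten(root, steps))
    return table


company = ET.fromstring(
    "<company><dept><name>Research</name><section><name>DB</name>"
    "<employee><name>Sato</name></employee><employee><name>Kato</name></employee>"
    "</section></dept></company>"
)
conn = sqlite3.connect(":memory:")
table = create_flat_table(conn, company, "/dept/section/employee")
# The recurring multi-level expression is now answered without joins.
rows = conn.execute(
    f"SELECT employee FROM {table} WHERE dept = ? AND section = ? ORDER BY rowid",
    ("Research", "DB"),
).fetchall()
print(rows)  # [('Sato',), ('Kato',)]
```

Each row repeats its ancestor values, trading storage space for join-free lookups on the recurring expression.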

8. The XML data management method according to claim 3, wherein, when a plurality of structure search expressions, each specifying a multi-level hierarchy over data of the tagged structured document stored in the relational database, each appear repeatedly,
a new single relation table is created whose columns are the union of the data of all levels appearing in the plurality of structure search expressions,
and the data is re-stored in the newly created relation table.
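For claim 8 the single table must serve several expressions, so its columns are the union of their levels. The sketch below computes that union in first-seen order and creates the table in SQLite; the function names and the table name `flat_union` are hypothetical.

```python
import sqlite3


def union_columns(expressions):
    """Ordered union of the levels named in several structure search expressions."""
    columns = []
    for expression in expressions:
        for step in expression.split("/"):
            step = step.split("[")[0]  # drop predicates such as [name='X']
            if step and step not in columns:
                columns.append(step)
    return columns


def create_union_table(conn, expressions, table="flat_union"):
    """One relation table covering every level of every expression."""
    columns = union_columns(expressions)
    conn.execute(f"CREATE TABLE {table} ({', '.join(c + ' TEXT' for c in columns)})")
    return columns


conn = sqlite3.connect(":memory:")
cols = create_union_table(conn, ["/dept/section/employee", "/dept/project[name='X']/employee"])
print(cols)  # ['dept', 'section', 'employee', 'project']
print([row[1] for row in conn.execute("PRAGMA table_info(flat_union)")])
```

Rows loaded for a path that does not pass through a given level (for example, an employee reached through a project rather than a section) would simply leave that column NULL.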