JP2000348038A - Device and method for storing data for semi-structured database

Info

Publication number: JP2000348038A
Application number: JP11154783A
Authority: JP
Inventors: Hiroshi Ishikawa (石川 博); Yasuhiko Kanemasa (金政 泰彦); Kazumi Kubota (久保田 和己); Yasuo Noguchi (野口 泰生)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1999-06-02
Filing date: 1999-06-02
Publication date: 2000-12-15

Abstract

PROBLEM TO BE SOLVED: To efficiently retrieve large-scale data in a semi-structured database. SOLUTION: A designating means 1 designates the structure of a subtree that may become a retrieval target, using, for example, the label of an edge in tree-structured data. An extracting means 2 extracts information on at least one subtree conforming to the designated structure, and a storing means 3 gathers the information on the nodes, edges, and the like included in each extracted subtree and stores it, subtree by subtree, in a physical storage area.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a data storage device, and a method therefor, used to construct a semi-structured database.

[0002]

[Description of the Related Art] Conventional relational databases, object-oriented databases, and the like manage a schema that defines the data structure in advance, together with a collection of data that conforms to the schema.

[0003] For example, when a library catalog or the like is built with a relational database, attributes such as author, title, and publisher are defined when the schema for a book is created. Because the number of authors cannot be determined in advance, however, an upper limit on the number of authors is usually assumed, and the schema defines a repetition of up to that many authors. Consequently, when a book written by more authors than the upper limit appears, its information cannot be stored.

[0004] In an object-oriented database, by contrast, the schema can describe an arbitrary number of repetitions, so this problem can be solved. However, when something entirely different from the attributes assumed in advance has to be stored, such a database still cannot cope. For example, if the library catalog schema does not define an attribute for the organization to which an author belongs, the name of the research institution behind a research report cannot be stored.

[0005] Thus, relational and object-oriented databases can be used in fields where the business can be analyzed in advance and the structure of the data to be handled can be restricted. These databases are therefore not suited to collecting and storing data with new structures from outside. For example, with a library catalog schema designed with novels and the like in mind, when a document such as a research report arrives, attributes such as the number of authors and their affiliations cannot be stored.

[0006] A semi-structured database, by contrast, differs from relational and object-oriented databases in that it has no schema prescribing the data structure; the structural information is managed together with the data itself. A semi-structured database can therefore collect unknown data with new, unanticipated structures from communication networks and external sources, and store it as it arrives.

[0007] With the recent development of networks, systems that collect and manage externally supplied data with new structures are needed in a variety of business fields. Semi-structured databases are expected to come into use in these fields to store such data.

[0008] Recently, semi-structured databases have been attracting attention in particular as a way of building databases of structured documents written in XML (Extensible Markup Language) and similar languages. XML is a data format used in electronic commerce and elsewhere, and demand for it is growing rapidly; databases that handle XML, and infrastructure built around such databases, are desired. A query language for XML data that presupposes storage in a semi-structured database has already been proposed by the W3C (World Wide Web Consortium).

[0009] The data model and storage format of a semi-structured database are therefore described next. A semi-structured database has no schema and instead holds structural information within the data. Several systems have been proposed as semi-structured databases, and their data models are broadly as shown in FIG. 37.

[0010] The data model of FIG. 37 is represented as a tree structure consisting of nodes and edges (links), and represents data on a plurality of papers. Each edge carries a label that expresses an attribute of the data, and values are stored in the leaf nodes. The labels "paper", "id", "title", "author", "name", "position", "page", "firstpage", and "lastpage" denote, respectively, a paper, the paper ID (identifier), the title, an author, the author's name, the author's affiliation, page information, the first page, and the last page.

[0011] To represent such a tree-structured model, a semi-structured database stores edge information and node information separately in the storage device. Edges are stored in a table whose records consist of the node IDs of the nodes at the two ends of the edge and the edge label; nodes are stored in a table whose records hold the node ID and the node value. With this data model and storage format, externally acquired data with a new structure can be added freely to a semi-structured database.
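The separate node and edge tables described in [0011] can be sketched roughly as follows. This is only an illustration in Python, not the storage format of any particular system: the leaf IDs and values follow the description of FIG. 2, while the IDs 100 to 102 for the root and internal nodes, the helper names `add_leaf` and `children`, and the extra label "keyword" are assumptions made for the example.

```python
# Node table: node ID -> value. Leaf IDs and values follow the description of
# FIG. 2; only leaf nodes carry values.
nodes = {
    1: "12",
    2: "Research on ○○",
    3: "Yasuhiko Kanemasa",
    4: "Fujitsu Laboratories",
}

# Edge table: (parent node ID, label, child node ID).
# IDs 100-102 for the root and internal nodes are hypothetical.
edges = [
    (100, "paper", 101),
    (101, "id", 1),
    (101, "title", 2),
    (101, "author", 102),
    (102, "name", 3),
    (102, "position", 4),
]


def add_leaf(parent_id, label, node_id, value):
    """Attach a new labelled leaf; no schema has to be changed first."""
    nodes[node_id] = value
    edges.append((parent_id, label, node_id))


def children(parent_id):
    """Return (label, child ID) pairs of the edges leaving parent_id."""
    return [(label, child) for parent, label, child in edges if parent == parent_id]


# A label that was never declared anywhere ("keyword") can be stored directly.
add_leaf(101, "keyword", 200, "semi-structured data")
print(children(101))
```

Because the structure lives in the edge rows themselves, adding the unforeseen "keyword" attribute needs no change to any table definition, which is the property the paragraph attributes to this storage format.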

[0012] In addition, semi-structured databases generally use the following indexes to speed up data retrieval.
(1) Value index: An index that obtains node IDs from a value. Given the value or range in a search condition, the node IDs that satisfy it can be retrieved.
(2) Structure (path) index: An index that obtains node IDs from a path. Given the path in a search condition, the node IDs can be retrieved. A path is information that designates a branch of the tree-structured model and is usually expressed with the labels of one or more edges. In the data model of FIG. 37, for example, the correspondence between the path "/paper/author/name" and the two node IDs "3" and "6" is registered in the structure index, so that node IDs "3" and "6" can be retrieved from this path.
(3) Edge index: An index on the node IDs of the nodes at both ends of each edge. Given a node ID, the edges adjacent to that node can be obtained, which speeds up following (traversing) edges.
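The three kinds of index listed in [0012] can be illustrated with in-memory dictionaries. The sketch below is an assumption-laden toy, not an actual index implementation: leaf IDs 1, 3, and 6 follow the text, the internal node IDs (100 and above) are hypothetical, and real systems would use disk-based index structures rather than Python dictionaries.

```python
from collections import defaultdict

# Leaf values and the parent -> child edges of one "paper" subtree.
# Internal node IDs (100 and above) are hypothetical.
nodes = {1: "12", 3: "Yasuhiko Kanemasa", 6: "Kazumi Kubota"}
edges = [
    (100, "paper", 101),
    (101, "id", 1),
    (101, "author", 102),
    (102, "name", 3),
    (101, "author", 103),
    (103, "name", 6),
]
ROOT = 100

# (1) Value index: value -> node IDs.
value_index = defaultdict(list)
for node_id, value in nodes.items():
    value_index[value].append(node_id)

# (3) Edge index: node ID -> adjacent edges (in both directions).
edge_index = defaultdict(list)
for edge in edges:
    parent, _label, child = edge
    edge_index[parent].append(edge)
    edge_index[child].append(edge)

# (2) Structure index: path from the root -> IDs of the nodes the path reaches.
structure_index = defaultdict(list)


def walk(node_id, path):
    """Depth-first walk that registers every node under its root path."""
    for parent, label, child in edge_index[node_id]:
        if parent == node_id:  # follow edges downward only
            child_path = f"{path}/{label}"
            structure_index[child_path].append(child)
            walk(child, child_path)


walk(ROOT, "")
print(structure_index["/paper/author/name"])  # [3, 6]
```

The last line reproduces the example in the text: the path "/paper/author/name" maps to node IDs 3 and 6.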

[0013]

[Problems to be Solved by the Invention] However, when large-scale data is stored in the conventional semi-structured database described above and then searched, sufficient performance may not be obtained, for the following reasons.
(1) The data contained in a subtree to be searched is scattered across the physical storage area.

[0014] Because neither the node table nor the edge table restricts the order of its records, the nodes and edges belonging to a single subtree, such as the subtree below "paper" in FIG. 37, may be stored scattered across the physical storage area. Searching this subtree therefore takes an enormous amount of time for storage accesses, which is a disadvantage compared with a relational database, where the search target can be stored as a single record.
(2) Traversal within the subtree to be searched is required.

[0015] For example, for a search such as a "paper" whose author name is "Yasuhiko Kanemasa" and whose title is "xxxx", the tree-structured model requires following edges, that is, traversal within the subtree. Even if the edge index is used to speed up traversal, this is a disadvantage compared with a relational database, where the search target can be stored as a single record.
(3) The structure index is not always effective.

[0016] The structure index is highly effective when there are many kinds of data structure (paths) but few data items share the same structure. When a large amount of data shares the same structure, by contrast, the index narrows the candidates very little and its benefit is small.

[0017] An object of the present invention is to provide a data storage device, and a method therefor, that make retrieval of large-scale data in a semi-structured database more efficient.

[0018]

[Means for Solving the Problems] FIG. 1 shows the principle of the data storage device of the present invention. The data storage device of FIG. 1 comprises designating means 1, extracting means 2, and storing means 3.

[0019] The designating means 1 designates, in tree-structured data, the structure of a subtree that may become a search target; the extracting means 2 extracts from the tree-structured data the subtrees that conform to the designated structure; and the storing means 3 stores the information of each extracted subtree together.

[0020] The structure of a subtree that may become a search target is information that defines a subtree expected to be the target of data retrieval within the tree-structured data. For example, by receiving a particular label as designation information, the designating means 1 can designate the subtrees connected below the edges that carry that label. Such designation information is entered by the user, for example, or generated and entered automatically by the system.

[0021] The extracting means 2 scans the tree-structured data and extracts the information of one or more subtrees that conform to the designated structure, and the storing means 3 stores together the information on the nodes, edges, and the like contained in each extracted subtree. In doing so, the storing means 3 stores the information of one subtree together in, for example, a physically adjacent contiguous area. When a plurality of subtrees are extracted, the information is therefore gathered and stored subtree by subtree.

[0022] With such a data storage device, the information of subtrees having the designated structure is stored together rather than scattered, so when data retrieval is performed with such a subtree as the search target, the necessary information can be accessed efficiently. Data retrieval is therefore made more efficient even in a large-scale semi-structured database.

[0023] The designating means 1 can also designate individually one or more paths within a subtree that may become a search target, thereby designating the structure formed by those paths. In this case, the extracting means 2 extracts from the tree-structured data the nodes at the ends of the designated paths, and the storing means 3 stores the information of the extracted nodes together.

[0024] With such a data storage device, the information stored on the one or more paths is stored together rather than scattered, so when data retrieval targets that information, the necessary information can be accessed efficiently.

[0025] For example, the designating means 1 of FIG. 1 corresponds to the input device 23 of FIG. 35, described later, and the extracting means 2 of FIG. 1 corresponds to the CPU (central processing unit) 21 and the memory 22 of FIG. 35. The storing means 3 of FIG. 1 corresponds, for example, to the memory 22, the external storage device 25, or the portable recording medium 29 of FIG. 35, or to the database 30 of FIG. 36, described later.

[0026]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. The data storage device of this embodiment designates a plurality of similar structures present in a semi-structured database and adopts a storage format that makes access to the data more efficient. This yields an effect comparable to schema-based optimization in relational and object-oriented databases. In addition, using an existing relational or object-oriented database in an auxiliary role speeds up data retrieval in the semi-structured database.

[0027] First, consider speeding up retrieval by clustering the subtrees of the tree-structured model when the model is divided into individual nodes and stored in a database. When data is stored, a subtree expected to become a search target is designated in advance, and the information within that subtree is stored together in physically adjacent storage areas (a neighborhood), which speeds up retrieval.

[0028] For example, in the tree-structured model of FIG. 37, the subtrees below "paper" are designated as search targets, and when the database is updated, the information within each such subtree is placed in as contiguous an area as possible within the node table and the edge table. This yields a node table such as that of FIG. 2 and an edge table such as that of FIG. 3. In a relational database, the node table and the edge table are each implemented as a separate relation.

[0029] In FIG. 2, "ID" denotes the node ID and "VALUE" denotes the value of that node. Here, the values "12", "Research on ○○", "Yasuhiko Kanemasa", "Fujitsu Laboratories", "Kazumi Kubota", "Fujitsu Laboratories", "58", and "63" are stored for nodes "1", "2", "3", "4", "6", "7", "8", and "9", respectively.

[0030] In FIG. 3, "LABEL" denotes the label of an edge and "ID" denotes the node IDs of the nodes at the two ends of that edge. Here, the labels of the 12 edges in the designated subtree and the node IDs of the nodes at both ends of each edge are stored. With these tables, the various attribute data belonging to one "paper" can be read from the storage device into main memory in a single access.

[0031] By storing the nodes and edges of a subtree that may become a search target together in a physically neighboring area of the recording medium in this way, the frequency of accesses to the recording medium is reduced and retrieval is sped up.
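As a rough illustration of the clustering idea, the following Python sketch reorders a scattered edge table so that the edges of each subtree below a designated label become adjacent rows. All node IDs are hypothetical, `cluster_by_label` is a name invented for this example, and row order in a list merely stands in for physical placement on a recording medium. The sketch also assumes that the designated label does not occur nested below itself.

```python
# Edge table in arbitrary (scattered) order: (parent ID, label, child ID).
# All node IDs here are hypothetical; two "paper" subtrees hang off root 0.
edges = [
    (20, "title", 21),
    (0, "paper", 10),
    (10, "author", 12),
    (0, "paper", 20),
    (12, "name", 13),
    (10, "title", 11),
]


def cluster_by_label(edge_table, label):
    """Group edges by the subtree below each edge carrying `label`.

    Returns one list per matching subtree, in depth-first order, so that
    writing the lists out one after another keeps each subtree contiguous.
    """
    by_parent = {}
    for edge in edge_table:
        by_parent.setdefault(edge[0], []).append(edge)

    def collect(edge, out):
        out.append(edge)
        for child_edge in by_parent.get(edge[2], []):
            collect(child_edge, out)
        return out

    return [collect(e, []) for e in edge_table if e[1] == label]


clusters = cluster_by_label(edges, "paper")
clustered_table = [edge for cluster in clusters for edge in cluster]
for row in clustered_table:
    print(row)
```

After clustering, the four edges of the first "paper" subtree come first and the two edges of the second follow, so reading one subtree touches a single run of rows instead of rows spread across the table.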

[0032] Next, consider speeding up retrieval by converting subtrees into records. Gathering part of the information in a subtree into a single record reduces traversal within the subtree and speeds up retrieval further. A subtree is converted into a record as follows.

[0033] First, a record is generated by applying a selection rule to the subtree. A selection rule is designation information that individually enumerates one or more paths within the subtree; it describes the set of paths in the subtree extensionally. The selection rule is supplied by a user or an external system, or is generated automatically by the system.

[0034] For example, the subtree T1 of FIG. 4 consists of the nodes "n1", "n2", "n3", "n4", "n5", and "n6" and the edges "e1", "e2", "e3", "e4", "e5", and "e6".

[0035] Six paths belong to the subtree T1: {e1/e2, e1/e2, e1/e2, e3, e4/e5, e6}. The nodes "n1", "n2", "n3", "n4", and "n6" at the ends of five of these paths hold the values "abc", "xyz", "ijk", "100", and "300", respectively.

[0036] Here, the selection rule {e1/e2, e1/e2, e3, e6} is applied. It means that two occurrences of the path "e1/e2" are selected in order of appearance, one occurrence each of the paths "e3" and "e6" is selected in order of appearance, and these four paths are gathered into one record. The path "e4/e5" is not selected.

[0037] The record r1 generated by this selection rule contains four fields, f1, f2, f3, and f4, corresponding to the selected paths. These fields hold the values "abc", "xyz", "100", and "300", respectively. Each field thus holds the value of the node designated by the corresponding path.

[0038] The subtree T2 of FIG. 5 consists of the nodes "n7", "n8", "n9", "n10", and "n11" and the edges "e1", "e2", "e3", "e4", "e5", "e6", and "e7". Five paths belong to the subtree T2: {e1/e2, e3, e4/e5, e6, e7}. The nodes "n7", "n8", and "n10" at the ends of three of these paths hold the values "lmn", "400", and "500", respectively.

[0039] Applying the above selection rule {e1/e2, e1/e2, e3, e6} to the subtree T2 generates the record r2. In this case, the fields f1, f2, f3, and f4 hold the values "lmn", "NULL", "400", and "500", respectively. The "NULL" in field f2 indicates that the subtree T2 contains no path corresponding to the second path "e1/e2" of the selection rule.
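The way the selection rule produces records r1 and r2 can be sketched in a few lines of Python. The representation of a subtree as an ordered list of (path, leaf node, value) triples and the function name `apply_rule` are assumptions made for illustration. The text does not state which paths end at n5, n9, and n11 or what values they hold, so those rows are guesses with the value left as `None`.

```python
# Each subtree is listed as (path, leaf node, value) in order of appearance.
# Values the text does not state (n5, n9, n11) are left as None; assigning
# n5, n9 and n11 to those paths is itself an assumption.
T1 = [
    ("e1/e2", "n1", "abc"),
    ("e1/e2", "n2", "xyz"),
    ("e1/e2", "n3", "ijk"),
    ("e3", "n4", "100"),
    ("e4/e5", "n5", None),
    ("e6", "n6", "300"),
]
T2 = [
    ("e1/e2", "n7", "lmn"),
    ("e3", "n8", "400"),
    ("e4/e5", "n9", None),
    ("e6", "n10", "500"),
    ("e7", "n11", None),
]
RULE = ["e1/e2", "e1/e2", "e3", "e6"]


def apply_rule(rule, subtree):
    """Build one record: the k-th repeat of a path takes its k-th occurrence."""
    used = set()
    record = []
    for wanted in rule:
        match = next(
            (i for i, (path, _node, _value) in enumerate(subtree)
             if path == wanted and i not in used),
            None,
        )
        if match is None:
            record.append(None)  # NULL: the subtree has no such path left
        else:
            used.add(match)
            record.append(subtree[match][2])
    return record


r1 = apply_rule(RULE, T1)
r2 = apply_rule(RULE, T2)
print(r1)  # ['abc', 'xyz', '100', '300']
print(r2)  # ['lmn', None, '400', '500']
```

The third "e1/e2" leaf of T1 ("ijk") and the "e4/e5" path are not picked up by the rule, and the missing second "e1/e2" path in T2 shows up as `None`, matching the "NULL" in field f2 of r2.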

[0040] Next, a pointer to the generated record is added to the subtree. A new edge "er" is attached to the root of the subtree, a new node is created at the far end of this edge, and the pointer to the record is stored in that node. The root of the subtree and the record are thereby linked via the edge "er". The edges and nodes contained in the paths gathered into the record are then deleted from the subtree, and a pointer to the newly created node is stored in the record.

[0041] Applying these operations to the data of FIGS. 4 and 5 yields the data representation of FIG. 6. In FIG. 6, the data representation is optimized by means of the modified subtrees T1' and T2', the selection rule {e1/e2, e1/e2, e3, e6}, and the records r1 and r2.

[0042] A node "n12" is attached to the root of the subtree T1' via the edge "er", and this node holds the pointer "p-r1" to the record r1. Field f0 of the record r1 holds the pointer "p-n12" to the node "n12".

[0043] A node "n13" is attached to the root of the subtree T2' via the edge "er", and this node holds the pointer "p-r2" to the record r2. Field f0 of the record r2 holds the pointer "p-n13" to the node "n13".
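The mutual pointers between a modified subtree and its record can be sketched as follows for T1. Pointers are modelled as plain strings such as "p-r1" and "p-n12", as in the text; the function name `recordize`, the root name "root-T1", and the flattened list-of-leaves representation are assumptions for illustration, as are the path and value of node n5, which the text leaves open.

```python
def recordize(root, leaves, rule, record_id, er_node):
    """Move the leaves selected by `rule` into a record and link both ways.

    `leaves` is a list of (path, node, value) in order of appearance.
    Returns (modified subtree, record).
    """
    remaining = list(leaves)
    fields = []
    for wanted in rule:
        hit = next((leaf for leaf in remaining if leaf[0] == wanted), None)
        if hit is None:
            fields.append(None)    # no matching path: field stays NULL
        else:
            remaining.remove(hit)  # delete the selected path from the subtree
            fields.append(hit[2])
    # New edge "er" from the root to a node holding the pointer to the record.
    remaining.append(("er", er_node, f"p-{record_id}"))
    # Field f0 holds the pointer back to that new node.
    record = {"id": record_id, "f0": f"p-{er_node}"}
    for k, value in enumerate(fields, start=1):
        record[f"f{k}"] = value
    return {"root": root, "leaves": remaining}, record


T1 = [
    ("e1/e2", "n1", "abc"),
    ("e1/e2", "n2", "xyz"),
    ("e1/e2", "n3", "ijk"),
    ("e3", "n4", "100"),
    ("e4/e5", "n5", None),  # path and value of n5 are assumptions
    ("e6", "n6", "300"),
]
t1_mod, r1 = recordize("root-T1", T1, ["e1/e2", "e1/e2", "e3", "e6"], "r1", "n12")
print(t1_mod["leaves"])
print(r1)
```

In the result, T1' keeps only the unselected leaves plus the new "er" node n12 holding "p-r1", while r1 carries "p-n12" in f0, so either side can be reached from the other.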

[0044] FIG. 7 shows the data representation of the whole tree obtained by joining subtrees optimized in this way. In general, defining subtree structures with a plurality of selection rules allows a variety of subtrees to be converted into records. Here, two selection rules, S1 = {s1, s2, s3} and S2 = {s4, s5, s6, s7}, are used to generate the record groups R1 and R2, each corresponding to a different group of subtrees.

[0045] The pointers p-r1, p-r2, ..., p-rn stored in the modified whole tree point to the records r1, r2, ..., rn of the record group R1, respectively, and the pointers p-r(n+1), p-r(n+2), ..., p-rm point to the records r(n+1), r(n+2), ..., rm of the record group R2, respectively.

[0046] By analogy with a relational database, a selection rule corresponds to a schema, a record group to a relation, and a record to a tuple. The presence of pointers to records in the modified whole tree is a point of difference from a relational database.

[0047] Storing the nodes and edges of such a modified whole tree in table form yields a database such as that of FIG. 8. In FIG. 8, the node table and the edge table correspond to the information of the modified whole tree, and the two record tables, which correspond to the selection rules S1 and S2 and the record groups R1 and R2, correspond to the information deleted from the whole tree.

[0048] In a relational database, each of these tables is implemented as a separate relation, and the selection rules are saved as designation information for the groups of paths within the subtrees. For subtrees converted into records in this way, retrieval can also be sped up by using, for example, indexes on the records.

[0049] Next, the procedure for retrieving data stored in this way is described. For example, in a tree-structured model such as that of FIG. 37, subtrees are converted into records with the selection rule {/paper/id/, /paper/title/, /paper/author/name/}.

[0050] As a result, the paper ID, title, and first author's name of each paper are deleted from the whole tree, and their values are stored in fields f1, f2, and f3 of a record, respectively. The first author's affiliation, the names and affiliations of the second and subsequent authors, the page information, and so on remain in the whole tree.

[0051] In the optimized database generated in this way, a data retrieval device searches for papers whose title is "xxxx" and whose author name is "yyyy" as follows.
[P1] First, as shown in FIG. 9, the data retrieval device uses the indexes on f2 (title) and f3 (name) in turn, within the record group of the subtrees, to retrieve the records that satisfy the given search condition "f2: xxxx AND f3: yyyy". The records obtained are held as result A1. The device also narrows the records using only the title condition "f2: xxxx" and holds the records obtained as intermediate result B1.
[P2] Next, as shown in FIG. 10, the device searches the whole tree for the names of second and subsequent authors using the path "/paper/author/name/" and obtains a node "6" that satisfies the author-name condition "yyyy". From that node it follows edges to find the edge "er" contained in that subtree, and takes the pointer "p-ri" to a record from the node "14" at the far end of that edge. This processing is repeated, and the set of pointers obtained, {p-ri, p-rj, ...}, is held as intermediate result B2.
[P3] Next, intermediate results B1 and B2 are joined: the records pointed to by the pointers in B2 are extracted from B1, and the records obtained are held as result A2.
[P4] Result A1 corresponds to the records of papers whose title is "xxxx" and whose first author's name is "yyyy"; result A2 corresponds to the records of papers whose title is "xxxx" and in which the name of the second or a subsequent author is "yyyy". The union of results A1 and A2 is therefore taken as the search result.
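Steps [P1] to [P4] can be sketched as set operations over a small record table and the remainder of the whole tree. Everything in this Python example is made-up data for illustration: the record IDs, titles, and names are hypothetical, the residual tree is flattened to one dictionary per paper subtree, and plain linear scans stand in for the indexes and edge traversal that the procedure actually relies on.

```python
# Record table produced by the selection rule
# {/paper/id/, /paper/title/, /paper/author/name/}; all data is made up.
records = {
    "r1": {"f1": "12", "f2": "xxxx", "f3": "yyyy"},  # yyyy is first author
    "r2": {"f1": "13", "f2": "xxxx", "f3": "zzzz"},  # yyyy is second author
    "r3": {"f1": "14", "f2": "xxxx", "f3": "zzzz"},  # yyyy is not an author
    "r4": {"f1": "15", "f2": "wwww", "f3": "zzzz"},  # other title, yyyy second
}

# What remains in the whole tree: per paper subtree, the author names still
# on path /paper/author/name/ and the record pointer stored behind edge "er".
whole_tree = [
    {"er": "r1", "names": []},
    {"er": "r2", "names": ["yyyy"]},
    {"er": "r3", "names": ["vvvv"]},
    {"er": "r4", "names": ["yyyy"]},
]


def search(title, name):
    """Return the IDs of records for papers with this title and author."""
    # [P1] Search the record group on f2 and f3 (result A1), and on f2 alone
    # (intermediate result B1).
    a1 = {rid for rid, r in records.items()
          if r["f2"] == title and r["f3"] == name}
    b1 = {rid for rid, r in records.items() if r["f2"] == title}
    # [P2] Search the whole tree for later authors and follow "er" to the
    # record pointer (intermediate result B2).
    b2 = {sub["er"] for sub in whole_tree if name in sub["names"]}
    # [P3] Join B1 and B2 to obtain result A2.
    a2 = b1 & b2
    # [P4] The union of A1 and A2 is the search result.
    return a1 | a2


print(sorted(search("xxxx", "yyyy")))  # ['r1', 'r2']
```

Record r1 is found through the record table alone, r2 only through the residual tree and the join, and r4 is excluded by the title condition even though its subtree matches the author name.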

[0052] Using the paper ID in field f1 of these records, information such as the text of the paper and its attached figures can be obtained. To obtain attribute data that remains in the whole tree, such as page information, the pointer in field f0 of the record is used to follow the subtree that remains in the whole tree.

[0053] The distinguishing feature of this retrieval method is the search of the record group in [P1]: using subtrees converted into records suppresses traversal within the subtrees. Accordingly, raising the proportion of the search targets that are converted into records brings the retrieval speed closer to that of a relational database. Performing [P1] and [P2] in parallel improves the retrieval speed further.

[0054] Next, the clustering of subtrees and the selective conversion into records described above are explained in more detail. First, for the clustering of the subtree of FIG. 37, consider the case where the node table of FIG. 2 and the edge table of FIG. 3 are stored as tables of a relational database. In this case, an interface such as SQL (Structured Query Language) is used to access the database. Here, for speed, the edge table of FIG. 3 is divided into the three tables shown in FIGS. 11, 12, and 13, and the path table of FIG. 14 is added. Each of these tables is implemented as a separate relation in the relational database.

【００５５】The label table of FIG. 11 stores the correspondence between a label ID (LABELID) and a label (LABEL). The parent node table of FIG. 12 stores the correspondence among a node ID (ID), the node ID of that node's parent node (PARENT), and the label ID (LABELID) of the label of the edge connecting the two nodes.

【００５６】The child node table of FIG. 13 stores the correspondence among a node ID (ID), the node ID of a child node of that node (CHILD), and the label ID (LABELID) of the label of the edge connecting the two nodes. The path table of FIG. 14 stores the correspondence between a node ID (ID) and the path (PATH) from the root node to that node.

【００５７】The node table can be used as a value index, the parent node table and the child node table can be used as edge indexes, and the path table can be used as a structure index.
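
As a rough illustration of the five relations, the sketch below creates them in an in-memory SQLite database. The column names ID, PARENT, CHILD, LABELID, LABEL, and PATH come from the description; the `value` column of the node table, the lower-case table names, the SQLite column types, and the explicit `CREATE INDEX` statements are assumptions made for the example, since the description treats the tables themselves as the indexes.

```python
import sqlite3

# Assumed layout; only the column names of the label, parent, child,
# and path tables are given in the description.
SCHEMA = """
CREATE TABLE node   (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE label  (labelid INTEGER PRIMARY KEY, label TEXT UNIQUE);
CREATE TABLE parent (id INTEGER, parent INTEGER, labelid INTEGER);
CREATE TABLE child  (id INTEGER, child INTEGER, labelid INTEGER);
CREATE TABLE path   (id INTEGER, path TEXT);
CREATE INDEX node_value  ON node(value);   -- value lookups
CREATE INDEX parent_edge ON parent(id);    -- edge lookups, upward
CREATE INDEX child_edge  ON child(id);     -- edge lookups, downward
CREATE INDEX path_struct ON path(path);    -- structure lookups
"""

def create_tables(conn):
    """Create the five tables on an open sqlite3 connection."""
    conn.executescript(SCHEMA)
    return conn
```

A caller would pass `sqlite3.connect(":memory:")` or a file-backed connection to `create_tables`.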

【００５８】The data types and lengths of the attribute data used in these tables are, for example, as shown in FIG. 15. In FIG. 15, the data type “NUMBER” represents a number and the data type “VARCHAR2” represents a variable-length character string. The node table of FIG. 2 and the label table of FIG. 11 are each clustered by a depth-first traversal algorithm.

【００５９】When clustering is performed without explicitly specifying the subtrees to be searched, specifying a clustering algorithm for the whole tree causes the subtrees to be clustered hierarchically. The subtrees of other papers not shown in FIG. 37 are therefore clustered in the same way, and the results are added to each table. With this clustering, at each level of the tree structure model, the information on the nodes and edges belonging to one subtree is stored together in neighboring areas, which speeds up searches.

【００６０】FIG. 16 is a flowchart of the data storage processing based on this clustering. The data storage device first sets the root node as the current node (step ST1) and stores the current node in the database (step ST2).

【００６１】Next, it checks whether the current node is a terminal node (step ST3). A terminal node is a node with no edges below it, such as node “3” in FIG. 37. If the current node is not a terminal node, the device selects one of the edges that have not yet been traversed and sets it as the current edge (step ST7). It then traverses the current edge downward, sets the child node as the current node (step ST8), and repeats the processing from step ST2. In step ST2, the current node and the current edge are stored in the database.

【００６２】If the current node is a terminal node in step ST3, the device traverses the current edge upward and sets the parent node as the current node (step ST4). It then checks whether downward traversal has finished for all edges of the current node (step ST5) and, if an untraversed edge remains, repeats the processing from step ST2.

【００６３】If all edges have been traversed downward in step ST5, the device checks whether the current node is the root node (step ST6). If the current node is not the root node, it repeats the processing from step ST4; if the current node is the root node, it ends the processing.
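
The traversal of steps ST1 to ST8 is a depth-first walk with explicit moves back up the tree. The sketch below is one way to express it, under stated assumptions: the tree is given as a dictionary mapping a node ID to a list of (label, child ID) pairs, `store` is a caller-supplied callback standing in for step ST2, and each node is stored once, on its first visit. The name `store_depth_first` and the stack-based formulation are illustrative; the flowchart itself is not reproduced literally.

```python
def store_depth_first(children, root, store):
    """Visit a tree depth-first, calling store(node, edge) per node.

    children: dict node_id -> list of (label, child_id)  (assumed layout)
    store:    callback for step ST2; edge is None for the root,
              otherwise (parent_id, label, child_id)
    """
    store(root, None)                          # ST1, ST2
    stack = [(root, iter(children.get(root, [])))]
    while stack:
        node, edges = stack[-1]
        edge = next(edges, None)               # ST7: next untraversed edge
        if edge is None:                       # ST3/ST5: nothing left below
            stack.pop()                        # ST4: move up; an empty stack
            continue                           # means the root is done (ST6)
        label, child = edge
        store(child, (node, label, child))     # ST8, then ST2
        stack.append((child, iter(children.get(child, []))))
```

Because siblings are visited immediately after one another and before any other subtree, the nodes and edges of one subtree are passed to `store` consecutively, which is what lets them be written to neighboring storage areas.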

【００６４】FIG. 17 is a flowchart of the processing that stores the current node and the current edge in step ST2 of FIG. 16. The data storage device first adds the current node to the node table (step ST11). Next, if a current edge has been selected, it checks whether the label of that edge is a new label (step ST12). A label that does not exist in the label table is regarded as a new label.

【００６５】If the label of the current edge is not a new label, the device adds information on the current node to the parent node table (step ST13), adds the path of the current node to the path table (step ST14), and returns to the processing of FIG. 16. If the label of the current edge is a new label in step ST12, it adds that label to the label table (step ST15) and performs the processing from step ST13.

【００６６】FIG. 18 is a flowchart of the downward traversal processing in step ST8 of FIG. 16. The data storage device first adds information on the current edge to the child node table (step ST21). It then sets the child node at the far end of the current edge as the current node (step ST22) and returns to the processing of FIG. 16.
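
The table updates of steps ST11 to ST15 and ST21 to ST22 can be sketched with in-memory stand-ins for the five tables. Everything about the representation is an assumption for illustration: the class name `Tables`, the use of dictionaries and lists, sequential label IDs, and a path format of labels joined with "/". Because the child node table needs a label ID, this sketch also registers a new label at step ST21 if it has not been seen yet.

```python
class Tables:
    """In-memory stand-ins for the five tables (layout is an assumption)."""

    def __init__(self):
        self.node = {}     # ID -> value
        self.label = {}    # LABEL -> LABELID
        self.parent = []   # rows (ID, PARENT, LABELID)
        self.child = []    # rows (ID, CHILD, LABELID)
        self.path = {}     # ID -> PATH

    def label_id(self, label):
        # ST12/ST15: a label missing from the label table is new and is added
        if label not in self.label:
            self.label[label] = len(self.label) + 1
        return self.label[label]

    def store(self, node_id, value, edge=None):
        self.node[node_id] = value                                  # ST11
        if edge is None:                 # root node: no current edge selected
            self.path[node_id] = ""
            return
        parent_id, label = edge
        self.parent.append((node_id, parent_id, self.label_id(label)))  # ST13
        self.path[node_id] = self.path[parent_id] + "/" + label         # ST14

    def traverse_down(self, node_id, label, child_id):
        self.child.append((node_id, child_id, self.label_id(label)))    # ST21
        return child_id                                                  # ST22
```

A caller alternates `traverse_down` and `store` while walking the tree, so a parent's path is always available when a child's path is built.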

【００６７】Through this data storage processing, the tree structure model of the semi-structured database is stored in the relational database. The same applies when an edge table like that of FIG. 3 is used instead of the label table, the parent node table, and the child node table. When an object-oriented database is used instead of a relational database, objects corresponding to the records of each table are generated and stored in the object-oriented database.

【００６８】Instead of storing the node table, label table, parent node table, child node table, and path table in a relational database, they can also be implemented directly on a page management mechanism. A page corresponds to the information stored in a storage area (block) of a predetermined fixed length. When the tables are implemented on a page management mechanism, pages are prepared for each of the five tables.

【００６９】The data storage processing in this case is basically the same as in FIG. 16, but the processing that stores the current node and the current edge in step ST2 is as shown in FIG. 19. The data storage device first adds the current node to the node pages (step ST31). It then checks whether the label of the current edge is a new label (step ST32).

【００７０】If the label of the current edge is not a new label, the device adds information on the current node to the parent node pages (step ST33), adds the path of the current node to the path pages (step ST34), and returns to the processing of FIG. 16. If the label of the current edge is a new label in step ST32, it adds that label to the label pages (step ST35) and performs the processing from step ST33.

【００７１】If the storage area is insufficient in steps ST31, ST33, ST34, and ST35, the data storage device allocates a new page. Each time information is added, it also updates the index used for access.
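
The page handling described here (append to the current page, allocate a new page when it is full, update an access index on every addition) can be sketched as follows. This is a simplification under stated assumptions: a page is modeled as a list holding a fixed number of entries rather than a fixed-length block of bytes, and the class name `PagedTable`, the `page_size` parameter, and the (page number, slot) index format are hypothetical.

```python
class PagedTable:
    """One table stored on fixed-capacity pages with an access index."""

    def __init__(self, page_size=4):
        self.page_size = page_size   # entries per page (assumed unit)
        self.pages = [[]]            # start with one empty page
        self.index = {}              # key -> (page number, slot)

    def append(self, key, entry):
        if len(self.pages[-1]) >= self.page_size:
            self.pages.append([])    # storage area exhausted: new page
        page = self.pages[-1]
        page.append(entry)
        # update the index each time information is added
        self.index[key] = (len(self.pages) - 1, len(page) - 1)

    def get(self, key):
        page_no, slot = self.index[key]
        return self.pages[page_no][slot]
```

Entries appended consecutively land on the same or adjacent pages, so storing a subtree in traversal order keeps its nodes and edges physically close.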

【００７２】The downward traversal processing in step ST8 of FIG. 16 is as shown in FIG. 20. The data storage device first adds information on the current edge to the child node pages and updates the index (step ST41). If the storage area is insufficient at this point, it allocates a new page. It then sets the child node at the far end of the current edge as the current node (step ST42) and returns to the processing of FIG. 16.

【００７３】Some relational databases provide a function for clustering multiple tables by a common attribute. Using this function, the node table, the parent node table, and the child node table can be clustered by node ID. The data storage processing in this case is the same as the processing of FIGS. 16, 17, and 18. This clustering further reduces accesses to the storage device.

【００７４】When clustering by a common attribute is implemented on a page management mechanism, common pages shared by the node table, the parent node table, and the child node table are prepared, together with separate pages for the label table and for the path table.

【００７５】The data storage processing in this case is basically the same as in FIG. 16, but the processing that stores the current node and the current edge in step ST2 is as shown in FIG. 21. The processing of steps ST51 to ST55 in FIG. 21 is basically the same as that of steps ST31 to ST35 in FIG. 19. In steps ST51, ST53, and ST54, however, the information is added to the common pages shared by the node table, the parent node table, and the child node table.

【００７６】The downward traversal processing in step ST8 of FIG. 16 is as shown in FIG. 22. The processing of steps ST61 and ST62 in FIG. 22 is basically the same as that of steps ST41 and ST42 in FIG. 20. In step ST61, however, a common page shared by the node table, the parent node table, and the child node table is newly created, and the information is added to that page.

【００７７】Next, consider selective record conversion of subtrees using a selection rule that describes a group of paths within a subtree. For example, given a tree-structured data model like that of FIG. 23, the subtrees under “paper” are specified as search targets, and a selection rule is applied to those subtrees to optimize the whole tree.

【００７８】In FIG. 23, the labels “pos.”, “first.”, and “last.” are equivalent to “position”, “firstpage”, and “lastpage”, respectively. The selection rule used is s = {id, title, author/name, author/position, author/name, author/position}.

【００７９】The whole tree and the record group optimized by the selection rule s are then as shown in FIG. 24. Here, the selection rule s has generated two records, r1 and r2, corresponding to the subtrees of the two papers, and they are stored in the record table. In the whole tree, the node at the far end of each edge labeled “s” stores the ID of the generated record as a pointer to that record. Node “29” stores the ID “1” of record r1, and node “30” stores the ID “2” of record r2.

【００８０】The first field, fid, of records r1 and r2 in the record table stores the record ID, and the second field, f0, stores the node ID of the corresponding node of the whole tree as a pointer. The third and subsequent fields, f1 to f6, correspond to the six paths of the selection rule s, and each stores the value of the node at the end of the corresponding path.
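
To make the record layout concrete, the sketch below fills the fields fid, f0, and f1 onward from the leaf values of one subtree. It assumes that the subtree is supplied as a list of (path, value) pairs in document order, that a path repeated in the selection rule refers to successive occurrences of that path, and that an unmatched field is left as `None` to stand for NULL. The function name `build_record` and the dictionary representation of a record are illustrative.

```python
from collections import defaultdict

def build_record(record_id, tree_node_id, leaves, rule):
    """Build one record from a subtree according to a selection rule.

    leaves: list of (path, value) pairs in document order (assumed input)
    rule:   ordered list of paths; repeated paths take successive values
    """
    by_path = defaultdict(list)
    for path, value in leaves:
        by_path[path].append(value)

    used = defaultdict(int)          # occurrences of each path consumed so far
    record = {"fid": record_id, "f0": tree_node_id}
    for n, path in enumerate(rule, start=1):
        values = by_path.get(path, [])
        k = used[path]
        record[f"f{n}"] = values[k] if k < len(values) else None
        used[path] += 1
    return record
```

Leaves whose paths are not named in the rule are ignored here; in the scheme described, such data stays in the whole tree and is reached through the pointer in f0.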

【００８１】For example, fields f1, f2, f3, f4, f5, and f6 of record r1 store the values “id1”, “xxxx”, “name1”, “pos1”, “name2”, and “pos2”, respectively.

【００８２】When the data structure of FIG. 24 is stored in a relational database, five tables like those shown in FIGS. 25 to 29 are generated for the whole tree. FIG. 25 shows the node table, FIG. 26 the label table, FIG. 27 the parent node table, FIG. 28 the child node table, and FIG. 29 the path table. Together with the record table of FIG. 24, a total of six tables are stored.
8 shows a child node table, and FIG. 29 shows a path table. A total of six tables are stored by combining these five tables and the record table of FIG.

【００８３】In this way, even when selection rules are applied to the subtrees to be searched and records are created, clustering can be performed with a suitable algorithm such as the depth-first algorithm. The data storage processing in this case is basically the same as in FIG. 16, but the processing that stores the current node and the current edge in step ST2 is as shown in FIG. 30.
Basically, it is the same as FIG. 16, but the process of storing the current node and the current edge in step ST2 is as shown in FIG.

【００８４】The data storage device first checks whether the current node satisfies the selection rule (step ST71). If the current node does not satisfy the selection rule, it adds the current node to the node table (step ST72). Next, if a current edge has been selected, it checks whether the label of that edge is a new label (step ST73).

【００８５】If the label of the current edge is not a new label, the device adds the current node to the parent node table, adds the path of the current node to the path table (step ST74), and checks whether the current edge corresponds to the head of a specified subtree (step ST75). If the current edge does not correspond to the head of a subtree, it returns to the processing of FIG. 16.

【００８６】If the current node satisfies the selection rule in step ST71, the device stores the value of the current node in the corresponding field of the record (step ST76) and performs the processing from step ST73.

【００８７】If the label of the current edge is a new label in step ST73, the device adds that label to the label table (step ST77) and performs the processing from step ST74.

【００８８】If the current edge corresponds to the head of a subtree in step ST75, the device generates a record to hold the information of that subtree and initializes the record by storing “NULL” in each field (step ST78). Next, it generates an edge between the current node and the record and checks whether the label of that edge is a new label (step ST79).

【００８９】If the label of the generated edge is not a new label, the device adds the edge to the parent node table and the child node table (step ST80) and returns to the processing of FIG. 16. If the label of the generated edge is a new label, it adds the label to the label table (step ST81) and performs the processing from step ST80.

【００９０】For example, in FIG. 23, if the current node is node “12” and the current edge is “paper”, the current edge corresponds to the head of a specified subtree. A record r1 corresponding to this subtree is therefore generated, as shown in FIG. 24 (step ST78). In addition, an edge “s” between node “12” and record r1 is generated, and the information on this edge is added to the parent node table of FIG. 27 and the child node table of FIG. 28 (step ST80).

【００９１】In FIG. 24, two edges labeled “s” are generated. The label “s” is added to the label table when the first edge “s” is generated (step ST81); no label is added when the second edge “s” is generated.

【００９２】Likewise, in FIG. 23, if the current node is node “1” and the current edge is “id”, node “1” corresponds to the first path, “id”, of the selection rule s and therefore satisfies the selection rule s. The value “id1” of this node is accordingly stored in field f1 of record r1 in FIG. 24 (step ST76). The values of nodes “2”, “3”, “4”, “6”, and “7” are stored in record r1 in the same way.

【００９３】The downward traversal processing in step ST8 of FIG. 16 is the same as in FIG. 18, except that when the current edge is an edge to a record, the node that stores the record ID of that record is set as the current node.

【００９４】For example, in FIG. 24, if the current edge is the edge “s” between node “12” and node “29”, node “29” becomes the new current node. In step ST72 of FIG. 30, the ID “1” of record r1, which corresponds to node “29”, is then stored in the node table of FIG. 25. Likewise, when the current edge is the edge “s” between node “27” and node “30”, the ID “2” of record r2 is stored in the node table.

【００９５】Instead of storing these tables in a relational database, they can also be implemented directly on a page management mechanism. The data storage processing in this case is also basically the same as in FIG. 16, but the processing that stores the current node and the current edge in step ST2 is as shown in FIG. 31.

【００９６】The data storage device first checks whether the current node satisfies the selection rule (step ST91). If the current node does not satisfy the selection rule, it adds the current node to the node pages (step ST92). Next, if a current edge has been selected, it checks whether the label of that edge is a new label (step ST93).

【００９７】If the label of the current edge is not a new label, the device adds the current node to the parent node pages, adds the path of the current node to the path pages (step ST94), and checks whether the current edge corresponds to the head of a specified subtree (step ST95). If the current edge does not correspond to the head of a subtree, it returns to the processing of FIG. 16.

【００９８】If the current node satisfies the selection rule in step ST91, the device stores the value of the current node in the corresponding field of the record (step ST96) and performs the processing from step ST93.

【００９９】If the label of the current edge is a new label in step ST93, the device adds that label to the label pages (step ST97) and performs the processing from step ST94.

【０１００】If the current edge corresponds to the head of a subtree in step ST95, the device generates a record to hold the information of that subtree and initializes the record (step ST98). Next, it generates an edge between the current node and the record and checks whether the label of that edge is a new label (step ST99).

【０１０１】If the label of the generated edge is not a new label, the device adds the edge to the parent node pages and the child node pages (step ST100) and returns to the processing of FIG. 16. If the label of the generated edge is a new label, it adds the label to the label pages (step ST101) and performs the processing from step ST100.

【０１０２】If the storage area is insufficient in steps ST92, ST94, ST97, ST100, and ST101, the data storage device allocates a new page. Each time information is added, it also updates the index used for access.

【０１０３】The downward traversal processing in step ST8 of FIG. 16 is the same as in FIG. 20, except that when the current edge is an edge to a record, the node that stores the record ID of that record is set as the current node.

【０１０４】As described earlier, the relational database function for clustering multiple tables by a common attribute can also be used here to cluster the node table, the parent node table, and the child node table by node ID. Clustering by a common attribute can likewise be implemented on a page management mechanism.

【０１０５】Next, an example is described in which the data storage method of this embodiment is applied to a structured document written in XML. FIG. 32 shows an example of the data of an XML document. In FIG. 32, for example, information on one paper is described between the outermost tags <paper> and </paper>. Inside them, the ID of the paper is described between the tags <id> and </id>, and the title of the paper is described between the tags <title> and </title>. In general, many papers are registered in the database, so similar XML data is created for each paper.

【０１０６】In XML data, the containment relationships among tags thus represent a hierarchical data structure. If the data between a pair of matching tags is regarded as a hierarchical subtree, the XML data can be converted into tree-structured data. For example, representing the XML data of FIG. 32 as a tree yields the data shown in FIG. 37. Here, the tag names are used as edge labels, and the subtrees of papers other than the one in FIG. 32 are also connected to the root node “13”.
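
The conversion from nested tags to labeled edges can be sketched with Python's standard `xml.etree.ElementTree` parser. The sketch makes several assumptions: node IDs are assigned sequentially in document order (so they will not match the numbering in the figures), only leaf elements carry a value, and an extra root node is created above the document element. The function name `xml_to_tree` and the returned (nodes, edges) pair are illustrative.

```python
import xml.etree.ElementTree as ET
from itertools import count

def xml_to_tree(xml_text):
    """Convert XML text to (nodes, edges).

    nodes: dict node_id -> value (None for inner nodes)
    edges: list of (parent_id, label, child_id); the label is the tag name
    """
    ids = count(1)
    nodes, edges = {}, []

    def visit(elem, parent_id):
        node_id = next(ids)
        is_leaf = len(elem) == 0
        nodes[node_id] = (elem.text or "").strip() if is_leaf else None
        edges.append((parent_id, elem.tag, node_id))
        for child in elem:
            visit(child, node_id)

    root_id = next(ids)              # root node above the document element
    nodes[root_id] = None
    visit(ET.fromstring(xml_text), root_id)
    return nodes, edges
```

The resulting edge list has the same shape as the input expected by a depth-first storage routine, so the two steps can be chained.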

【０１０７】By regarding XML data as tree-structured data in this way, the structure of subtrees that may become search targets can be specified, as can the group of paths within those subtrees, and the XML data can be stored in a relational database or in pages. The storage processing and clustering method for the tree-structured data of FIG. 37 are as described above.

【０１０８】When the XML data of FIG. 32 is stored in a relational database in table form as in FIG. 2 and FIGS. 11 to 14, the tables are accessed using SQL or the like. For example, the SQL statement for the query ‘select the titles of the papers whose author name is “○△◇☆”’ is as shown in FIG. 33.
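
FIG. 33 is not reproduced in this text, so the following is only a guess at what such a query could look like, written against an assumed SQLite layout with `node`, `label`, and `parent` tables. The table and column names, the sample rows, and the functions `open_sample_db` and `titles_by_author` are all hypothetical. The query climbs from a name node to its author node and then to the paper node, and from there picks the title node under the same paper.

```python
import sqlite3

QUERY = """
SELECT tn.value
FROM node nn
JOIN parent np ON np.id = nn.id             -- name node -> author node
JOIN label  nl ON nl.labelid = np.labelid AND nl.label = 'name'
JOIN parent ap ON ap.id = np.parent         -- author node -> paper node
JOIN label  al ON al.labelid = ap.labelid AND al.label = 'author'
JOIN parent tp ON tp.parent = ap.parent     -- siblings under the same paper
JOIN label  tl ON tl.labelid = tp.labelid AND tl.label = 'title'
JOIN node   tn ON tn.id = tp.id
WHERE nn.value = ?
"""

def open_sample_db():
    """Build a tiny in-memory database with one paper (made-up sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node   (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE label  (labelid INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE parent (id INTEGER, parent INTEGER, labelid INTEGER);
    """)
    conn.executemany("INSERT INTO label VALUES (?, ?)",
                     [(1, "paper"), (2, "title"), (3, "author"), (4, "name")])
    conn.executemany("INSERT INTO node VALUES (?, ?)",
                     [(13, None), (12, None), (2, "xxxx"), (5, None), (3, "yyyy")])
    conn.executemany("INSERT INTO parent VALUES (?, ?, ?)",
                     [(12, 13, 1), (2, 12, 2), (5, 12, 3), (3, 5, 4)])
    return conn

def titles_by_author(conn, name):
    """Return the titles of papers that have an author with the given name."""
    return [row[0] for row in conn.execute(QUERY, (name,))]
```

Each level of the path costs one join on the parent table, which is the traversal overhead that converting subtrees into records is meant to avoid.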

【０１０９】Furthermore, when XML data is selectively converted into records, the data storage device generates the selection rules interactively. To do so, it analyzes the document type definition (DTD) and the XML data and displays a dialog screen like that of FIG. 34 on the display. The document type definition corresponds to a schema that defines the tag structure of the XML document, and analyzing it makes it possible to extract the paths in the tree-structured data.

【０１１０】In FIG. 34, when the user enters the name of a selection rule in box 11 and a suitable edge label in box 12, the subtrees under edges with that label are designated as targets of record conversion. All the paths contained in such a subtree are then displayed automatically in box 13.

【０１１１】Next, when the user selects the desired paths from those displayed, the selected paths are registered as a selection rule. Repeating this processing interactively makes it possible to generate multiple selection rules. The data storage device then performs record conversion for each generated selection rule, as shown in FIG. 7, and generates the record tables.
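
The list of candidate paths offered to the user can be derived from sample XML data alone. The sketch below is one possible approach, assuming that the paths of interest are those from the chosen edge label down to each leaf element, and that only the first subtree with that label is inspected. The function name `candidate_paths` is hypothetical, and DTD analysis is not covered.

```python
import xml.etree.ElementTree as ET

def candidate_paths(xml_text, label):
    """List leaf paths under the first element whose tag equals label."""
    root = ET.fromstring(xml_text)
    head = next(root.iter(label), None)   # also matches the root element
    if head is None:
        return []

    paths = []

    def walk(elem, prefix):
        for child in elem:
            path = prefix + "/" + child.tag if prefix else child.tag
            if len(child) == 0:
                paths.append(path)        # leaf element: a selectable path
            else:
                walk(child, path)

    walk(head, "")
    return paths
```

Repeated elements produce repeated paths, so a paper with two authors yields author/name and author/position twice, matching the form of the selection rule s.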

【０１１２】Besides XML data, the data storage device can store any structured document written in SGML (standard generalized markup language) in the same way. For example, in the case of HTML (hypertext markup language) data, the tag structure is likewise converted into a tree structure.

【０１１３】The data storage device of this embodiment can be configured using an information processing device (computer) like that shown in FIG. 35. The information processing device of FIG. 35 comprises a CPU (central processing unit) 21, a memory 22, an input device 23, an output device 24, an external storage device 25, a medium drive device 26, and a network connection device 27, which are connected to one another by a bus 28.

【０１１４】The memory 22 includes, for example, ROM (read only memory) and RAM (random access memory), and stores the programs and data used in the processing. The CPU 21 performs the necessary processing by executing the programs using the memory 22.

【０１１５】The input device 23 is, for example, a keyboard, a pointing device, or a touch panel, and is used to input instructions and information from the user. The output device 24 is, for example, a display, a printer, or a speaker, and is used to output messages to the user and processing results.

【０１１６】The external storage device 25 is, for example, a magnetic disk device, an optical disk device, or a magneto-optical disk device, and is used as the database that stores the various tables described above. The information processing device can also keep the above programs and data in the external storage device 25 and load them into the memory 22 for use as needed.

【０１１７】The medium drive device 26 drives a portable recording medium 29 and accesses its recorded contents. Any computer-readable recording medium can be used as the portable recording medium 29, such as a memory card, a floppy disk, a CD-ROM (compact disk read only memory), an optical disk, or a magneto-optical disk. The user can store the above programs and data on the portable recording medium 29 and load them into the memory 22 for use as needed.

[0118] The network connection device 27 communicates with external devices via an arbitrary network (line) and performs the data conversion that accompanies the communication. The information processing apparatus can receive the above-described programs and data from an external device via the network connection device 27 as needed and load them into the memory 22 for use.

[0119] FIG. 36 shows computer-readable recording media that can supply programs and data to the information processing apparatus of FIG. 35. Programs and data stored on the portable recording medium 29 or in an external database 30 are loaded into the memory 22. The CPU 21 then executes the program using the data and performs the necessary processing.

[0120]

[Effects of the Invention] According to the present invention, when a data search targeting a subtree of tree structure data is performed in a semi-structured database or the like, the data of that subtree can be read out together, which speeds up the search. Converting part of the subtree's data into records speeds up the search further.
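
As an illustration of the effect described above, the following is a minimal Python sketch, not the implementation disclosed in this specification. It designates a subtree structure by an edge label, extracts every matching subtree, and lays the rows of each subtree out contiguously in a list that stands in for the physical storage area. The sample data, the label names, and the helpers `extract_subtrees` and `store_clustered` are assumptions introduced only for this example.

```python
from collections import defaultdict

# Hypothetical sample data: edges are (parent_id, label, child_id);
# VALUES holds the values attached to leaf nodes.
EDGES = [
    (0, "paper", 1), (1, "title", 2), (1, "author", 3),
    (0, "paper", 4), (4, "title", 5), (4, "author", 6), (4, "author", 7),
]
VALUES = {2: "Semi-structured data", 3: "Ishikawa",
          5: "Tree storage", 6: "Kubota", 7: "Noguchi"}


def extract_subtrees(edges, root_label):
    """Return {subtree_root: [edges]} for each subtree entered via root_label."""
    children = defaultdict(list)
    for parent, label, child in edges:
        children[parent].append((parent, label, child))
    subtrees = {}
    for parent, label, child in edges:
        if label == root_label:
            collected, stack = [], [child]
            while stack:  # depth-first downward traversal
                node = stack.pop()
                for edge in children[node]:
                    collected.append(edge)
                    stack.append(edge[2])
            subtrees[child] = collected
    return subtrees


def store_clustered(subtrees, values):
    """Place each subtree's rows contiguously and record its (start, end) extent."""
    area, extents = [], {}
    for root, sub_edges in subtrees.items():
        start = len(area)
        for parent, label, child in sub_edges:
            area.append((parent, label, child, values.get(child)))
        extents[root] = (start, len(area))
    return area, extents


subtrees = extract_subtrees(EDGES, "paper")
area, extents = store_clustered(subtrees, VALUES)
start, end = extents[4]
second_paper = area[start:end]  # one contiguous read returns the whole subtree
```

In this sketch a search on the second paper touches only the slice `area[start:end]`, which is the in-memory analogue of reading the subtree's data together from one storage region.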

[Brief description of the drawings]

FIG. 1 is a diagram showing the principle of the data storage device of the present invention.

FIG. 2 is a diagram showing a first node table.

FIG. 3 is a diagram showing an edge table.

FIG. 4 is a diagram showing the conversion of a first subtree into records.

FIG. 5 is a diagram showing the conversion of a second subtree into records.

FIG. 6 is a diagram showing an optimized subtree.

FIG. 7 is a diagram showing a first optimized whole tree.

FIG. 8 is a diagram showing the storage format of a whole tree.

FIG. 9 is a diagram showing a first data search.

FIG. 10 is a diagram showing a second data search.

FIG. 11 is a diagram showing a first label table.

FIG. 12 is a diagram showing a first parent node table.

FIG. 13 is a diagram showing a first child node table.

FIG. 14 is a diagram showing a first path table.

FIG. 15 is a diagram showing the data type and length of each attribute.

FIG. 16 is a flowchart of the data storage process.

FIG. 17 is a flowchart of a first node/edge storage process.

FIG. 18 is a flowchart of a first downward traversal process.

FIG. 19 is a flowchart of a second node/edge storage process.

FIG. 20 is a flowchart of a second downward traversal process.

FIG. 21 is a flowchart of a third node/edge storage process.

FIG. 22 is a flowchart of a third downward traversal process.

FIG. 23 is a diagram showing tree structure data for two papers.

FIG. 24 is a diagram showing a second optimized whole tree.

FIG. 25 is a diagram showing a second node table.

FIG. 26 is a diagram showing a second label table.

FIG. 27 is a diagram showing a second parent node table.

FIG. 28 is a diagram showing a second child node table.

FIG. 29 is a diagram showing a second path table.

FIG. 30 is a flowchart of a fourth node/edge storage process.

FIG. 31 is a flowchart of a fifth node/edge storage process.

FIG. 32 is a diagram showing XML data.

FIG. 33 is a diagram showing an SQL statement.

FIG. 34 is a diagram showing a dialog screen.

FIG. 35 is a configuration diagram of an information processing apparatus.

FIG. 36 is a diagram showing recording media.

FIG. 37 is a diagram showing tree structure data for a plurality of papers.

[Explanation of symbols]

1 Designating means
2 Extraction means
3 Storage means
11, 12, 13 Box
21 CPU
22 Memory
23 Input device
24 Output device
25 External storage device
26 Medium drive device
27 Network connection device
28 Bus
29 Portable recording medium
30 Database

Continued from the front page
(72) Inventor: Kazumi Kubota, c/o Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Yasuo Noguchi, c/o Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
F-terms (reference): 5B075 NK04 NK44 NK46 NR06 QT06 QT10; 5B082 BA09 GA02

Claims

[Claims]

1. A data storage device comprising: designating means for designating the structure of a subtree that can be a search target in tree structure data; extraction means for extracting a subtree conforming to the designated structure from the tree structure data; and storage means for collectively storing information of the extracted subtree.

2. The data storage device according to claim 1, wherein the designating means individually designates one or more paths in the subtree that can be the search target, the extraction means extracts the node at the end of each designated path from the tree structure data, and the storage means collectively stores information of the extracted nodes.

3. A data storage device for storing tree structure data separated into nodes and edges, comprising: designating means for designating the structure of a subtree that can be a search target in the tree structure data; extraction means for extracting a subtree conforming to the designated structure from the tree structure data; node storage means for collectively storing information of the nodes of the extracted subtree; and edge storage means for collectively storing information of the edges of the extracted subtree.

4. The data storage device according to claim 3, wherein the storage area of the node storage means is implemented as one relation of a relational database, and the storage area of the edge storage means is implemented as one or more relations of the relational database.

5. The data storage device according to claim 3, wherein the designating means regards document data structured by tags as the tree structure data, regards the information between two tags in the document data as a subtree, and designates the structure of the subtree that can be the search target.

6. The data storage device according to claim 3, further comprising record storage means for storing, as a record, part of the information of the subtree that can be the search target, wherein the designating means designates one or more paths in the subtree that can be the search target as a path group, the extraction means extracts the node at the end of each designated path from the tree structure data, the record storage means stores the information of the extracted nodes as a record, and the edge storage means omits storing information of the edges that form a path corresponding to a record in the record storage means.

7. The data storage device according to claim 6, wherein a storage area of said record storage means is implemented as one relation of a relational database.

8. The data storage device according to claim 6, wherein the storage area of the node storage means is implemented as one relation of a relational database, the storage area of the edge storage means is implemented as one or more relations of the relational database, and the storage area of the record storage means is implemented as one relation of the relational database.

9. The data storage device according to claim 6, further comprising designation information storage means for storing, when the designating means designates a plurality of path groups, designation information of the plurality of path groups, wherein the record storage means stores a record corresponding to each path group.

10. The data storage device according to claim 6, wherein the designating means regards document data structured by tags as the tree structure data, regards the information between two tags in the document data as a subtree, and designates the path group in the subtree that can be the search target.

11. A data storage device comprising: extraction means for extracting a partial tree from each layer of tree structure data; and storage means for collectively storing information of the extracted partial trees.

12. A computer-readable recording medium on which a program for a computer is recorded, the program causing the computer to execute a process comprising: a step of designating the structure of a subtree that can be a search target in tree structure data; a step of extracting a subtree conforming to the designated structure from the tree structure data; and a step of collectively storing information of the extracted subtree.

13. A computer-readable recording medium on which tree structure data for a computer is recorded, wherein, when the structure of a subtree that can be a search target of the computer is designated in the tree structure data, information of the subtrees conforming to the designated structure is recorded collectively so that the computer can access it.

14. A data storage method comprising: designating the structure of a subtree that can be a search target in tree structure data; extracting a subtree conforming to the designated structure from the tree structure data; and collectively storing information of the extracted subtree.
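
The relational arrangement recited in claims 4 and 6 to 8 can be pictured with the following minimal Python sketch using the standard `sqlite3` module. It is an illustration under stated assumptions, not the claimed implementation: the table names `node`, `edge`, and `paper_record`, the column layout, the sample tree, and the path group `("title", "year")` are all hypothetical. For simplicity the sketch keeps every node in the node relation; whether recorded nodes are also kept there is not stated in the claims.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, value TEXT);          -- node relation
    CREATE TABLE edge (parent INTEGER, label TEXT, child INTEGER);   -- edge relation
    CREATE TABLE paper_record (root INTEGER PRIMARY KEY,
                               title TEXT, year TEXT);               -- record relation
""")

# Hypothetical subtree: node 1 is the root of one paper.
nodes = {1: None, 2: "Tree storage", 3: "1999", 4: "Kubota", 5: "Noguchi"}
edges = [(1, "title", 2), (1, "year", 3), (1, "author", 4), (1, "author", 5)]
path_group = ("title", "year")  # designated paths whose end nodes become record columns

conn.executemany("INSERT INTO node VALUES (?, ?)", nodes.items())
record = {"root": 1}
for parent, label, child in edges:
    if label in path_group:
        # The end-node value goes into the record; the edge itself is not stored.
        record[label] = nodes[child]
    else:
        conn.execute("INSERT INTO edge VALUES (?, ?, ?)", (parent, label, child))
conn.execute("INSERT INTO paper_record VALUES (:root, :title, :year)", record)

# Recorded attributes are read from a single row without following edges.
title, year = conn.execute(
    "SELECT title, year FROM paper_record WHERE root = 1").fetchone()
# Repeating attributes still go through the edge relation.
authors = [row[0] for row in conn.execute(
    "SELECT n.value FROM edge e JOIN node n ON n.id = e.child "
    "WHERE e.parent = 1 AND e.label = 'author' ORDER BY n.id")]
edge_count = conn.execute("SELECT COUNT(*) FROM edge").fetchone()[0]
```

Here the title and year are retrieved from one row of the record relation, while the repeating author values, which do not fit a fixed number of columns, remain reachable through the edge relation.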