JP2001318935A - Information processor, its method, recording medium recording information processing software, and relational database

Info

Publication number: JP2001318935A
Application number: JP2000135432A
Authority: JP
Inventors: Takashi Komaki; 崇史小牧; Masashi Komaki; 正史小牧
Original assignee: KOMAKKUSU KK
Current assignee: KOMAKKUSU KK
Priority date: 2000-05-09
Filing date: 2000-05-09
Publication date: 2001-11-16

Abstract

PROBLEM TO BE SOLVED: To achieve effective structure search, using a relational database management system (RDBMS), over the contents of a document expressed in a data description language having a hierarchical structure. SOLUTION: A conversion/storage unit 1 is means for converting the format of an XML document D and storing the converted result in a relational database (RDB). Specifically, tags and the text between tags are sequentially extracted as segments from the given XML document D, and rows each containing a document number that differs for each XML document D, a segment number expressing the order of each segment, and the content of the segment are recorded in a table of the RDB. A search/processing unit 2 is means for performing processing such as searching and modifying the RDB table converted from the XML document D and restoring the original XML document D.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to an improvement in techniques for processing, with an RDBMS, the contents of a document expressed in a data description language having a hierarchical structure. It enables effective structure search and is particularly suited to XML documents.

[0002]

[Description of the Related Art] In recent years, with the development of information processing technology, data description languages that express hierarchically structured data in a clear format have been proposed, XML being a typical example. XML is a data description language for general-purpose data representation, and a document created in accordance with XML is called an XML document. Each part of an XML document describes meaningful information in a prescribed, human-readable text format that does not depend on application software.

[0003] In XML, each piece of data contained in a document is written between prescribed identifying character strings called tags. Tags come in pairs consisting of a start tag and a corresponding closing tag. The general format of a start tag is

<tag-name [attribute-name="value" attribute-name="value" ...]>

An attribute list can optionally be added inside the start tag; in the format above, the part in [ ] is an attribute list consisting of one or more attributes. The attribute list is made up of one or more attribute-name="value" pairs, and each pair may also be written in the form attribute-name='value'.

[0004] A closing tag uses the same tag name as the corresponding start tag, and its general format is </tag-name>.

[0005] Information about real-world phenomena often has a hierarchical structure, and in XML as described above a hierarchical structure can be described freely by nesting tags.

[0006] XML itself, however, is only a format for representing data, so some concrete means is needed to store and search large numbers of XML documents effectively on digital media. At present, large volumes of data are generally handled with an RDB (relational database) and an RDBMS (relational database management system) that manages and operates the RDB. This gives rise to a demand to use an RDBMS for storing and searching XML documents.

[0007] XML is suited to representing data that contains a hierarchical structure, whereas the data structure of an RDB is based on a two-dimensional "table" model. Because the basic structures differ, some ingenuity is required to handle XML documents with an RDBMS.

[0008] A known conventional technique for storing an XML document in an RDB in this way maps specific elements of the XML document to fields of a specific table in the RDB.

[0009]

[Problems to Be Solved by the Invention] However, the conventional technique described above first requires mapping definitions to be prepared in advance, individually for each type of XML document, which is cumbersome.

[0010] In addition, with the conventional technique the hierarchical structure of the original XML document has already been lost by the time the document is stored in the RDB, so structure search cannot be performed. A system that uses XML should support searches that exploit the hierarchical structure characteristic of XML, specifically "structure search".

[0011] Structure search is search that is aware of the hierarchical structure. It is desirable that search operations such as the following can be performed easily:
・extracting the group of elements that have a specified hierarchical structure;
・restricting the search to the group of elements that have a certain element beneath them.

[0012] By contrast, the conventional technique described above maps the contents of an XML document onto a flat data structure in the RDB, so the hierarchical structure is lost and structure search is impossible.

[0013] The present invention has been proposed to solve the problems of the prior art described above. Its object is to provide information processing technology, namely an information processing apparatus and method, a recording medium on which information processing software is recorded, and a relational database, that enables effective structure search on an RDBMS over the contents of documents expressed in a data description language having a hierarchical structure. Another object of the present invention is to provide information processing technology that realizes effective structure search over XML documents.

[0014]

[Means for Solving the Problems] To achieve the above object, the information processing apparatus of claim 1 comprises: means for sequentially extracting, from a given document, tags and the text between tags as segments; and means for recording, in a table of a relational database, rows each containing a document number that differs from document to document, a segment number indicating the order of each segment, and the content of the extracted segment.

The information processing method of claim 14 expresses the invention of claim 1 as a method, and comprises: a step of sequentially extracting, from a given document, tags and the text between tags as segments; and a step of recording, in a table of a relational database, rows each containing a document number that differs from document to document, a segment number indicating the order of each segment, and the content of the extracted segment.

The invention of claim 20 expresses the inventions of claims 1 and 14 as a machine-readable recording medium on which computer software is recorded. In a recording medium on which information processing software for processing information using a computer is recorded, the software causes the computer to sequentially extract, from a given document, tags and the text between tags as segments, and to record, in a table of a relational database, rows each containing a document number that differs from document to document, a segment number indicating the order of each segment, and the content of the extracted segment.

The invention of claim 21 expresses the inventions of claims 1, 14, and 20 as a relational database, in which tags and the text between tags sequentially extracted from a given document are treated as segments, and rows each containing a document number that differs from document to document, a segment number indicating the order of each segment, and the content of the extracted segment are recorded in a table.

The invention of claim 2 is the information processing apparatus of claim 1, further comprising means for performing processing in units of documents based on a specified document number.

In the inventions of claims 1, 14, 20, 21, and 2, the contents of a document expressed in a data description language having a hierarchical structure, for example an XML document, are stored in the RDB in units of tags and the text between tags. This makes it possible to store, retrieve, delete, and otherwise process data on an RDBMS by a uniform procedure regardless of the structure of the original XML document. In particular, no advance, individual mapping definition work is needed, and the hierarchical structure expressed by nested tags is preserved in the RDB, so structure search also becomes possible. Moreover, because the XML document is broken down segment by segment before being stored in the RDB, the limit on the length of character strings that can be stored in the RDB is effectively relaxed. Furthermore, because the document from which each segment originates is identified by the document number, segments can be retrieved, deleted, searched, and otherwise processed per document even when segments from multiple documents are accumulated in a single table. Finally, although an RDB does not guarantee that stored rows are retrieved in the order in which they were recorded, the sequential segment numbers make it possible to retrieve segments in the same order as in the original XML document.
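As a rough illustration of the storage scheme just described, the following Python sketch splits a document into segments and records one row per segment, then restores a document by reading its rows back in segment-number order. It is a minimal sketch under stated assumptions, not the patent's implementation: the table name `body`, the column names, the regular-expression tokenizer, and the choice to skip whitespace-only text are all hypothetical, and the tokenizer does not handle comments, CDATA sections, or ">" inside attribute values.

```python
import re
import sqlite3

# Hypothetical tokenizer: a segment is either a tag ("<...>") or the text between tags.
SEGMENT_RE = re.compile(r"<[^>]+>|[^<]+")

def split_segments(xml_text):
    """Return tags and inter-tag text in document order, skipping whitespace-only text."""
    return [s for s in SEGMENT_RE.findall(xml_text) if s.strip()]

def store_document(conn, doc_no, xml_text):
    """Record one row per segment: (document number, segment number, segment content)."""
    rows = [(doc_no, i, seg)
            for i, seg in enumerate(split_segments(xml_text), start=1)]
    conn.executemany(
        "INSERT INTO body (doc_no, seg_no, content) VALUES (?, ?, ?)", rows)

def restore_document(conn, doc_no):
    """Rebuild one document by reading its segments back in segment-number order."""
    cur = conn.execute(
        "SELECT content FROM body WHERE doc_no = ? ORDER BY seg_no", (doc_no,))
    return "".join(row[0] for row in cur)

# Segments from several documents share a single table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE body (doc_no INTEGER, seg_no INTEGER, content TEXT, "
             "PRIMARY KEY (doc_no, seg_no))")
store_document(conn, 1, "<shop><item>PC</item></shop>")
store_document(conn, 2, "<memo>hello</memo>")
```

The explicit `ORDER BY seg_no` is the point of the segment number column: without it, the order of the returned rows would be up to the database.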

[0015] The invention of claim 3 is the information processing apparatus of claim 1 or 2, wherein each row contains a text path that expresses, by the tags on it, the path from the root to the segment in the hierarchical structure of the original document. The invention of claim 15 expresses the invention of claim 3 as a method: in the information processing method of claim 14, each row contains a text path that expresses, by the tags on it, the path from the root to the segment in the hierarchical structure of the original document. The invention of claim 4 is the information processing apparatus of claim 3, further comprising means for performing, based on the text path, processing that targets the segments at or below a specified level of the hierarchy. In the inventions of claims 3, 15, and 4, the text path allows processing such as data search in which a position or range in the hierarchical structure of the original document is specified freely. A document number can of course also be specified in such a search; the same applies below.
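One way to picture the text path is the sketch below, which walks the segments with a stack of open tag names and selects everything at or below a given level by prefix matching. The "/" separator, the rule that a closing tag carries the same path as its start tag, and the function names are assumptions made for illustration; the text above does not fix a concrete path format. The tokenizer assumes a document with no XML declaration, comments, or CDATA sections.

```python
import re

SEGMENT_RE = re.compile(r"<[^>]+>|[^<]+")
TAG_NAME_RE = re.compile(r"</?\s*([^\s/>]+)")

def rows_with_text_path(xml_text):
    """Return (segment number, text path, content) for each segment.

    The text path lists the tag names from the root down to the segment,
    joined with "/" (the separator is an assumption of this sketch).
    """
    stack, rows, seg_no = [], [], 0
    for seg in SEGMENT_RE.findall(xml_text):
        if not seg.strip():
            continue
        seg_no += 1
        if seg.startswith("</"):
            path = "/" + "/".join(stack)  # closing tag shares its start tag's path
            stack.pop()
        elif seg.startswith("<"):
            stack.append(TAG_NAME_RE.match(seg).group(1))
            path = "/" + "/".join(stack)
            if seg.endswith("/>"):
                stack.pop()  # an empty-element tag closes itself
        else:
            path = "/" + "/".join(stack)  # text sits under the innermost open tag
        rows.append((seg_no, path, seg))
    return rows

def at_or_below(rows, prefix):
    """Select segments whose path is `prefix` itself or lies beneath it."""
    return [r for r in rows if r[1] == prefix or r[1].startswith(prefix + "/")]

rows = rows_with_text_path("<shop><item>PC</item><item>Mouse</item></shop>")
item_rows = at_or_below(rows, "/shop/item")
```

In an RDBMS the same selection would typically be a prefix condition on the path column, optionally combined with a document number.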

[0016] The invention of claim 5 is the information processing apparatus of claim 3 or 4, which uses a lookup table associating the tag name of each tag with a tag number assigned per tag name, the text path being expressed using the tag numbers. The invention of claim 16 expresses the invention of claim 5 as a method: in the information processing method of claim 14 or 15, a lookup table associating the tag name of each tag with a tag number assigned per tag name is used, and the text path is expressed using the tag numbers. In the inventions of claims 5 and 16, the text path that indicates the position of a segment in the hierarchical structure is expressed with tag numbers, so the amount of text path data is kept to a minimum even when tag names are long.

[0017] The invention of claim 6 is the information processing apparatus of claim 5, wherein the tag number has a predetermined number of digits. In the invention of claim 6, making the tag number fixed-length, for example by padding numbers shorter than four digits with leading "0"s, makes it easy to perform highly flexible searches such as "anything is acceptable for exactly one level" by using features such as single-character matching in RDBMS searches.
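The benefit of fixed-length tag numbers can be sketched with the SQL `LIKE` operator, whose `_` wildcard matches exactly one character. In the Python sketch below, tag names are mapped to four-digit zero-padded numbers and a text path is the plain concatenation of those numbers; the concatenation without separators, the table layout, and the sample tag names are assumptions for illustration only.

```python
import sqlite3

def tag_number(tag_table, name):
    """Assign each distinct tag name a 4-digit, zero-padded number on first sight."""
    if name not in tag_table:
        tag_table[name] = f"{len(tag_table) + 1:04d}"
    return tag_table[name]

def numeric_path(tag_table, names):
    """Encode a path of tag names as concatenated fixed-length tag numbers."""
    return "".join(tag_number(tag_table, n) for n in names)

tags = {}  # stands in for the tag-name / tag-number lookup table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE body (seg_no INTEGER, text_path TEXT, content TEXT)")
conn.executemany("INSERT INTO body VALUES (?, ?, ?)", [
    (1, numeric_path(tags, ["shop", "desktop", "price"]), "1000"),
    (2, numeric_path(tags, ["shop", "notebook", "price"]), "2000"),
    (3, numeric_path(tags, ["shop", "notebook", "option", "price"]), "30"),
])

# "shop / <any one level> / price": four "_" wildcards stand in for
# exactly one 4-digit tag number, so deeper paths do not match.
pattern = tags["shop"] + "____" + tags["price"]
hits = [r[0] for r in conn.execute(
    "SELECT seg_no FROM body WHERE text_path LIKE ? ORDER BY seg_no", (pattern,))]
```

With variable-length tag numbers the same query would need to know how many characters one level occupies, which is what the fixed width avoids.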

[0018] The invention of claim 7 is the information processing apparatus of any one of claims 1 to 6, which uses an attribute table that stores the attributes contained in the tags. The invention of claim 17 expresses the invention of claim 7 as a method: in the information processing method of any one of claims 14 to 16, an attribute table that stores the attributes contained in the tags is used. The invention of claim 8 is the information processing apparatus of claim 7, further comprising means for searching the attribute table for the rows relating to a specified attribute name or attribute value. In the inventions of claims 7, 17, and 8, storing the attributes contained in tags in an attribute table keeps the number of characters and the amount of data in the segment content to a minimum, and segments having a desired attribute or attribute value can be searched for freely from the attribute table.
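The attribute table can be illustrated as follows: attributes are stripped from an ordinary start tag, the bare tag goes into the main table, and each attribute becomes one row of a separate table keyed by document number and segment number. This is only a sketch under assumptions; the table names `body` and `attr`, the column names, and the helper functions are hypothetical, and empty-element tags such as `<br/>` are not handled.

```python
import re
import sqlite3

# Matches name="value" or name='value' inside a start tag.
ATTR_RE = re.compile(r"""([^\s=<>/]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def split_start_tag(tag):
    """Return (tag without attributes, [(attribute name, value), ...])."""
    name = re.match(r"<\s*([^\s/>]+)", tag).group(1)
    attrs = [(m.group(1), m.group(2) if m.group(2) is not None else m.group(3))
             for m in ATTR_RE.finditer(tag)]
    return f"<{name}>", attrs

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE body (doc_no INTEGER, seg_no INTEGER, content TEXT)")
conn.execute("CREATE TABLE attr (doc_no INTEGER, seg_no INTEGER, name TEXT, value TEXT)")

def store_tag(doc_no, seg_no, tag):
    """Store the bare tag in the main table and its attributes in the attribute table."""
    bare, attrs = split_start_tag(tag)
    conn.execute("INSERT INTO body VALUES (?, ?, ?)", (doc_no, seg_no, bare))
    conn.executemany("INSERT INTO attr VALUES (?, ?, ?, ?)",
                     [(doc_no, seg_no, n, v) for n, v in attrs])

def find_by_attr(name, value):
    """Return (segment number, content) of segments carrying the given attribute."""
    cur = conn.execute(
        "SELECT b.seg_no, b.content FROM attr a JOIN body b "
        "ON a.doc_no = b.doc_no AND a.seg_no = b.seg_no "
        "WHERE a.name = ? AND a.value = ? ORDER BY b.seg_no", (name, value))
    return cur.fetchall()

store_tag(1, 2, "<item id=\"A1\" maker='Acme'>")
store_tag(1, 7, "<item id=\"B2\">")
```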

[0019] The invention of claim 9 is the information processing apparatus of any one of claims 1 to 8, wherein each row contains a segment type indicating the kind of the segment. The invention of claim 18 expresses the invention of claim 9 as a method: in the information processing method of any one of claims 14 to 17, each row contains a segment type indicating the kind of the segment. The invention of claim 10 is the information processing apparatus of claim 9, further comprising means for performing, based on the segment type, processing according to the kind of segment. In the inventions of claims 9, 18, and 10, the kind of segment is expressed by the segment type. The kind of segment therefore need not be indicated within the segment content by, for example, a particular kind of bracket, so the number of characters and the amount of data in the segment content are kept to a minimum, and complex searches based on the kind of segment and the like become easier to implement.

[0020] The invention of claim 11 is the information processing apparatus of claim 9 or 10, wherein an attribute name is treated as a kind of tag name, and whether a tag name is an attribute is distinguished by the segment type. In the invention of claim 11, an attribute name can be written as a tag name at the end of the text path. The attribute name therefore need not be written in the segment content, which keeps the number of characters and the amount of data in the segment content to a minimum, and searches on a compound condition of text path and attribute name become easy to realize.

[0021] The invention of claim 12 is the information processing apparatus of any one of claims 1 to 11, wherein each row contains a segment number path that expresses, by their segment numbers, the segments on the path from the root to the segment in the hierarchical structure of the original document. The invention of claim 19 expresses the invention of claim 12 as a method: in the information processing method of any one of claims 14 to 18, each row contains a segment number path that expresses, by their segment numbers, the segments on the path from the root to the segment in the hierarchical structure of the original document. The invention of claim 13 is the information processing apparatus of claim 12, further comprising means for performing, using the segment number path, processing based on absolute position within the hierarchical structure of the document. In the inventions of claims 12, 19, and 13, the position of each segment in the hierarchical structure of the original document is expressed absolutely by the sequence of segment numbers of the segments on the path from the root to that segment. Even when the same segment occurs in several places, the separate segments beneath each occurrence can therefore be clearly distinguished. This facilitates a variety of information processing, such as tracing parent segments in turn when a particular segment number is obtained as the result of a conditional search.
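The segment number path can be sketched in the same style as a text path, except that the stack holds the segment numbers of the open start tags instead of their names. The tuple representation, the treatment of closing tags, and the function names below are assumptions for illustration; the text above does not specify a storage format.

```python
import re

SEGMENT_RE = re.compile(r"<[^>]+>|[^<]+")

def segment_number_paths(xml_text):
    """Map each segment number to the segment numbers on the path from the
    root start tag down to that segment (the segment itself comes last)."""
    stack, paths, seg_no = [], {}, 0
    for seg in SEGMENT_RE.findall(xml_text):
        if not seg.strip():
            continue
        seg_no += 1
        if seg.startswith("</"):
            stack.pop()  # the element is closed; its parent chain remains
            paths[seg_no] = tuple(stack) + (seg_no,)
        elif seg.startswith("<") and not seg.endswith("/>"):
            stack.append(seg_no)
            paths[seg_no] = tuple(stack)
        else:
            paths[seg_no] = tuple(stack) + (seg_no,)  # text or empty-element tag
    return paths

def ancestors(paths, seg_no):
    """Walk from a segment up to the root, nearest parent first."""
    return list(reversed(paths[seg_no][:-1]))

paths = segment_number_paths("<shop><item>PC</item><item>Mouse</item></shop>")
```

Both `<item>` start tags have the same text path, but their segment number paths differ, so the text "PC" (under segment 2) and the text "Mouse" (under segment 5) can be told apart and traced back to their own parents.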

[0022]

[Embodiments of the Invention] Next, an embodiment of the present invention (hereinafter, the "embodiment") is described concretely with reference to the drawings. The embodiment is typically realized by controlling a computer with software. The software in this case achieves the operation and effects of the present invention by physically utilizing the hardware of the computer, and conventional techniques are applied to the parts in common with the prior art.

[0023] However, the types and configurations of the hardware and software, the range of processing performed by software, and so on can be varied in many ways. For example, a recording medium such as a hard disk drive, disk pack, or CD-ROM on which such software is recorded is by itself one aspect of the present invention. The following description therefore uses virtual circuit blocks that realize the functions of the present invention and the embodiment. A network configuration such as a LAN can also be adopted for the parts included in the embodiment.

[0024] [1. Configuration] The embodiment presents an information processing system, which is the information processing apparatus of the present invention, and the information processing method executed on it. It can also be understood as a recording medium on which information processing software is recorded and as a relational database.

[0025] First, as shown in the functional block diagram of FIG. 1, the embodiment comprises a relational database (denoted RDB), a relational database management system for the RDB (denoted RDBMS), and an XML conversion/processing unit X. The XML conversion/processing unit X performs, via the RDBMS, processing such as storing a given XML document D in the RDB, searching, and restoration, and comprises a conversion/storage unit 1 and a search/processing unit 2.

[0026] Of these, the conversion/storage unit 1 is means for converting the format of the XML document D and storing it in the RDB. Specifically, it sequentially extracts tags and the text between tags from the given XML document D as segments, and records in an RDB table rows each containing a document number that differs for each XML document D, a segment number indicating the order of each segment, and the content of the extracted segment. The search/processing unit 2 is means for performing processing such as searching and modifying the RDB table converted from the XML document D and restoring the original XML document D.

[0027] More specifically, the conversion/storage unit 1 comprises a document number determination unit 11, a segment extraction unit 12, a segment number determination unit 13, a text path creation unit 14, a segment type creation unit 15, a segment number path creation unit 16, a lookup table creation unit 17, and an attribute table creation unit 18. The search/processing unit 2 comprises a per-document processing unit 21, a text path processing unit 22, a segment type processing unit 23, a segment number path processing unit 24, and an attribute processing unit 25.

[0028] Within the conversion/storage unit 1, the document number determination unit 11 determines a document number that differs for each XML document D. The segment extraction unit 12 sequentially extracts tags and the text between tags from the XML document D as segments. The segment number determination unit 13 determines the segment numbers, which indicate the order of the segments, sequentially by successive incrementing. Each set of document number, segment number, and segment content obtained in this way is recorded as a row of the main body table T1, which is a table in the RDB. The per-document processing unit 21 of the search/processing unit 2 is means for performing processing in units of XML documents D based on a specified document number.

[0029] The text path creation unit 14 of the conversion/storage unit 1 is means for adding to each row of the main body table T1 in the RDB a text path that expresses, by the tags on it, the path from the root to the segment in the hierarchical structure of the original document. The text path processing unit 22 of the search/processing unit 2 is means for performing, based on the text path, processing that targets the segments at or below a specified level of the hierarchy.

[0030] The segment type creation unit 15 of the conversion/storage unit 1 is means for adding to each row of the main body table T1 in the RDB a segment type indicating the kind of the segment. The segment type processing unit 23 of the search/processing unit 2 is means for performing, based on the segment type, processing according to the kind of segment.

[0031] The segment number path creation unit 16 of the conversion/storage unit 1 is means for adding to each row of the main body table T1 in the RDB a segment number path that expresses, by their segment numbers, the segments on the path from the root to the segment in the hierarchical structure of the original XML document D. The segment number path processing unit 24 of the search/processing unit 2 is means for performing, based on the segment number path, processing based on absolute position within the hierarchical structure of the XML document D.

[0032] The lookup table creation unit 17 of the conversion/storage unit 1 is means for creating on the RDB, via the RDBMS, a lookup table T2 (called the tag list table) that associates the tag name of each tag with a tag number assigned per tag name. When the tag list table T2 is used, the text path is expressed using the tag numbers.

[0033] The attribute table creation unit 18 of the conversion/storage unit 1 is means for creating on the RDB, via the RDBMS, an attribute table T3 that stores the attributes contained in each tag. The attribute processing unit 25 of the search/processing unit 2 is means for searching, based on the attribute table T3, for the rows relating to a specified attribute name or attribute value.

[0034] [2. Operation and Effects] Next, the operation of the embodiment configured as described above is explained using several storage models. Equation 1 below is an example of an XML document (called the sample XML document or sample document) used in the following description of the embodiment; it represents part of the product list of a certain personal computer shop.

【数１】 (Equation 1)

[0035] The most fundamental idea in how this embodiment handles XML documents is to "decompose an XML document into segments". A segment (句切) is a unit obtained by splitting the content of an XML document into individual tags and the pieces of text enclosed between tags.

[0036] [2-1. First Storage Model] The first storage model is a basic example based on the above idea. Table 1 shows the meaning of each column of the main table T1 on the RDBMS used to store XML documents in the first storage model.

[Table 1]
Here, the "document number" is the key that distinguishes each document when multiple XML documents are stored; concretely, a different number is stored for each XML document. The "segment number" indicates the position at which the segment appears, counted from the beginning of the document. This column is needed because an RDBMS generally does not guarantee the order in which stored rows (also called records) are retrieved. The "segment content" stores the character string that makes up the segment.

[0037] [2-1-1. Example Procedure for Realizing the First Storage Model] The main table T1 of the first storage model is created from a given XML document D by the conversion/storage unit 1, in particular by the document number determination unit 11, the segment extraction unit 12, and the segment number determination unit 13, following a procedure such as the one illustrated in the flowchart of FIG. 2.

[0038] In this procedure, the document number determination unit 11 determines the document number (step 101), and the segment number determination unit 13 resets the segment number to its initial value of 0 (step 102). Then, until the end of the document is reached (step 106), each time the segment extraction unit 12 extracts one segment from the XML document D (step 103), the segment number determination unit 13 increments the segment number (step 104), and the document number, segment number, and segment content are added to the main table T1 as one record (step 105). The content of each such record is called segment information; the same term applies when the record contains additional columns, as in the other storage models described later.
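The storage procedure of FIG. 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the table name `main_t1`, the column names, the regular expression used to split the text into segments (tags and runs of text between tags), and the English sample document are all assumptions, since Equation 1 and Table 1 are not reproduced here. The regular expression does not handle comments, CDATA sections, or a `>` inside an attribute value.

```python
import re
import sqlite3

# Hypothetical stand-in for the sample document; Equation 1 is not reproduced here.
SAMPLE = "<list><item><name>A4 paper</name></item><item><name>thin printer</name></item></list>"


def split_segments(xml):
    """Split XML text into segments: each tag, and each run of text between tags."""
    return re.findall(r"<[^>]+>|[^<]+", xml)


def store_document(conn, doc_no, xml):
    """Store one document following steps 102 to 106 of FIG. 2."""
    seg_no = 0  # step 102: reset the segment number
    for content in split_segments(xml):  # step 103: take the next segment
        seg_no += 1  # step 104: increment the segment number
        conn.execute(  # step 105: add one record to the main table
            "INSERT INTO main_t1 (doc_no, seg_no, content) VALUES (?, ?, ?)",
            (doc_no, seg_no, content),
        )
    return seg_no  # number of segments stored


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_t1 (doc_no INTEGER, seg_no INTEGER, content TEXT)")
count = store_document(conn, 1, SAMPLE)  # step 101: document number chosen by the caller
rows = conn.execute("SELECT seg_no, content FROM main_t1 ORDER BY seg_no").fetchall()
```

Because no per-document schema is involved, the same `store_document` call would work for any well-formed input that the simple splitter can handle.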

[0039] [2-1-2. Example of the First Storage Model] Table 2 below shows an example in which the sample document of Equation 1 is stored in the main table T1 of the RDB in the format of Table 1, following the procedure of FIG. 2.

[Table 2]
In this example the document number is provisionally set to "1"; the same applies hereinafter.

[0040] [2-1-3. Examples of Processing Based on the First Storage Model] Storing XML documents in the main table T1 in this format makes the following operations possible.
・Retrieving the XML document that has a specified document number
・Deleting the XML document that has a specified document number

[0041] Specifically, the XML document D having a specified document number can be retrieved easily by the following procedure. First, the records having the specified "document number" are fetched from the table, sorted by the "segment number" column. Equation 2 below illustrates an SQL statement used for this operation.

[Equation 2]
Italic characters denote search keys; the same applies hereinafter. Next, all of the fetched "segment content" values are concatenated in that order. This yields the XML document D having the specified document number.

[0042] The XML document D having a specified document number can be deleted easily by erasing from the main table T1 all records having the specified "document number". Equation 3 below illustrates an SQL statement used for this operation.

[Equation 3]
The SQL statements used for these operations may be entered manually, may be passed to the RDBMS from other application software, or may be generated automatically, for example by the document-unit processing unit 21, from instructions given through a user interface such as a web page. The same applies hereinafter.
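The retrieval and deletion operations described above can be sketched as follows. The SQL text is an assumption written to match the prose (select by document number, sort by the sequence-number column, delete by document number); the statements of Equations 2 and 3 are not reproduced here, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_t1 (doc_no INTEGER, seg_no INTEGER, content TEXT)")
# Rows are inserted out of order on purpose: only seg_no defines the document order.
conn.executemany(
    "INSERT INTO main_t1 VALUES (?, ?, ?)",
    [(1, 3, "</name>"), (1, 1, "<name>"), (2, 1, "<x/>"), (1, 2, "A4 paper")],
)


def fetch_document(conn, doc_no):
    """Rebuild a document by concatenating its stored pieces in seg_no order."""
    rows = conn.execute(
        "SELECT content FROM main_t1 WHERE doc_no = ? ORDER BY seg_no", (doc_no,)
    )
    return "".join(content for (content,) in rows)


def delete_document(conn, doc_no):
    """Delete every row of one document; returns the number of rows removed."""
    return conn.execute("DELETE FROM main_t1 WHERE doc_no = ?", (doc_no,)).rowcount


restored = fetch_document(conn, 1)
removed = delete_document(conn, 1)
remaining = conn.execute("SELECT COUNT(*) FROM main_t1").fetchone()[0]
```

The explicit `ORDER BY` is what makes the reconstruction reliable, since the insertion order of rows is not something the database promises to preserve.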

[0043] [2-1-4. Advantages of the First Storage Model] As described above, unlike the prior art, the first storage model stores the content of a document expressed in a hierarchically structured data description language, for example an XML document, in the RDB in units of tags and the text between tags. As a result, data can be stored, retrieved, deleted, and otherwise processed on the RDBMS by a uniform procedure regardless of the structure of the original XML document. In particular, no advance, per-document mapping definition work is required, and because the hierarchical structure expressed by tag nesting is preserved on the RDB, structure searches are also possible. Furthermore, although an RDB usually limits the length of a character string it can store, the XML document is split into segments before being stored, so this length limit is effectively relaxed.

[0044] Furthermore, because the source document of each segment is identified by its document number, segments originating from multiple documents can be accumulated in a single table while still allowing retrieval, deletion, search, and other processing on a per-document basis. In addition, although an RDB does not guarantee that stored rows are retrieved in the order in which they were recorded, the sequential segment numbers make it possible to retrieve the segments in the same order as in the original XML document.

[0045] [2-2. Second Storage Model] The second storage model introduces a text path in order to realize structure searches. The text path is a column indicating which level of the semantic hierarchical structure of the original XML document each segment stored in the table on the RDBMS belongs to. Table 3 below shows the meaning of each column of the main table T1 when the text path is introduced.

[Table 3]
Here, a notation is used in which the root of the hierarchical structure is denoted by '/' and the levels are separated by '/'. Such a separator is provisionally called a delimiter; the specific character used as the delimiter can be defined freely. The other columns are the same as in Table 1 for the first storage model.

[0046] [2-2-1. Example Procedure for Realizing the Second Storage Model] The main table T1 of the second storage model is created from a given XML document D by the document number determination unit 11, the segment extraction unit 12, and the segment number determination unit 13 of the conversion/storage unit 1, together with, in particular, the text path creation unit 14, following a procedure such as the one illustrated in the flowchart of FIG. 3.

[0047] In this procedure, for each segment extracted by the segment extraction unit 12 (step 201), if the segment is a start tag (step 202), its tag name is pushed onto a text-path acquisition stack (step 203); if it is a closing tag (step 206), the stack is popped (step 207). For every segment, the text path is obtained from the text-path acquisition stack (step 204), and segment information including the text path is added to the main table T1 (step 205).

[0048] At any point, the text path can be obtained from the text-path acquisition stack by taking a copy of the current stack contents and joining the tag names in it with the delimiter, in the order in which they were pushed.
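The stack-based derivation of text paths can be sketched as follows. The step numbers suggest that a closing tag is recorded (step 205) before the stack is popped (step 207), so this sketch gives a closing tag the same text path as its start tag; that ordering, the helper names, and the English sample document are assumptions. Empty-element tags, comments, and XML declarations are not handled.

```python
import re

# Hypothetical stand-in for the sample document; Equation 1 is not reproduced here.
SAMPLE = '<list><item category="peripheral"><name>thin printer</name></item></list>'


def tag_name(segment):
    """Return the tag name of a start or closing tag, without '<', '/', or attributes."""
    return re.match(r"</?\s*([^\s/>]+)", segment).group(1)


def with_text_paths(xml, delimiter="/"):
    """Yield (sequence_number, text_path, content) following FIG. 3."""
    stack = []  # text-path acquisition stack
    pieces = re.findall(r"<[^>]+>|[^<]+", xml)  # tags and runs of text between tags
    for seq_no, content in enumerate(pieces, start=1):
        is_closing = content.startswith("</")
        is_start = content.startswith("<") and not is_closing
        if is_start:
            stack.append(tag_name(content))  # step 203: push the tag name
        # step 204: join the stacked names in push order, each preceded by the delimiter
        path = "".join(delimiter + name for name in stack)
        yield seq_no, path, content  # step 205: this row would go into the main table
        if is_closing:
            stack.pop()  # step 207: pop after the closing tag has been recorded


rows = list(with_text_paths(SAMPLE))
```

Text between tags inherits the path of the innermost open element, which is what lets a later query pick up a whole element by path alone.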

[0049] [2-2-2. Example of the Second Storage Model] Table 4 below shows an example in which the sample document of Equation 1 is stored in the main table T1 of the RDB in the format of Table 3, following the procedure of FIG. 3.

[Table 4]

[0050] [2-2-3. Examples of Processing Based on the Second Storage Model] By introducing the concept of the text path as described above, the second storage model enables basic structure searches such as "retrieve everything at or below a specified level of the hierarchy, either across all documents or within a specified document".

[0051] In a structure search, the hierarchical structure given from outside as a search key or search condition is likewise written with '/' as the delimiter, and is referred to as the "hierarchical structure name". The result of the structure search is then obtained by fetching the records that satisfy either of the following conditions, sorted by document number and segment number, and concatenating them.
(1) The text path column is identical to the hierarchical structure name.
(2) The text path column begins with the character string formed by the hierarchical structure name followed by '/'.

[0052] If, for example, only the document with a specified document number is to be the target of such a structure search, a condition on the document number is added to each of the conditions above.

[0053] Equations 4 and 5 below show concrete SQL statements used as instructions to the RDBMS in the second storage model.
(1) When searching across all documents:

[Equation 4]

(2) When searching the document with a specified document number:

[Equation 5]
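A structure search of this kind can be sketched as follows. The SQL is an assumption built from the two conditions stated above (exact match, or prefix match on the name followed by '/'); the statements of Equations 4 and 5 are not reproduced here, and the table and column names are hypothetical. Note that `%` or `_` inside the name would act as `LIKE` wildcards, and that SQLite's `LIKE` is case-insensitive for ASCII by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE main_t1 (doc_no INTEGER, seg_no INTEGER, text_path TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO main_t1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "/list", "<list>"),
        (1, 2, "/list/item", "<item>"),
        (1, 3, "/list/item/name", "<name>"),
        (1, 4, "/list/item/name", "A4 paper"),
        (1, 5, "/list/item/name", "</name>"),
        (1, 6, "/list/item", "</item>"),
        (1, 7, "/list/items", "<items>"),  # sibling whose name merely starts with "item"
        (1, 8, "/list/items", "</items>"),
        (1, 9, "/list", "</list>"),
        (2, 1, "/list", "<list>"),
        (2, 2, "/list/item", "<item>"),
        (2, 3, "/list/item", "</item>"),
        (2, 4, "/list", "</list>"),
    ],
)


def structure_search(conn, name, doc_no=None):
    """Concatenate everything stored at or below hierarchical structure name `name`."""
    sql = "SELECT content FROM main_t1 WHERE (text_path = ? OR text_path LIKE ?)"
    params = [name, name + "/%"]  # condition (1), then condition (2) as a prefix match
    if doc_no is not None:
        sql += " AND doc_no = ?"  # restrict the search to one document
        params.append(doc_no)
    sql += " ORDER BY doc_no, seg_no"
    return "".join(content for (content,) in conn.execute(sql, params))


all_docs = structure_search(conn, "/list/item")
doc_one = structure_search(conn, "/list/item", doc_no=1)
```

Appending '/' before the wildcard is what keeps `/list/items` out of a search for `/list/item`.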

[0054] A more concrete example of such a structure search follows. For a structure search that queries the document with document number 1 for the hierarchical structure name "/商品リスト/商品/商品名" (that is, /product list/product/product name), the following SQL statement is first issued to the RDBMS, for example by the text path processing unit 22.

[Equation 6]
In this case, the following result set is returned from the RDBMS.

[Table 5]
Concatenating the entire result set into a single character string then yields a search result such as "<商品名>A4用紙</商品名><商品名>薄型プリンタ</商品名>" (the product names "A4 paper" and "thin printer").

[0055] Semantically, the above search result can be separated into the following two hits.
First hit: "<商品名>A4用紙</商品名>"
Second hit: "<商品名>薄型プリンタ</商品名>"

[0056] FIG. 4 is a flowchart showing a procedure for obtaining the results in this separated form instead of concatenating the entire result set into one character string. In this procedure, for example, the text path processing unit 22 clears a work area (step 301), reads one row from the result set (step 302), and, if the segment content read is a start tag (step 303), pushes it onto a stack (step 304). The segment content read is also appended to the end of the work area (step 305).

[0057] If the segment content read is a closing tag (step 306), the stack is popped (step 307); if the stack thereby becomes empty (step 308), the work area is added to a list and then cleared (step 309). This processing is repeated, and it ends when the end of the result set is reached (step 310). With this procedure, each hit derived from the result set is stored as one item of a "list" (array).
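The separation procedure of FIG. 4 can be sketched as follows. The function name and the English sample rows are assumptions; the input is taken to be the ordered list of stored contents returned by a structure search, and it is assumed to be well formed (every closing tag has a matching start tag).

```python
def split_hits(result_set):
    """Group an ordered result set into one string per top-level element."""
    hits = []
    work = ""  # step 301: clear the work area
    stack = []
    for content in result_set:  # step 302: read one row
        is_closing = content.startswith("</")
        if content.startswith("<") and not is_closing:
            stack.append(content)  # step 304: push the start tag
        work += content  # step 305: append to the work area
        if is_closing:
            stack.pop()  # step 307: pop on a closing tag
            if not stack:  # step 308: the element is complete
                hits.append(work)  # step 309: add to the list, then clear
                work = ""
    return hits  # step 310: end of the result set


result_set = ["<name>", "A4 paper", "</name>", "<name>", "thin printer", "</name>"]
hits = split_hits(result_set)
nested = split_hits(["<item>", "<name>", "x", "</name>", "</item>"])
```

Because the stack only empties when the outermost element closes, nested elements stay together in a single hit.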

[0058] [2-2-4. Variation Based on the Second Storage Model] As a variation of the second storage model described above, the comparison table creation unit 17 may assign a unique tag name number to each tag name, store the correspondence between tag names and tag name numbers in a table T2 separate from the main table T1 (called the tag list table), and express the text path column of the main table T1 using these tag name numbers. Table 6 below illustrates the meaning of each column of the tag list table T2.

[Table 6]

[0059] For this case, Tables 7 and 8 below show, respectively, a concrete example of the tag list table T2 based on the sample document and the corresponding example of the main table T1.

[Table 7]

[Table 8]

[0060] Furthermore, when tag name numbers are stored in the text path column, each number may be padded with leading '0' characters up to a fixed width, for example four digits, whenever it has fewer digits than that width. Table 9 below shows the first few rows of an example of the main table T1 stored in this way.

[Table 9]

[0061] With this structure, the number of characters occupied by one level in the text path column is fixed in advance, so the single-character match feature of RDBMS searches can be used to easily perform a search meaning "any value is acceptable for exactly one level".

[0062] For this structure, let the wildcard '*' in a hierarchical structure name mean "matches anything for exactly one level", and consider the example search "query the document with document number 1 for the hierarchical structure name "/E/*/W"". As a precondition, each tag name number in a text path is always stored as four digits, padded with leading zeros if it has fewer than four digits.

[0063] Such a search is realized by processing consisting of the following stages.
(1) The tag name numbers of tag E and tag W are obtained from the tag list table T2.
(2) If, for example, the tag name number of tag E is 17 and that of tag W is 23, the following SQL statement is issued. This example uses '_', the wildcard that matches a single character.

[Equation 7]

(3) The processing after the result set is obtained from the RDBMS is the same as in the search example for Table 4 described above.
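The one-level wildcard search can be sketched as follows. The table names, column names, and sample rows are hypothetical, and the SQL is an assumption built from the description (four-digit zero-padded tag numbers, with one `_` per digit for the `*` level); the statement of Equation 7 is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags_t2 (tag_no INTEGER PRIMARY KEY, tag_name TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE main_t1 (doc_no INTEGER, seg_no INTEGER, text_path TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO tags_t2 VALUES (?, ?)", [(17, "E"), (5, "M"), (23, "W"), (8, "X")]
)
conn.executemany(
    "INSERT INTO main_t1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "/0017", "<E>"),
        (1, 2, "/0017/0005", "<M>"),
        (1, 3, "/0017/0005/0023", "<W>"),
        (1, 4, "/0017/0005/0023", "hit"),
        (1, 5, "/0017/0005/0023", "</W>"),
        (1, 6, "/0017/0005/0008", "<X>"),
        (1, 7, "/0017/0005/0008", "</X>"),
        (1, 8, "/0017/0005", "</M>"),
        (1, 9, "/0017", "</E>"),
    ],
)

WIDTH = 4  # fixed number of digits per level, as in the precondition stated above


def to_like_pattern(conn, name):
    """Translate e.g. '/E/*/W' into '/0017/____/0023' using the tag list table."""
    parts = []
    for level in name.strip("/").split("/"):
        if level == "*":
            parts.append("_" * WIDTH)  # one '_' per digit: any tag at this level
        else:
            row = conn.execute(
                "SELECT tag_no FROM tags_t2 WHERE tag_name = ?", (level,)
            ).fetchone()
            parts.append(str(row[0]).zfill(WIDTH))  # zero-pad to the fixed width
    return "/" + "/".join(parts)


pattern = to_like_pattern(conn, "/E/*/W")
rows = conn.execute(
    "SELECT content FROM main_t1 WHERE doc_no = ? "
    "AND (text_path LIKE ? OR text_path LIKE ?) ORDER BY seg_no",
    (1, pattern, pattern + "/%"),
).fetchall()
result = "".join(content for (content,) in rows)
```

The fixed width matters: with variable-length numbers, four `_` characters could not stand for "exactly one level".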

[0064] As described above, in the second storage model the text path makes it possible to freely specify a position or range within the hierarchical structure of the original document for processing such as data searches. In the variation, because the text path that represents a segment's position in the hierarchy is expressed with tag numbers, the amount of text path data is kept to a minimum even when tag names are long. Moreover, making the tag numbers fixed-length, for example by padding numbers of fewer than four digits with leading "0"s, allows the single-character match feature of RDBMS searches to be used, which facilitates flexible searches such as "any value is acceptable for exactly one level".

[0065] [2-3. Third Storage Model] Up to the second storage model, attributes were always handled as part of the start tag. Introducing an attribute table T3 makes it easy to search on attribute content.
[2-3-1. Example of the Third Storage Model and Procedure for Realizing It] First, Table 10 below shows the meaning of each column of the attribute table T3.

[Table 10]

[0066] As shown in FIG. 5, the procedure for storing a document in the third storage model adds the following to the procedure shown in FIG. 3: by the operation of the attribute table creation unit 18, if a start tag contains an attribute list (step 403), those attributes are stored in the attribute table (step 404).

[0067] An example of the attribute table T3 created from the sample document of Equation 1 by the procedure of FIG. 5 is shown below.

[Table 11]
The main table T1 in this case is the same as in the examples already described.

[0068] [2-3-2. Example of a Search in the Third Storage Model] In the third storage model described above, the attribute processing unit 25 can search for segments having a specific attribute name or attribute value by a procedure such as the following.

[0069] (1) First, the records for the attribute to be searched are retrieved from the attribute table T3. An example of an SQL statement used for this search, for example by the attribute processing unit 25, is shown below.

[Equation 8]

[0070] (2) Next, based on the document number and segment number obtained, the whole element containing that segment is extracted from the main table. FIG. 6 shows the procedure for extracting the whole element when a document number and segment number are specified; the result is placed in a "work area". In this procedure, the work area is cleared (step 501), and the segment content is fetched from the main table T1 using the document number and the element number (here, the number of the segment currently being read) as keys (step 502). If the fetched segment content is a start tag (step 503), it is pushed onto a stack (step 504) and the attribute list belonging to that start tag is obtained (step 505). The original segment content is restored using the obtained attribute list.

[0071] The segment content is appended to the end of the work area (step 506). If the segment content is a closing tag (step 507), the stack is popped (step 508), and if the stack thereby becomes empty, the procedure ends (step 509). If the segment is not a closing tag, or if the stack is not empty, the element number is incremented by 1 (step 510) and the procedure returns to step 502.
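The element-extraction loop of FIG. 6 can be sketched as follows. This sketch assumes the layout in which the main table still holds complete start tags, so the attribute lookup of step 505 is omitted; it also assumes the given number points at a start tag. Table name, column names, and the English sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_t1 (doc_no INTEGER, seg_no INTEGER, content TEXT)")
pieces = [
    "<list>", '<item category="paper">', "<name>", "A4 paper", "</name>", "</item>",
    '<item category="peripheral">', "<name>", "thin printer", "</name>", "</item>",
    "</list>",
]
# Document number 1; sequence numbers start at 1.
conn.executemany("INSERT INTO main_t1 VALUES (1, ?, ?)", list(enumerate(pieces, start=1)))


def extract_element(conn, doc_no, seg_no):
    """Return the whole element whose start tag is stored at `seg_no` (FIG. 6)."""
    work = ""  # step 501: clear the work area
    stack = []
    while True:
        row = conn.execute(  # step 502: fetch one row by its keys
            "SELECT content FROM main_t1 WHERE doc_no = ? AND seg_no = ?",
            (doc_no, seg_no),
        ).fetchone()
        if row is None:
            raise ValueError("ran past the end of the document")
        content = row[0]
        is_closing = content.startswith("</")
        if content.startswith("<") and not is_closing:
            stack.append(content)  # step 504: push the start tag
        work += content  # step 506: append to the work area
        if is_closing:
            stack.pop()  # step 508: pop on a closing tag
            if not stack:
                return work  # step 509: the element is complete
        seg_no += 1  # step 510: move on to the next number


element = extract_element(conn, 1, 7)
```

Fetching one row per iteration mirrors the flowchart; a range query followed by the same stack logic would reduce the number of round trips.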

[0072] Next, a concrete example of a search by attribute name and attribute value is shown. Suppose that, in the sample document shown in Equation 1, the elements whose attribute name is '分類' (category) and whose attribute value is '周辺機器' (peripheral equipment) are to be found. In this case:
(1) First, the following SQL statement is issued against the attribute table.

[Equation 9]
This yields the following result set.

[Table 12]
(2) Next, following the flowchart of FIG. 6, the whole element is extracted based on the document number and segment number obtained.

[0073] As a result, the following final result is obtained.
<商品 分類="周辺機器"><商品名>薄型プリンタ</商品名><価格 単位="US$">980</価格></商品>

[0074] In addition to searching simply by attribute name and attribute value as above, it is also possible to perform a search tied to the document structure, that is, a search for a hierarchical structure name that has a specified attribute name and attribute value. Such a search can be realized, for example, by an SQL statement such as the following.

[Equation 10]
This makes it possible to obtain the document number and segment number of the matching segments.
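An attribute search, with and without the structural condition, can be sketched as follows. The attribute table layout (document number, sequence number of the owning start tag, attribute name, attribute value) is an assumption, since Table 10 is not reproduced here; the join is likewise an assumed way of combining the two tables and is not the statement of Equation 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE main_t1 (doc_no INTEGER, seg_no INTEGER, text_path TEXT, content TEXT)"
)
conn.execute(
    "CREATE TABLE attr_t3 (doc_no INTEGER, seg_no INTEGER, attr_name TEXT, attr_value TEXT)"
)
conn.executemany(
    "INSERT INTO main_t1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "/list", "<list>"),
        (1, 2, "/list/item", '<item category="peripheral">'),
        (1, 3, "/list/item", "</item>"),
        (1, 4, "/list/note", '<note category="peripheral">'),
        (1, 5, "/list/note", "</note>"),
        (1, 6, "/list", "</list>"),
    ],
)
conn.executemany(
    "INSERT INTO attr_t3 VALUES (?, ?, ?, ?)",
    [(1, 2, "category", "peripheral"), (1, 4, "category", "peripheral")],
)

# Attribute name and value only: both start tags match.
by_attribute = conn.execute(
    "SELECT doc_no, seg_no FROM attr_t3 "
    "WHERE attr_name = ? AND attr_value = ? ORDER BY doc_no, seg_no",
    ("category", "peripheral"),
).fetchall()

# Attribute plus hierarchical structure name: join on the shared keys.
by_attribute_and_path = conn.execute(
    "SELECT m.doc_no, m.seg_no FROM main_t1 AS m "
    "JOIN attr_t3 AS a ON a.doc_no = m.doc_no AND a.seg_no = m.seg_no "
    "WHERE m.text_path = ? AND a.attr_name = ? AND a.attr_value = ? "
    "ORDER BY m.doc_no, m.seg_no",
    ("/list/item", "category", "peripheral"),
).fetchall()
```

The returned pairs are the keys that the element-extraction procedure of FIG. 6 takes as input.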

[0075] In the structure above, attribute information is stored in both the main table and the attribute table. A format that removes this duplication by separating the attributes from the main table can also be adopted; the flowchart of FIG. 7 illustrates the document storage procedure for that case.

[0076] In this procedure, when the extracted segment (step 601) is a start tag (step 602) and has an attribute list (step 603), the attribute list is added to the attribute table T3 (step 604) and removed from the segment (step 605). For a start tag, the tag name is then pushed onto the text-path acquisition stack (step 606). The text path is obtained (step 607) and the segment is added to the main table T1 (step 608). If the extracted segment is a closing tag (step 609), the text-path acquisition stack is popped (step 610). These steps are repeated until the end of the document (step 611).

[0077] An example in which the sample document of Equation 1 is stored according to the procedure of FIG. 7 is shown below.

[Table 13]

[0078] Thus, in the third storage model, storing the attributes contained in tags in the attribute table keeps the number of characters and the amount of data in the segment content to a minimum, and makes it possible to freely search the attribute table for segments having a desired attribute or attribute value.

[0079] [2-4. Fourth Storage Model] In the fourth storage model, the segment type creation unit 15 introduces into the main table T1 a "segment type" column indicating the kind of each segment. This allows only the name of each segment itself to be stored as the segment content of the main table T1, and makes complex searches easier to implement.

[0080] Table 14 shows the definitions of the segment types, and Table 15 shows an example in which the sample document of Equation 1 is stored according to those definitions. Here, "attribute" is also defined as a segment type, and everything is stored in the main table without using a separate attribute table.

[Table 14]

[Table 15]
In this case, the segment type processing unit 23 of the search processing unit 2 can determine the kind of each segment based on its segment type.

[0081] Attributes can also be handled as a kind of tag name. That is, as a variation of the main table T1 shown in Table 15, attribute names may be stored as a kind of tag name; whether an entry is an attribute can then be determined from the segment type. An example of the main table T1 stored in this way is shown below.

[Table 16]

[0082] In the fourth storage model, the kind of a segment is represented by its segment type. Because the kind of a segment therefore need not be indicated within the segment content by particular brackets or similar markers, the number of characters and the amount of data in the segment content are kept to a minimum, and complex searches based on, for example, the kind of segment become easier to implement.

[0083] [2-5. Fifth Storage Model] The fifth storage model is an example that introduces a segment-number path. The "text path" of the second storage model does not completely define the hierarchical relationships among the individual segments in a document. Because the same segment may appear at multiple places in an XML document D, it is impossible, when looking at a given segment in the main table T1, to determine unambiguously which segment is its upper level (parent segment). To make this determination possible, a "segment-number path" is introduced by the operation of the segment-number path creation unit 16, so that the information on the hierarchical structure of the document is stored completely.

[0084] In the present invention, a document is decomposed into segments and each segment is assigned a segment number. The segment-number path uses these segment numbers to express, for each segment, its position in the hierarchy starting from the document root.

[0085] Any means, such as a delimiter symbol, may be chosen to separate the punctuation numbers in a punctuation number path; here the punctuation numbers are stored separated by '/'. The meaning of each column of the main body table T1 that includes such a punctuation number path is shown below.

[Table 17]

[0086] An example of the main body table T1 that stores the sample document of Formula 1 in this format is shown below. In this example, the attributes are assumed to be stored in the attribute table.

[Table 18]
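To make the use of such a table concrete, the following Python sketch loads a few rows into an in-memory SQLite database, standing in for the RDBMS, and uses the punctuation number path to find the ancestors and the descendants of a punctuation. The table name `body`, the column names, and the hard-coded rows are hypothetical; they are not the contents of the table above, and closing-tag rows are left out for brevity.

```python
import sqlite3

# Hypothetical rows: (doc_no, punct_no, content, punct_no_path).
ROWS = [
    (1, 1, "<list>", "1"),
    (1, 2, "<item>", "1/2"),
    (1, 3, "<name>", "1/2/3"),
    (1, 4, "A", "1/2/3/4"),
    (1, 7, "<item>", "1/7"),
    (1, 8, "<name>", "1/7/8"),
    (1, 9, "B", "1/7/8/9"),
]

def connect():
    """Create an in-memory table shaped like the sketch's main body table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE body (doc_no INTEGER, punct_no INTEGER, content TEXT, "
        "punct_no_path TEXT, PRIMARY KEY (doc_no, punct_no))"
    )
    conn.executemany("INSERT INTO body VALUES (?, ?, ?, ?)", ROWS)
    return conn

def _path(conn, doc_no, punct_no):
    (path,) = conn.execute(
        "SELECT punct_no_path FROM body WHERE doc_no = ? AND punct_no = ?",
        (doc_no, punct_no),
    ).fetchone()
    return path

def parents(conn, doc_no, punct_no):
    """Return (punct_no, content) of each ancestor, nearest parent first."""
    ancestor_nos = [int(n) for n in _path(conn, doc_no, punct_no).split("/")[:-1]]
    result = []
    for n in reversed(ancestor_nos):
        (content,) = conn.execute(
            "SELECT content FROM body WHERE doc_no = ? AND punct_no = ?",
            (doc_no, n),
        ).fetchone()
        result.append((n, content))
    return result

def descendants(conn, doc_no, punct_no):
    """Return (punct_no, content) of every punctuation below the given one."""
    prefix = _path(conn, doc_no, punct_no) + "/%"   # trailing '/' avoids matching 1/70/...
    return conn.execute(
        "SELECT punct_no, content FROM body "
        "WHERE doc_no = ? AND punct_no_path LIKE ? ORDER BY punct_no",
        (doc_no, prefix),
    ).fetchall()

conn = connect()
print(parents(conn, 1, 9))
print(descendants(conn, 1, 7))
```

Because each path starts at the root, a prefix match on the path column is enough to select a whole subtree, and splitting the path yields the chain of parents without any recursive query.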

[0087] In the fifth storage model, the punctuation number path expresses the position of each punctuation in the hierarchical structure of the original document absolutely, as the sequence of punctuation numbers of the punctuations on the path from the root to that punctuation. Even if the same punctuation occurs at several places, the separate punctuations beneath each occurrence can therefore be clearly distinguished. This facilitates a variety of information processing; for example, when a particular punctuation number is obtained as the result of a conditional search, the punctuation number path processing unit 24 can trace its parent punctuations in order.

[3. Other Embodiments]

[0088] The present invention is not limited to the above embodiments and also includes other embodiments such as those exemplified below. For example, the RDBMS in the above embodiments was assumed to be a database management system that supports queries in the standard SQL language, but the techniques of the present invention can also be applied to other kinds of database management systems.

[0089] Tag name numbers, punctuation numbers, and the like need not always start from 1 at the beginning of each document; they may instead be assigned so that each number is unique across all documents (global uniqueness of numbers). This can simplify implementation, for example by allowing the same punctuation occurring in several documents to be distinguished without referring to the document number. Closing tags may also be omitted from storage in the tables on the RDBMS and restored when the document is retrieved.
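Both variations can be sketched together in Python. The sketch below draws punctuation numbers from one counter shared by all documents, skips closing tags when storing, and regenerates them on retrieval from the depth implied by each punctuation number path. The function names, the '/' separator, and the sample documents are assumptions for illustration, and only plain opening tags and text are handled.

```python
import re
from itertools import count

# Assumption: a punctuation is either one tag or one run of text between tags.
TOKEN = re.compile(r"<[^>]+>|[^<]+")

def store(xml, doc_no, numbers):
    """Return (doc_no, punct_no, content, punct_no_path) rows without closing tags."""
    rows, stack = [], []
    for m in TOKEN.finditer(xml):
        content = m.group()
        if content.startswith("</"):
            stack.pop()               # closing tag: adjust depth, store nothing
            continue
        punct_no = next(numbers)      # number is unique across all documents
        path = "/".join(str(n) for n in stack + [punct_no])
        rows.append((doc_no, punct_no, content, path))
        if content.startswith("<"):
            stack.append(punct_no)
    return rows

def restore(rows):
    """Rebuild the document text, regenerating the omitted closing tags."""
    out, open_tags = [], []
    for _doc, _no, content, path in sorted(rows, key=lambda r: r[1]):
        depth = path.count("/")       # number of ancestors of this punctuation
        while len(open_tags) > depth: # elements deeper than this one have ended
            out.append("</" + open_tags.pop() + ">")
        out.append(content)
        if content.startswith("<"):
            open_tags.append(content[1:-1].split()[0])
    while open_tags:                  # close whatever is still open at the end
        out.append("</" + open_tags.pop() + ">")
    return "".join(out)

numbers = count(1)                    # shared counter: global uniqueness
doc_a = "<a><b>x</b><b>y</b></a>"     # hypothetical documents
doc_b = "<a><b>x</b></a>"
rows_a = store(doc_a, 1, numbers)
rows_b = store(doc_b, 2, numbers)
```

Here the text `x` is stored as punctuation 3 in the first document and as punctuation 8 in the second, so the two occurrences can be told apart by punctuation number alone, and `restore` returns each original string even though no closing tag was stored.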

[0090] The individual features described for the storage models above can be combined in countless ways, and they may be implemented in any combination.

[0091]

[Effects of the Invention] As described above, the present invention makes it possible to provide superior information processing technology that enables effective structure search on an RDBMS over the contents of a document expressed in a data description language having a hierarchical structure, namely an information processing apparatus and method, a recording medium on which information processing software is recorded, and a relational database.

[Brief description of the drawings]

FIG. 1 is a functional block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a flowchart showing the processing procedure for creating the main body table from an XML document in the first storage model of the embodiment of the present invention.

FIG. 3 is a flowchart showing the processing procedure for creating the main body table from an XML document in the second storage model of the embodiment of the present invention.

FIG. 4 is a flowchart showing the processing procedure for obtaining results in a separated format in the second storage model of the embodiment of the present invention.

FIG. 5 is a flowchart showing the processing procedure for storing a document in the RDB in the third storage model of the embodiment of the present invention.

FIG. 6 is a flowchart showing the processing procedure for retrieving the entire element that contains a specified punctuation when a document number and a punctuation number are specified, in the third storage model of the embodiment of the present invention.

FIG. 7 is a flowchart showing the processing procedure for storing a document in the RDB in a format in which the attributes are separated from the main body table, in the third storage model of the embodiment of the present invention.

[Explanation of symbols]

D: XML document
X: XML conversion and processing unit
RDB: relational database
T1: main body table
T2: tag list table
T3: attribute table
RDBMS: relational database management system
1: conversion/storage unit
11: document number determination unit
12: punctuation extraction unit
13: punctuation number determination unit
14: text path creation unit
15: punctuation type creation unit
16: punctuation number path creation unit
17: comparison table creation unit
18: attribute table creation unit
2: search and processing unit
21: per-document processing unit
22: text path processing unit
23: punctuation type processing unit
24: punctuation number path processing unit
25: attribute processing unit

Claims

[Claims]

1. An information processing apparatus comprising: means for sequentially extracting tags and the text between tags from a given document as punctuations; and means for recording, in a table of a relational database, rows each including a document number that differs for each document, a punctuation number representing the order of each punctuation, and the content of the extracted punctuation.

2. The information processing apparatus according to claim 1, further comprising means for performing processing in units of the document based on a designated document number.

3. The information processing apparatus according to claim 1, wherein each row includes a text path expressed by the tags present on the path from the root to each punctuation in the hierarchical structure of the original document.

4. The information processing apparatus according to claim 3, further comprising means for performing a process for a punctuation at a specified hierarchy or lower based on the text path.

5. The information processing apparatus wherein the text path is expressed by tag numbers, using a comparison table between the tag name of each tag and a tag number assigned to each tag name.

6. The information processing apparatus according to claim 5, wherein the tag number has a predetermined number of digits.

7. The information processing apparatus according to claim 1, wherein an attribute table storing attributes included in the tag is used.

8. The information processing apparatus according to claim 7, further comprising means for searching the attribute table for each row related to a specified attribute name or attribute value.

9. The information processing apparatus according to claim 1, wherein each row includes a punctuation type indicating a type of each punctuation.

10. The information processing apparatus according to claim 9, further comprising means for performing processing based on the type of the punctuation based on the punctuation type.

11. The information processing apparatus according to claim 9, wherein the attribute name is treated as a kind of tag name, and whether or not the tag name is an attribute is distinguished by the punctuation type.

12. The information processing apparatus according to any one of claims 1 to 11, wherein each row includes a punctuation number path in which each punctuation present on the path from the root to each punctuation in the hierarchical structure of the original document is represented by its punctuation number.

13. The information processing apparatus according to claim 12, further comprising means for performing processing based on an absolute position in the hierarchical structure of the document using the punctuation number path.

14. An information processing method comprising: a step of sequentially extracting tags and the text between tags from a given document as punctuations; and a step of recording, in a table of a relational database, rows each including a document number that differs for each document, a punctuation number representing the order of each punctuation, and the content of the extracted punctuation.

15. The information processing method according to claim 14, wherein each row includes a text path expressed by the tags present on the path from the root to each punctuation in the hierarchical structure of the original document.

16. The information processing method wherein the text path is expressed by tag numbers, using a comparison table between the tag name of each tag and a tag number assigned to each tag name.

17. The information processing method according to any one of claims 14 to 16, wherein an attribute table storing the attributes included in the tags is used.

18. The information processing method according to claim 14, wherein each row includes a punctuation type indicating the kind of each punctuation.

19. The information processing method according to any one of claims 14 to 18, wherein each row includes a punctuation number path in which each punctuation present on the path from the root to each punctuation in the hierarchical structure of the original document is represented by its punctuation number.

20. A recording medium on which information processing software for processing information using a computer is recorded, wherein the software causes the computer to sequentially extract tags and the text between tags from a given document as punctuations, and to record, in a table of a relational database, rows each including a document number that differs for each document, a punctuation number representing the order of each punctuation, and the content of the extracted punctuation.

21. A relational database in which tags and the text between tags sequentially extracted from a given document are treated as punctuations, and rows each including a document number that differs for each document, a punctuation number representing the order of each punctuation, and the content of that punctuation are recorded in a table.