JP4394964B2

JP4394964B2 - Data compression apparatus, data restoration apparatus, template generation apparatus, and data compression system

Info

Publication number: JP4394964B2
Application number: JP2004005842A
Authority: JP
Inventors: 晃金野; 英記行友; 雄大中山
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2004-01-13
Filing date: 2004-01-13
Publication date: 2010-01-06
Anticipated expiration: 2024-01-13
Also published as: JP2005202507A

Description

The present invention relates to a data compression apparatus, a data restoration apparatus, a template generation apparatus, and a data compression system for electronic data.

In recent years, with the spread of the World Wide Web (WWW), data exchange using structured documents such as HTML (Hyper Text Markup Language) and XML (Extensible Markup Language) has been increasing. XML in particular is attracting attention as a next-generation language that complements HTML, and is expected to become the most widely used format for exchanging information on the Internet.

XML is a language with a data representation format that expresses a hierarchical structure of elements. A document written in XML (an XML document) is described, for example, as shown in FIG. 32. FIG. 32 is a diagram showing an XML document 10. As shown in FIG. 32, XML is broadly divided into markup and text information. In the XML document 10 shown in FIG. 32, the markup consists of element start symbols (start tags) Ma, element end symbols (end tags) Mb, and an empty element symbol (empty element tag) Mc. In FIG. 32, <book>, <title>, <authors>, <author>, <contents>, and <chapter> are element start symbols Ma; </book>, </title>, </authors>, </author>, </contents>, and </chapter> are element end symbols Mb; and <misc/> is the empty element symbol Mc. The region from an element start symbol Ma to the corresponding element end symbol Mb, or an empty element symbol Mc by itself, represents an element (the basic unit of information in XML).

Between an element start symbol Ma and an element end symbol Mb, text information can be written in addition to other element symbols. For example, in the XML document 10 shown in FIG. 32, the character string “XML basics” is defined as the text information of the element <title>, and the character string “Taro Yamada” is defined as the text information of the first <author> element appearing in the element <authors>.

Parent-child relationships and sibling relationships are defined among elements and text information. In the XML document 10 shown in FIG. 32, the element that begins with the element start symbol Ma <book> and ends with the element end symbol Mb </book> (the element <book>) contains the element that begins with the element start symbol Ma <title> and ends with the element end symbol Mb </title> (the element <title>). In this case, the element <book> is called the parent element of the element <title>, and the element <title> is called a child element of the element <book>. This is the parent-child relationship between elements.

The element <title> and the element <authors> have the same parent element <book> and are adjacent to each other. In this case, the element <title> and the element <authors> are called siblings: the element <title> is the previous sibling of the element <authors>, and the element <authors> is the next sibling of the element <title>. This is the sibling relationship between elements.

In general, XML is expressed in text form, like the XML document 10 shown in FIG. 32, when it is communicated between computers or stored on a hard disk device or in flash memory. When it is used inside a computer for searching or modification, on the other hand, it is parsed and converted into a data structure suited to internal use.

FIG. 33 is a diagram showing a data structure 11 obtained by parsing the XML document 10 shown in FIG. 32 and converting it into a form suitable for internal use in a computer. In FIG. 33, each element and each piece of text information is described as one of the vertices 301 to 317, each having a type and a value. The type is written on the left side of each of the vertices 301 to 317: “E” denotes an element and “T” denotes text information. For example, the type 301a of the vertex 301 is “E”. The value is written on the right side of the vertex; for example, the value 301b of the vertex 301 is “book”. When a vertex represents an element, the value holds the name of the element (element name); when it represents text information, the value holds the character string. For example, the vertex 302 represents the element name <title>, and the vertex 306 represents the text information “XML basics”.

Each of the vertices 301 to 317 also holds reference information consisting of four references, namely a parent reference, a child reference, a next-sibling reference, and a previous-sibling reference, in order to express the parent-child and sibling relationships of the original (pre-conversion) XML document 10. In the XML document 10 described above, the element <title> is a child element of the element <book>, and the element <book> is the parent element of the element <title>. Accordingly, in the data structure 11 shown in FIG. 33, the vertices 301 and 302 have a child reference P1 from <book> to <title> and a parent reference P2 from <title> to <book>, respectively, each represented by an arrow. The element <book> also has the element <authors> as the child element following <title>. In this case, the vertices 302 and 303 hold a next-sibling reference P3 from the element <title> to the element <authors> and a previous-sibling reference P4 from the element <authors> to the element <title>. Note that in the data structure 11, among sibling elements, only the first child element (for example, the element <title>) directly holds a parent reference.
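The four-reference vertex described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `Vertex`, `build`, and `parent_of` are hypothetical and do not appear in the publication, and the standard `xml.etree.ElementTree` parser stands in for whatever parser produces the data structure 11.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False, repr=False)
class Vertex:
    type: str                                  # "E" for an element, "T" for text information
    value: str                                 # element name or character string
    parent: Optional["Vertex"] = None          # held only by the first child
    child: Optional["Vertex"] = None           # reference to the first child
    next_sibling: Optional["Vertex"] = None
    prev_sibling: Optional["Vertex"] = None


def build(elem: ET.Element) -> Vertex:
    """Convert a parsed element into linked vertices (hypothetical helper)."""
    v = Vertex("E", elem.tag)
    children = []
    if elem.text and elem.text.strip():
        children.append(Vertex("T", elem.text.strip()))
    for sub in elem:
        children.append(build(sub))
        if sub.tail and sub.tail.strip():
            children.append(Vertex("T", sub.tail.strip()))
    for i, c in enumerate(children):
        if i == 0:
            v.child = c                        # child reference (P1 in the text)
            c.parent = v                       # parent reference (P2), first child only
        else:
            children[i - 1].next_sibling = c   # next-sibling reference (P3)
            c.prev_sibling = children[i - 1]   # previous-sibling reference (P4)
    return v


def parent_of(v: Vertex) -> Optional[Vertex]:
    """Find the parent by walking back to the first sibling."""
    while v.prev_sibling is not None:
        v = v.prev_sibling
    return v.parent
```

Because only the first child holds a parent reference, looking up the parent of a later sibling such as <authors> requires following previous-sibling references first, which is what `parent_of` does.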

In such a data structure, the reference information between vertices can be managed separately from the element names and text information; for example, the two can be represented as shown in FIG. 34(a) and FIG. 34(b), respectively. FIG. 34(a) is a diagram showing cross-reference relationship data 400, which holds the reference information between the vertices. FIG. 34(b) is a diagram showing a table 450 representing a set of vertices (also called a vertex group), each having a type, set to either element or text information, and a value.

However, because the capacity of storage devices such as memory is finite, a data structure should be compressed efficiently before it is stored. In this regard, Non-Patent Document 1 discloses a method of compressing element names and text information such as those shown in FIG. 34(b). In that method, the element names and text information held by the vertices are stored separately as a dictionary, and each vertex holds an index into the dictionary, so that the same character string is not stored more than once.
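The dictionary idea attributed to Non-Patent Document 1 can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the method of that document: the function name `build_dictionary` and the sample strings other than “Taro Yamada” are hypothetical.

```python
def build_dictionary(vertices):
    """Replace each vertex value with an index into a shared string dictionary.

    vertices: list of (type, value) pairs.
    Returns (dictionary, indexed_vertices).
    """
    dictionary = []   # each distinct string is stored exactly once
    index_of = {}     # string -> position in the dictionary
    indexed = []
    for vtype, value in vertices:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indexed.append((vtype, index_of[value]))
    return dictionary, indexed


vertices = [("E", "author"), ("T", "Taro Yamada"),
            ("E", "author"), ("T", "Hanako Suzuki")]   # second name is a placeholder
dictionary, indexed = build_dictionary(vertices)
```

The repeated element name "author" is stored once, and both vertices carry the same index.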

Non-Patent Document 2, on the other hand, discloses a method of compressing an XML document by reusing partial structures in the XML document. In this method, the original XML document is separated into three parts, namely structure, element name information, and text information, and each part is then compressed with a general-purpose compression algorithm such as LZ77 (for details of LZ77, see Jacob Ziv and Abraham Lempel, “A Universal Algorithm for Sequential Data Compression”, IEEE Transactions on Information Theory 23(3): 337-343 (1977)).
The compression method disclosed in Non-Patent Document 2 is described here. In this method, element start symbols and empty element symbols are first replaced with short element names such as “#1” and “#2”, and element end symbols are replaced with “/”. Text information is replaced with “C”.
When the above method is applied to the XML document 10, the separated data structure 12, element name information 13, and text information 14 are represented as shown in FIG. 35, FIG. 36, and FIG. 37, respectively.
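The separation into structure, element names, and text can be sketched in Python as follows. The sketch follows only the description given here (short codes “#n”, “/” for an end symbol, “C” for text) and is not the actual XMill format. The function name `split_document` is hypothetical, and the sample document is a stand-in for FIG. 32: only “XML basics” and “Taro Yamada” come from the text, the other strings are placeholders.

```python
import xml.etree.ElementTree as ET


def split_document(xml_text):
    """Split an XML document into (structure string, element codes, text list)."""
    codes = {}       # element name -> short code such as "#1"
    structure = []   # "#n" opens an element, "/" closes it, "C" marks text
    texts = []       # text information in document order

    def add_text(s):
        if s and s.strip():
            structure.append("C")
            texts.append(s.strip())

    def visit(elem):
        if elem.tag not in codes:
            codes[elem.tag] = "#%d" % (len(codes) + 1)
        structure.append(codes[elem.tag])
        add_text(elem.text)
        for sub in elem:
            visit(sub)
            add_text(sub.tail)
        structure.append("/")   # an empty element therefore becomes "#n/"

    visit(ET.fromstring(xml_text))
    return "".join(structure), codes, texts


# Hypothetical stand-in for the XML document 10 of FIG. 32.
DOC = (
    "<book><title>XML basics</title>"
    "<authors><author>Taro Yamada</author><author>A2</author><author>A3</author></authors>"
    "<contents><chapter>C1</chapter><chapter>C2</chapter></contents>"
    "<misc/></book>"
)
structure, codes, texts = split_document(DOC)
```

For this stand-in document the structure string is `#1#2C/#3#4C/#4C/#4C//#5#6C/#6C//#7//`, with the element names and the text strings held separately in `codes` and `texts`.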

In the compression method described in Non-Patent Document 2, each part is compressed independently using a compression algorithm typified by LZ77; an outline of such an algorithm is given here. A compression algorithm such as LZ77 finds partial patterns contained in the original input and compresses the input by reusing them repeatedly as templates. Take the compression of the data structure 12 shown in FIG. 35 as an example. Suppose that templates X, Y, Z, W, and V are used and are assigned as X = “#1#2C/#3”, Y = “#4C/”, Z = “/#5”, W = “#6C/”, and V = “/#7//”. The data structure 12 shown in FIG. 35 can then be expressed as “XYYYZWWV”. Here the templates Y and W, each representing part of the document structure, are used more than once. When templates can be reused in this way and the original document can be expressed with a small number of templates, the amount of information needed to represent the original XML document is reduced, and compression becomes possible.
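The template example above can be checked with a small Python sketch. Note that this is a fixed-table substitution using the five templates X, Y, Z, W, and V from the text, not LZ77 itself, which discovers repeated patterns on its own; the function names `expand` and `compress` and the greedy longest-match strategy are assumptions made for illustration.

```python
TEMPLATES = {"X": "#1#2C/#3", "Y": "#4C/", "Z": "/#5", "W": "#6C/", "V": "/#7//"}


def expand(compressed, templates):
    """Replace each template symbol with the pattern it stands for."""
    return "".join(templates[symbol] for symbol in compressed)


def compress(structure, templates):
    """Greedy left-to-right substitution, trying longer patterns first."""
    by_length = sorted(templates.items(), key=lambda item: -len(item[1]))
    out = []
    pos = 0
    while pos < len(structure):
        for symbol, pattern in by_length:
            if structure.startswith(pattern, pos):
                out.append(symbol)
                pos += len(pattern)
                break
        else:
            raise ValueError("no template matches at position %d" % pos)
    return "".join(out)


structure = expand("XYYYZWWV", TEMPLATES)
```

Expanding “XYYYZWWV” yields a 36-character structure string, and compressing that string with the same table returns the 8 symbols again, with Y used three times and W twice.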

Non-Patent Document 1: Mathias Neumuller and John N. Wilson, “Compact In-Memory Representation of XML”, Internal Report of University of Strathclyde
Non-Patent Document 2: Hartmut Liefke and Dan Suciu, “XMill: An Efficient Compressor for XML Data”, in Proceedings of ACM SIGMOD International Conference on Management of Data, 2000

In the conventional techniques, however, sufficient compression cannot be achieved when the number of usable templates is limited, for example by constraints on the amount of memory. It is therefore desirable to recompress a data structure that has already been compressed using templates, and to be able to restore data that has been compressed efficiently by such recompression.

The present invention has been made to solve the above problem. Its object is to provide a data compression apparatus configured to compress the data structure of an XML document efficiently using templates and to allow the efficiently compressed data to be restored using the templates, a data restoration apparatus that restores the compressed data structure, a template generation apparatus that generates the templates used for the compression, and a data compression system.

To solve the above problem, the present invention provides a data compression apparatus comprising: separation means for separating input data, which has a plurality of vertices each having a type and a value and reference information between the vertices, into cross-reference relationship data having the reference information between the vertices and a vertex group consisting of the plurality of vertices each having a type and a value, and for outputting the data of the separated vertex group; template storage means for storing, as templates, specific partial patterns of the reference information between vertices; template matching location detection means for detecting, in the cross-reference relationship data separated by the separation means, matching locations that correspond to the templates stored in the template storage means; template replacement means for, when replacing a matching location detected by the template matching location detection means in the separated cross-reference relationship data with a template, placing at the matching location an indicated vertex that marks the location replaced by the template, so that the data consisting of the parts of the cross-reference relationship data other than the matching location together with the indicated vertex can be replaced by templates again; and output means for outputting, to a storage device, the cross-reference relationship data repeatedly replaced by the template replacement means, in a state in which the reference information between the vertices can be restored using the templates.
Because this data compression apparatus represents a replaced location in the same form as a vertex, templates can be applied recursively, and high compression efficiency can be achieved with a small number of templates.

When the input data has a rooted tree structure, the template matching location detection means preferably detects, as matching locations corresponding to the templates stored in the template storage means, only those matching locations that include a leaf vertex, that is, a vertex at the deepest position of the rooted tree structure.
Then, when the compressed cross-reference relationship data is restored and some of its vertices are referenced, only the templates needed to reference those vertices have to be expanded, so the vertices can be referenced quickly.

The present invention also provides a data restoration apparatus comprising: template storage means for storing, as templates, specific partial patterns of reference information between a plurality of vertices; expansion means for receiving compressed cross-reference relationship data, which has reference information between vertices and has been replaced by templates, expanding the compressed cross-reference relationship data using the templates, repeating re-expansion, in which the expanded cross-reference relationship data is expanded again using the templates, whenever the expanded cross-reference relationship data contains an indicated vertex, and thereby restoring the uncompressed cross-reference relationship data from the compressed cross-reference relationship data; and synthesis means for receiving data of a vertex group consisting of a plurality of vertices, combining the data of the vertex group with the uncompressed cross-reference relationship data restored by the expansion means, and outputting the combined data.
Because this data restoration apparatus repeatedly re-expands the input cross-reference relationship data using templates, the templates are applied recursively, and cross-reference relationship data that has been compressed at a high compression ratio with a small number of templates can be expanded and restored.

When a vertex to be expanded using a template is specified in the cross-reference relationship data, the expansion means preferably performs re-expansion using the templates needed to reference that vertex.
In this way, only the templates needed to reference the specified vertex are expanded, and data can be referenced quickly from the compressed cross-reference relationship data.

The present invention further provides a template generation apparatus comprising: input data acquisition means for receiving data of a rooted tree structure having a plurality of vertices and reference information between the vertices, and acquiring it as input data; template candidate detection means for detecting, as template candidates, reference information of rooted tree structures contained in the reference information between the vertices of the acquired input data; template candidate storage means for storing the detected template candidates together with the frequency with which each template candidate appears in the input data; template replacement means for replacing the input data with the template candidates stored in the template candidate storage means; and template selection means for selecting template candidates, on the basis of their appearance frequency, from the template candidates stored in the template candidate storage means, and outputting the selected template candidates.
This template generation apparatus detects reference information of rooted tree structures contained in the input data as template candidates, stores them in the template candidate storage means, and replaces the input data using these template candidates. It also selects template candidates from the stored candidates on the basis of their appearance frequency and outputs them as templates.
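Frequency-based selection of template candidates can be illustrated with the following Python sketch. It is a deliberate simplification and not the detection procedure of the apparatus: every complete element subtree is treated as one candidate, its reference pattern is encoded as a parenthesis string that ignores element names, and candidates are kept when they occur at least `min_frequency` times. The names `shape`, `count_candidates`, `select_templates`, and `min_frequency` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def shape(elem):
    """Encode the parent/child/sibling pattern of a subtree, ignoring names."""
    return "(" + "".join(shape(child) for child in elem) + ")"


def count_candidates(root):
    """Count how often each subtree shape with at least one child appears."""
    counts = Counter()
    for elem in root.iter():
        if len(elem) > 0:   # a single vertex alone is not worth a template
            counts[shape(elem)] += 1
    return counts


def select_templates(counts, min_frequency=2):
    """Keep the candidates that appear often enough, most frequent first."""
    return [s for s, n in counts.most_common() if n >= min_frequency]


root = ET.fromstring("<r><a><x/><y/></a><a><x/><y/></a><b><x/></b></r>")
counts = count_candidates(root)
selected = select_templates(counts)
```

In this example the two-child shape `(()())` occurs twice and is selected, while the shapes that occur only once are discarded.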

When selecting template candidates, the template selection means may select only the template candidates that the template replacement means actually used to replace the input data.
In this way, the reference information of the necessary parts of the given input data can be output as template candidates.
The template candidate detection means may detect, as a template candidate, the reference information of the part having the minimum tree structure that includes all vertices from the root vertex of the input data down to the depth of the shallowest leaf vertex.
Furthermore, the reference information of parts having a rooted tree structure at locations in the input data that were not detected by the template candidate detection means may be input to the template candidate detection means.
With the template generation apparatus configured in this way, rooted tree structures in which only the root vertex has connection information can be generated from the input data as templates, so templates that are effective for recursive compression can be generated.

The template candidate detection means may also take as a starting point the leaf vertex first encountered in a depth-first search from the root vertex of the input data, treat the reference information of the rooted tree structure part consisting of that starting vertex and its parent vertex as a template candidate, and repeat extended candidate detection, in which the reference information of a rooted tree structure part obtained by adding one vertex to that reference information is treated as a new template candidate.
With the template generation apparatus configured in this way, matches with templates can be verified by analyzing the structure of the input data, without checking the templates against the input data one by one, so templates can be generated quickly.
The template candidate detection means may limit the detection of new template candidates by the number of vertices, or by the height of the reference information of the rooted tree structure part.
Doing so limits the size of the generated templates and suppresses the generation of large templates that appear infrequently.

Furthermore, when performing extended candidate detection, the template candidate detection means preferably does not extend the candidate to a new vertex whose depth differs from that of the starting vertex by more than a predetermined value, and instead repeats extended candidate detection with the new vertex as the starting point.
With the template generation apparatus configured in this way, not only can the size of the generated templates be limited, but the vertices that are descendants of a vertex whose depth differs from that of the starting vertex by more than the predetermined value are replaced by other templates, so that all such vertices can be treated as leaf vertices. Because rooted tree structures in which only the root vertex has connection information can thus be generated from the input data as templates, templates that are effective for recursive compression can be generated dynamically.

The present invention further provides a data compression system for compressing first input data having a plurality of vertices each having a type and a value and reference information between the vertices, the system comprising: a template generation apparatus that receives cross-reference relationship data having reference information between vertices extracted from second input data having the same kind of structure as the first input data, and generates templates representing specific partial patterns of the reference information between the vertices; any one of the data compression apparatuses described above, provided with template storage means for storing the templates generated by the template generation apparatus; and output means for outputting, as first output data, the cross-reference relationship data repeatedly replaced by the data compression apparatus using the templates stored in the template storage means, in a state in which the reference information between the vertices can be restored using the templates, and for outputting, as second output data, the data of a vertex group consisting of a plurality of vertices each having a type and a value, separated from the first input data.
With the data compression system configured in this way, the input data can be compressed using templates that are generated dynamically to suit the given input data, so highly effective data compression can be achieved.

According to the present invention, the data structure of an XML document can be compressed recursively and efficiently using a small number of templates, and the efficiently compressed data structure can be restored.

Embodiments of the present invention are described in detail below with reference to the drawings.

Embodiment of the data compression apparatus
(First embodiment)
FIG. 1 is a block diagram showing the configuration of a data compression apparatus 101 according to the present embodiment. As shown in FIG. 1, the data compression apparatus 101 has template storage means 102, template matching location detection means 103, separation means 107, template replacement means 111, and switching means 104. The data compression apparatus 101 receives input data 108, which has a plurality of vertices (a vertex group) each having a type and a value and reference information between the vertices, and outputs first output data 109 and second output data 110.

In the present embodiment, the invention is described in detail through the procedure for compressing the XML document 20 shown in FIG. 2. FIG. 2(a) is a diagram showing an example of the text representation of the XML document 20. The XML document 20 can be converted, by a known technique (for example, Xerces, available at http://xml.apache.org/xerces2-j/), into a data structure 21 suitable for internal use in a computer, as shown in FIG. 2(b). The following therefore describes the compression process after the XML document 20 has been converted into the data structure 21 shown in FIG. 2(b). The data structure 21 has a plurality of vertices (a vertex group) each having a type and a value, and reference information between the vertices.

In the data compression apparatus 101, the separation means 107 receives the data structure 21 shown in FIG. 2(b) as the input data 108 and separates the input data 108 into cross-reference relationship data Rd, which holds the reference information between the vertices, and vertex group data Td, which consists of the plurality of vertices each having a type and a value. Specifically, the separation means 107 separates the data structure 21 by generating the cross-reference relationship data 1000 shown in FIG. 3 and the table 900 shown in FIG. 4, the table 900 serving as the set (vertex group) of the vertices 1001 to 1007 each having a type and a value. The cross-reference relationship data 1000 is generated by assigning to each of the vertices 1001 to 1007, in order, an ID (vertex ID) that uniquely identifies the vertex. The table 900 is generated by listing each assigned vertex ID together with the type and value that the corresponding vertex 1001 to 1007 originally had. Vertex IDs can be assigned to the vertices 1001 to 1007 by breadth-first search, depth-first search, or the like; depth-first search is used here. The separation means 107 outputs the vertex group data obtained by the separation as the second output data 110.
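The separation step can be sketched in Python as follows, under several assumptions: the function name `separate` and the dictionary keys are hypothetical, the input is parsed with the standard `xml.etree.ElementTree` module rather than Xerces, vertex IDs are assigned from 1 in depth-first (pre-order) sequence, and only the first child holds a parent reference, following the convention described for the data structure 11. The sample document is a placeholder, not the XML document 20.

```python
import xml.etree.ElementTree as ET


def separate(root):
    """Split a parsed document into (cross-reference data, vertex table)."""
    table = []   # rows of (vertex ID, type, value), like the table 900
    refs = {}    # vertex ID -> its four references, like the data 1000

    def new_vertex(vtype, value):
        vid = len(table) + 1          # IDs follow depth-first visiting order
        table.append((vid, vtype, value))
        refs[vid] = {"parent": None, "child": None, "next": None, "prev": None}
        return vid

    def visit(elem):
        vid = new_vertex("E", elem.tag)
        prev = None

        def attach(cid):
            nonlocal prev
            if prev is None:
                refs[vid]["child"] = cid     # first child
                refs[cid]["parent"] = vid
            else:
                refs[prev]["next"] = cid     # later children are chained as siblings
                refs[cid]["prev"] = prev
            prev = cid

        if elem.text and elem.text.strip():
            attach(new_vertex("T", elem.text.strip()))
        for sub in elem:
            attach(visit(sub))
            if sub.tail and sub.tail.strip():
                attach(new_vertex("T", sub.tail.strip()))
        return vid

    visit(root)
    return refs, table


refs, table = separate(ET.fromstring("<a><b>x</b><c/></a>"))
```

The cross-reference data `refs` contains no names or strings, so it can be compressed by structural templates alone, while `table` carries the types and values that form the second output data.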

ＸＭＬ文書２０から分離された相互参照関係データ１０００、およびテーブル９００はそれぞれ図３，図４に示す通りである。ここで、図４に示すテーブル９００は頂点ＩＤ９０１、型９０２および値９０３を一行とする形式で表現されている。
テンプレート蓄積手段１０２は、圧縮に先立ちあらかじめテンプレートとテンプレート実体を蓄積している。このとき、テンプレート蓄積手段１０２は、テンプレートやテンプレート実体として、例えば、あらかじめ高い頻度で適用されることが分かっている高頻度のものや、後述するテンプレート生成装置１６０１で生成されたテンプレートを蓄積している。このようなテンプレートとテンプレート実体としては、例えば、図５（ａ），図５（ｂ）に示すテンプレート１１０５とテンプレート実体１１０９とがある。 The cross-reference relationship data 1000 and the table 900 separated from the XML document 20 are as shown in FIGS. 3 and 4, respectively. Here, the table 900 shown in FIG. 4 is expressed in a format in which the vertex ID 901, the type 902, and the value 903 are one line.
The template storage unit 102 stores templates and template entities in advance, prior to compression. As templates and template entities it stores, for example, ones known in advance to be applied with high frequency, and templates generated by the template generation device 1601 described later. Examples of such a template and template entity are the template 1105 and the template entity 1109 shown in FIGS. 5A and 5B.

テンプレート１１０５は、テンプレートＩＤ１１０６、接続情報１１０７およびパターン情報１１０８を有している。テンプレートＩＤ１１０６はテンプレート蓄積手段１０２に複数のテンプレートが蓄積された際に、それら各テンプレートを一意に識別するために用いられる。パターン情報１１０８はそのテンプレートによって表現される頂点間の参照情報のパターンを表し、複数の頂点とそれら相互の参照情報とを有している。パターン情報１１０８に含まれる頂点間の参照情報には、親参照、子参照、次兄弟参照、前兄弟参照の４種類の参照が設けられている。なお、接続先の頂点がない参照のうち、後述するようにしてテンプレートを適用し、相互参照関係データ１０００を圧縮する際に利用されない参照については、その旨がパターン情報１１０８に記述されている。これは、例えば、無効な頂点を定義しておき、その頂点への参照とすることで実現可能である。接続情報１１０７にはテンプレート１１０５を実際に適用し、相互参照関係データ１０００を圧縮する際における他のテンプレートや頂点との接続を示す情報が列挙されている。 The template 1105 has a template ID 1106, connection information 1107, and pattern information 1108. The template ID 1106 is used to uniquely identify each template when a plurality of templates are stored in the template storage unit 102. The pattern information 1108 represents the pattern of reference information between vertices that the template expresses, and consists of a plurality of vertices and the references among them. Four kinds of reference are provided between vertices in the pattern information 1108: parent reference, child reference, next sibling reference, and previous sibling reference. Among references that have no destination vertex, a reference that is not used when the template is applied to compress the cross-reference relationship data 1000, as described later, is marked as such in the pattern information 1108. This can be realized, for example, by defining an invalid vertex and making the reference point to it. The connection information 1107 lists the connections to other templates and vertices that arise when the template 1105 is actually applied to compress the cross-reference relationship data 1000.

このようなテンプレート１１０５は、接続情報１１０７と、参照情報を有するパターン情報１１０８とを区別して構成しているから、異なるテンプレート１１０５同士でパターン情報１１０８を共有することができる。つまり、接続情報１１０７を異ならせることにより、接続され得る頂点や他のテンプレートを異ならせ、パターン情報１１０８が同じでも、別テンプレートのようにして利用することができる。すると、テンプレート内に含まれる頂点間の参照情報が省略可能となり、テンプレート蓄積手段１０２のメモリ使用量（記憶領域）を効率よく利用することが可能となる。 Since such a template 1105 is configured by distinguishing connection information 1107 and pattern information 1108 having reference information, the pattern information 1108 can be shared by different templates 1105. That is, by making the connection information 1107 different, vertices and other templates that can be connected are made different, and even if the pattern information 1108 is the same, it can be used like another template. Then, the reference information between the vertices included in the template can be omitted, and the memory usage (storage area) of the template storage unit 102 can be used efficiently.

また、図５（ａ）において、テンプレートＩＤ１１０６には具体的な値として“Ｔ１”が設定されている。パターン情報１１０８は４つの頂点１１０１〜１１０４と、それらの間の参照を示す参照情報とにより構成され、参照は矢印で記述されている。なお、参照の種類は矢印に対して、親参照はｐ、子参照はｃ、次兄弟参照はｎｓ、前兄弟参照はｐｓとして記述されている。例えば、頂点１１０１の子参照ｃは頂点１１０２を指定しており、頂点１１０３の次兄弟参照ｎｓは頂点１１０４を指定している。 In FIG. 5A, “T1” is set as a specific value in the template ID 1106. The pattern information 1108 includes four vertices 1101 to 1104 and reference information indicating references between them, and the references are described by arrows. The reference type is described as an arrow, the parent reference is p, the child reference is c, the next sibling reference is ns, and the previous sibling reference is ps. For example, the child reference c of the vertex 1101 designates the vertex 1102, and the next sibling reference ns of the vertex 1103 designates the vertex 1104.

また、テンプレートを適用し、相互参照関係データ１０００を圧縮した際に利用されないことを示す参照は、端点を「×」で記述し、テンプレートを適用し相互参照関係データ１０００を圧縮する際に他のテンプレートや頂点と接続されることを示す参照は端点を「○」で記述している。後者に該当する４つの参照、すなわち、頂点１１０１の親参照、頂点１１０２、頂点１１０３、頂点１１０４の子参照については、接続情報１１０７に頂点のＩＤと参照の種類が列挙されている。 A reference that is not used when the template is applied to compress the cross-reference relationship data 1000 is drawn with an end point of "×", and a reference that is connected to another template or vertex when the template is applied is drawn with an end point of "○". For the four references of the latter kind, namely the parent reference of the vertex 1101 and the child references of the vertices 1102, 1103, and 1104, the connection information 1107 lists the vertex ID and the reference type.

図５（ｂ）に示すテンプレート実体１１０９は、入力データ１０８に対し、相互参照関係データ１０００を圧縮する際に、テンプレートを適用したこと（テンプレート適用済み）を表すために用いられる。このテンプレート実体１１０９は、テンプレート実体ＩＤ１１１０と、適用するテンプレートを表す利用テンプレートＩＤ１１１１と、実体接続情報１１１２と、実体情報１１１３とを有している。
テンプレート実体ＩＤ１１１０は、テンプレートＩＤ１１１１で示されるテンプレート１１０５を適用して相互参照関係データ１０００を圧縮した際、そのテンプレート１１０５の適用箇所を一意に特定するために用いられる。実体接続情報１１１２は、テンプレート１１０５を適用し相互参照関係データ１０００を圧縮した際に接続する先の頂点が列挙されている。実体情報１１１３はテンプレート１１０５を適用し、相互参照関係データ１０００を圧縮した際に、テンプレート１１０５に内包されている頂点のＩＤ（頂点ＩＤ）が蓄積されている。実体接続情報１１１２と実体情報１１１３に入る具体的な値については後述する。 A template entity 1109 shown in FIG. 5B is used to indicate that a template has been applied to the input data 108 when the cross-reference relationship data 1000 is compressed (template applied). The template entity 1109 has a template entity ID 1110, a use template ID 1111 representing a template to be applied, entity connection information 1112, and entity information 1113.
The template entity ID 1110 is used to uniquely identify the location where the template 1105 indicated by the use template ID 1111 has been applied when the cross-reference relationship data 1000 is compressed with it. The entity connection information 1112 lists the destination vertices connected to when the template 1105 is applied to compress the cross-reference relationship data 1000. The entity information 1113 stores the IDs (vertex IDs) of the vertices contained in the template 1105 when it is applied to compress the cross-reference relationship data 1000. The specific values held in the entity connection information 1112 and the entity information 1113 are described later.

テンプレート一致箇所検出手段（以下「一致箇所検出手段」という）１０３は図３に示す相互参照関係データ１０００から、テンプレート蓄積手段１０２に蓄積されているテンプレートに対応する一致箇所を検出する。テンプレート蓄積手段１０２には、テンプレート１１０５のほか、複数のテンプレートが蓄積されることが予想されるため、一致箇所検出手段１０３による検出結果は複数通り存在すると考えられる。ただし、例えば図１２に示す一致箇所検出手順によれば検出結果は一意に定まる。
なお、図１２に示す一致箇所検出手順は以下のとおりである。
処理開始後ステップ１で、テンプレート蓄積手段に蓄積されたテンプレートから、頂点の数が多い順に１つずつ選択し、以下の処理を繰り返す。
選択したテンプレートをＰｊとし、ステップ２に進む。
続くステップ２では、相互参照関係データに含まれる頂点から、選択したテンプレートＰの頂点の数と一致する頂点を選択する組み合わせをＸ１，Ｘ２，Ｘｍとし、その中から１つずつ選択して、以下を繰り返す。
選択した組み合わせをＸｋとする。
次に、ステップ３に進み、Ｘｋに含まれる頂点はすべて置換済みマークが無いか否かを判断し、無ければステップ４に進み、そうでなければｋがｍに達するまでステップ３からステップ５を繰り返す。
ステップ４に進むと、ＸｋがＰｊと同型か否かを判断し、同型であればステップ５に進み、そうでなければｋをひとつ進め、ステップ３に戻る。
ステップ５に進むと、テンプレートＰｊ，Ｘｋを一致箇所としてパターン一致情報に登録し、Ｘｋに含まれる頂点は置換済みとしてマークする。 A template matching location detection unit (hereinafter referred to as “matching location detection unit”) 103 detects a matching location corresponding to the template stored in the template storage unit 102 from the cross-reference relationship data 1000 shown in FIG. In addition to the template 1105, it is expected that a plurality of templates are stored in the template storage unit 102. Therefore, it is considered that there are a plurality of detection results by the coincidence point detection unit 103. However, for example, according to the matching location detection procedure shown in FIG. 12, the detection result is uniquely determined.
The matching location detection procedure shown in FIG. 12 is as follows.
In step 1, after processing starts, the templates stored in the template storage unit are selected one at a time in descending order of the number of vertices, and the following processing is repeated for each.
Let the selected template be Pj, and proceed to step 2.
In step 2, let X1, X2, ..., Xm be the combinations of vertices taken from the cross-reference relationship data that contain as many vertices as the selected template Pj; these are selected one at a time and the following is repeated.
Let the selected combination be Xk.
In step 3, it is determined whether none of the vertices included in Xk carries a replaced mark. If none does, the process proceeds to step 4; otherwise steps 3 to 5 are repeated until k reaches m.
In step 4, it is determined whether Xk is isomorphic to Pj. If it is, the process proceeds to step 5; otherwise k is advanced by one and the process returns to step 3.
In step 5, the pair of the template Pj and Xk is registered in the pattern matching information as a matching location, and the vertices included in Xk are marked as replaced.
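The procedure of FIG. 12 can be sketched as the brute-force search below. It is an illustration under assumptions rather than the patented implementation: the names `detect` and `find_mapping`, the dictionary layout, and the example data are hypothetical, and the isomorphism test is simplified so that it checks only the references between pattern vertices, ignoring the "×" and "○" end points.

```python
from itertools import combinations, permutations


def find_mapping(pattern, refs, chosen):
    """Return a pattern-vertex -> data-vertex mapping if `chosen` has the
    same internal references as `pattern`, else None (simplified test)."""
    names = sorted(pattern)
    for perm in permutations(chosen):
        m = dict(zip(names, perm))
        if all(refs[m[v]][kind] == m[target]
               for v in names
               for kind, target in pattern[v].items()):
            return m
    return None


def detect(templates, refs):
    """templates: {template ID: pattern}; refs: {vertex: {kind: vertex}}."""
    replaced = set()
    matches = []
    by_size = sorted(templates.items(), key=lambda item: -len(item[1]))
    for template_id, pattern in by_size:                          # step 1
        for chosen in combinations(sorted(refs), len(pattern)):   # step 2
            if replaced.intersection(chosen):                     # step 3
                continue
            mapping = find_mapping(pattern, refs, chosen)         # step 4
            if mapping is not None:                               # step 5
                matches.append((template_id, mapping))
                replaced.update(chosen)
    return matches


# Hypothetical data: vertex 1 has children 2 and 3; vertex 2 has children 4 and 5.
refs = {
    1: {"p": None, "c": 2, "ns": None, "ps": None},
    2: {"p": 1, "c": 4, "ns": 3, "ps": None},
    3: {"p": 1, "c": None, "ns": None, "ps": 2},
    4: {"p": 2, "c": None, "ns": 5, "ps": None},
    5: {"p": 2, "c": None, "ns": None, "ps": 4},
}
# Hypothetical template: a parent "a" with two children "b" and "c".
templates = {"T": {"a": {"c": "b"},
                   "b": {"p": "a", "ns": "c"},
                   "c": {"p": "a", "ps": "b"}}}
found = detect(templates, refs)
```

In this example the vertices 2, 4, and 5 have the same shape as the template, but vertex 2 is already marked as replaced by the earlier match on 1, 2, and 3. The fixed enumeration order together with the replaced marks is what makes the result unique.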

本実施の形態では、テンプレート蓄積手段１０２にテンプレート１１０５のみが蓄積されているとき、相互参照関係データ１０００に対して、図１２に示す一致箇所検出手順により求めた一致箇所を示す。そのテンプレートの一致箇所は、例えば図１１（ａ）に示す一致箇所情報１７０１のように、利用したテンプレートを示すテンプレートＩＤ１７０２と、テンプレートの頂点から元の相互参照関係データ１０００の頂点への割り当てを示す頂点対応情報１７０３とにより表すことができる。
ここで、相互参照関係データ１０００のうち、テンプレートに対応する一致箇所を検出した結果の一例として、一致箇所情報１７０４を図１１（ｂ）に示す。この一致箇所情報１７０４はテンプレート蓄積手段１０２にテンプレート１１０５のみが蓄積されているときに、図１２に示す一致箇所検出手順により、相互参照関係データ１０００のうち、テンプレート１１０５に対応する一致箇所を検出した結果であり、一致箇所は１つであったことを示している。この一致箇所情報１７０４は、一致箇所検出手段１０３からテンプレート置換手段１１１に入力される。 In the present embodiment, the matching location obtained for the cross-reference relationship data 1000 by the matching location detection procedure shown in FIG. 12 is shown for the case where only the template 1105 is stored in the template storage unit 102. A matching location of a template can be represented, for example as in the matching location information 1701 shown in FIG. 11A, by a template ID 1702 indicating the template used and vertex correspondence information 1703 indicating the assignment of the template's vertices to the vertices of the original cross-reference relationship data 1000.
Here, as an example of the result of detecting the locations in the cross-reference relationship data 1000 that match a template, matching location information 1704 is shown in FIG. 11B. The matching location information 1704 is the result of detecting, by the matching location detection procedure shown in FIG. 12, the locations in the cross-reference relationship data 1000 that match the template 1105 when only the template 1105 is stored in the template storage unit 102, and it shows that there was one matching location. The matching location information 1704 is input from the matching location detection unit 103 to the template replacement unit 111.

テンプレート置換手段１１１は、一致箇所検出手段１０３から一致箇所情報１７０４（図１に示す一致箇所情報Ｕｄ）を入力し、該当する頂点を起点としてテンプレートのパターン情報に内包されている頂点に対応する頂点の集合をすべてテンプレート実体１１０９に置換し、その置換結果を圧縮済の相互参照関係データＣｄとして出力する。この置換は例えば図１２に示す一致箇所検出手順で実現することができる。
ここで、その置換結果を図６に示す。図６では、テンプレート実体としてテンプレート実体１２０２が存在している。テンプレート実体１２０２はテンプレート実体ＩＤ１２０３と、利用テンプレートＩＤ１２０４と、実体接続情報１２０５と、実体情報１２０６とを有している。テンプレート実体ＩＤ１２０３には“Ｘ１”が設定され、利用テンプレートＩＤ１２０４には、“Ｔ１”が設定されている。これにより、テンプレート実体１２０２は図５（ａ）に示すテンプレート１１０５が適用されたことを示している。 The template replacement unit 111 receives the matching location information 1704 (the matching location information Ud shown in FIG. 1) from the matching location detection unit 103, replaces with the template entity 1109 the entire set of vertices that, starting from the relevant vertex, correspond to the vertices contained in the template's pattern information, and outputs the result as compressed cross-reference relationship data Cd. This replacement can be realized, for example, by the matching location detection procedure shown in FIG. 12.
The result of the replacement is shown in FIG. 6. In FIG. 6, a template entity 1202 is present as the template entity. The template entity 1202 has a template entity ID 1203, a use template ID 1204, entity connection information 1205, and entity information 1206. "X1" is set in the template entity ID 1203, and "T1" is set in the use template ID 1204. The template entity 1202 thus indicates that the template 1105 shown in FIG. 5A has been applied.

実体情報１２０６は、利用したテンプレート（のパターン情報）に内包されている頂点と、テンプレートを適用する前の相互参照関係データ１０００の頂点との対応を示している。例えば、テンプレート実体１２０２の場合、テンプレート１１０５の頂点１１０１，１１０２，１１０３，１１０４がそれぞれ圧縮前の相互参照関係データ１０００の頂点１００１，頂点１００２，頂点１００６，頂点１００７に対応していることを示している。
実体接続情報１２０５は、そのテンプレート実体における他のテンプレート実体や頂点との接続情報を示している。ここで、上述したとおり、テンプレート実体１２０２が適用しているテンプレートはテンプレート１１０５であるが、そのテンプレート１１０５は外部と接続できる参照を４つ保持していることが、テンプレート１１０５における接続情報１１０７に記述されている（図５（ａ）参照）。 The entity information 1206 indicates the correspondence between the vertices contained in (the pattern information of) the template used and the vertices of the cross-reference relationship data 1000 before the template was applied. For the template entity 1202, for example, it indicates that the vertices 1101, 1102, 1103, and 1104 of the template 1105 correspond to the vertices 1001, 1002, 1006, and 1007, respectively, of the cross-reference relationship data 1000 before compression.
The entity connection information 1205 indicates connection information with other template entities and vertices in the template entity. Here, as described above, the template applied by the template entity 1202 is the template 1105, but it is described in the connection information 1107 in the template 1105 that the template 1105 holds four references that can be connected to the outside. (See FIG. 5A).

そして、テンプレート実体１２０２には、これらの参照先がどの頂点なのかが記述されている。すなわち、頂点１１０２の子参照が頂点１００３へ接続されるように記述され、頂点１１０１の親参照、頂点１１０３の子参照および頂点１１０４の子参照はいずれの頂点にも接続されないことが記述されている。
テンプレート置換手段１１１は置換する際に、テンプレート実体１２０２へ置換されたことを示す情報を頂点と同様の構成で表現することによって、テンプレートで置換した後の相互参照関係データ１０００を再圧縮が可能なデータとしている。 The template entity 1202 describes which vertex is the reference destination. That is, it is described that the child reference of the vertex 1102 is connected to the vertex 1003, and the parent reference of the vertex 1101, the child reference of the vertex 1103, and the child reference of the vertex 1104 are described as not connected to any vertex. .
When performing the replacement, the template replacement unit 111 expresses the information indicating the replacement by the template entity 1202 in the same form as a vertex, so that the cross-reference relationship data 1000 after template replacement remains data that can be recompressed.
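The template 1105 and the template entity 1202 described above can be written down as plain records, for example as in the Python sketch below. The class and field names and the `is_consistent` helper are assumptions introduced for illustration. The values are the ones given in the text; `internal` holds only the two references the text spells out, so the remaining content of FIG. 5(a) is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Template:
    template_id: str
    vertices: list      # vertices of the pattern information
    internal: dict      # (vertex, kind) -> vertex, references inside the pattern
    connections: list   # connection information: open (vertex, kind) pairs


@dataclass
class TemplateEntity:
    entity_id: str
    template_id: str    # use template ID
    connections: dict   # entity connection information: (vertex, kind) -> vertex or None
    info: dict          # entity information: pattern vertex -> original vertex ID


def is_consistent(entity, template):
    """Check that an entity fills exactly the slots its template declares."""
    return (entity.template_id == template.template_id
            and set(entity.info) == set(template.vertices)
            and set(entity.connections) == set(template.connections))


T1 = Template(
    template_id="T1",
    vertices=[1101, 1102, 1103, 1104],
    # Only the references stated in the text; the figure may contain more.
    internal={(1101, "c"): 1102, (1103, "ns"): 1104},
    connections=[(1101, "p"), (1102, "c"), (1103, "c"), (1104, "c")],
)
X1 = TemplateEntity(
    entity_id="X1",
    template_id="T1",
    connections={(1101, "p"): None, (1102, "c"): 1003,
                 (1103, "c"): None, (1104, "c"): None},
    info={1101: 1001, 1102: 1002, 1103: 1006, 1104: 1007},
)
```

Keeping `connections` separate from the pattern is what lets two templates share one pattern while differing only in what they may connect to.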

本実施の形態では、テンプレート置換手段１１１が相互参照関係データ１０００について、テンプレート１１０５を用いた置換を行うときに、テンプレート１１０５の適用を示すテンプレート実体１２０２で置換するのではなく、次のようにしている。すなわち、テンプレート置換手段１１１は、図６に示すように、テンプレート実体ＩＤ１２０３に設定されている実体ＩＤ（Ｘ１）を示すポインタ付頂点（指示付頂点）１２０７を設けることによって、テンプレート実体１２０２による置換が行われたことを示す一方、ポインタ付き頂点１２０７以外の頂点、すなわち頂点１００３、頂点１００４、頂点１００５が圧縮前と同様にそれぞれの参照を有するようにして、テンプレート１１０５を用いた置換を行っている。これにより、テンプレート実体１２０２への置換を表すポインタ付き頂点１２０７はテンプレート１１０５により置換されていない他の頂点（頂点１００３、頂点１００４、頂点１００５）と同様の構成となり、テンプレート実体１２０２を頂点の１つとして表現できるようになる。こうして、テンプレート置換手段１１１は、相互参照関係データ１０００について、テンプレートによる圧縮後の再圧縮を実現可能にしている。 In the present embodiment, when the template replacement unit 111 performs replacement on the cross-reference relationship data 1000 using the template 1105, it does not substitute the template entity 1202 itself, which indicates the application of the template 1105, but proceeds as follows. As shown in FIG. 6, the template replacement unit 111 provides a vertex with pointer (pointing vertex) 1207 that indicates the entity ID (X1) set in the template entity ID 1203, thereby indicating that replacement by the template entity 1202 has taken place, while the vertices other than the vertex with pointer 1207, namely the vertices 1003, 1004, and 1005, keep their references as before compression. As a result, the vertex with pointer 1207, which represents the replacement by the template entity 1202, has the same form as the other vertices not replaced by the template 1105 (the vertices 1003, 1004, and 1005), and the template entity 1202 can be expressed as one of the vertices. In this way, the template replacement unit 111 makes it possible to recompress the cross-reference relationship data 1000 after it has been compressed with a template.

切り替え手段１０４は、圧縮前の相互参照関係データ１０００に対する圧縮が行われた圧縮済の相互参照関係データ１２０１（図１に示すＣｄ）を第１の出力データとして出力するか、一致箇所検出手段１０３に入力して再圧縮を行うか（出力または再圧縮）を切り替え、両者を選択できるように構成されている。この切り替え手段１０４は再圧縮の余地があるか否か（再圧縮余地の有無）の判定結果や、予め指定しておいた再圧縮の回数（以下「指定回数」という）等を用いて、出力または再圧縮を選択し得るようになっている。再圧縮余地の有無は、一致箇所検出手段１０３が出力するテンプレート一致箇所情報Ｕｄが空か否かで行える。 The switching unit 104 is configured to switch between outputting the compressed cross-reference relationship data 1201 (Cd shown in FIG. 1), obtained by compressing the uncompressed cross-reference relationship data 1000, as the first output data and feeding it to the matching location detection unit 103 for recompression (output or recompression), so that either can be selected. The switching unit 104 can select output or recompression using, for example, the result of determining whether there is room for recompression, or a number of recompressions specified in advance (hereinafter the "specified number of times"). Whether there is room for recompression can be determined from whether the template matching location information Ud output by the matching location detection unit 103 is empty.

本実施の形態では、指定回数を利用して出力または再圧縮を選択することとしている。入力データとしては、図２（ｂ）に示すＸＭＬ文書のデータ構造２１を与え、テンプレート蓄積手段１０２に、図５に示すテンプレート１１０５のみが蓄積されている場合の動作を例にとって説明する。
切り替え手段１０４は、指定回数を利用して、圧縮済みの相互参照関係データ１２０１の出力または再圧縮を選択する。ここでは、例えば指定回数がＮであり、今回の圧縮がｎ回目であるとする。 In the present embodiment, output or recompression is selected using the specified number of times. The operation is described for an example in which the data structure 21 of the XML document shown in FIG. 2B is given as the input data and only the template 1105 shown in FIG. 5 is stored in the template storage unit 102.
The switching unit 104 selects output or recompression of the compressed cross-reference relationship data 1201 using the specified number of times. Here, for example, it is assumed that the designated number of times is N, and the current compression is the nth time.

ｎ≧Ｎであるときは出力が選択される。このとき、切り替え手段１０４は、相互参照関係データ１２０１を第１の出力データ１０９として出力する。
ｎ＜Ｎであるときは再圧縮が選択される。このとき、切り替え手段１０４は、相互参照関係データ１２０１を一致箇所検出手段１０３に入力する。相互参照関係データ１２０１は、テンプレート実体１２０２の適用を頂点と同様の構成で表現しているため、テンプレートの適用箇所を一つの頂点とみなし、例えば図１２に示す一致箇所検出手順で一致箇所情報を導出することができる。図６に示した相互参照関係データ１２０１が一致箇所検出手段１０３に入力されるとき、検出された一致箇所検出情報は図１１（ｃ）に示す一致箇所情報１７１０となる。 The output is selected when n ≧ N. At this time, the switching unit 104 outputs the cross-reference relationship data 1201 as the first output data 109.
Recompression is selected when n <N. At this time, the switching unit 104 inputs the cross-reference relationship data 1201 to the matching location detection unit 103. Since the cross-reference relationship data 1201 expresses the application of the template entity 1202 in the same configuration as the vertex, the application location of the template is regarded as one vertex, and for example, the matching location information is obtained by the matching location detection procedure shown in FIG. Can be derived. When the cross-reference relationship data 1201 shown in FIG. 6 is input to the matching part detection unit 103, the detected matching part detection information becomes matching part information 1710 shown in FIG. 11C.
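The choice between output and recompression can be sketched as a simple loop. The function `run_passes` and its parameters are hypothetical. The sketch combines the two criteria mentioned above, the specified number of times and an empty matching result, which is a design choice made here for illustration; the embodiment itself uses only the specified number of times. The toy `detect` and `replace` functions are stand-ins that shorten a string and have nothing to do with templates.

```python
def run_passes(data, detect, replace, limit):
    """Compress repeatedly; stop when nothing matches or when the
    n-th pass satisfies n >= limit. Returns (data, passes performed)."""
    n = 0
    while True:
        matches = detect(data)
        if not matches:          # no room for recompression
            return data, n
        data = replace(data, matches)
        n += 1
        if n >= limit:           # n >= N: output
            return data, n
        # n < N: feed the result back for recompression


# Toy stand-ins: each pass collapses the first "aa" into "a".
def toy_detect(s):
    i = s.find("aa")
    return [i] if i >= 0 else []


def toy_replace(s, matches):
    i = matches[0]
    return s[:i] + "a" + s[i + 2:]


limited = run_passes("aaaa", toy_detect, toy_replace, 2)
exhausted = run_passes("aaaa", toy_detect, toy_replace, 10)
```

With a limit of 2 the loop stops after the second pass even though another match remains; with a limit of 10 it stops earlier, as soon as the detector returns nothing.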

図７は図６に示す相互参照関係データ１２０１を再圧縮した相互参照関係データ１３０１を示す図である。テンプレート置換手段１１１は、図６に示した相互参照関係データ１２０１のうち、一致箇所検出情報に示された頂点１２０７を起点として、テンプレート１２０５のパターン情報と対応する頂点群、すなわち、頂点１２０７、頂点１００３、頂点１００４および頂点１００５を、テンプレート実体１３０２を用いて置換する。この置換の結果、図７に示すとおり、単一の頂点１３０７のみからなる相互参照関係データ１３０１が得られる。本実施の形態では、例えば頂点１３０７がテンプレート実体１３０２のテンプレート実体ＩＤ１３０３を保持して、テンプレート実体１３０２を参照することを示し、他のテンプレートにより置換されていない頂点と同一の構成としている。 FIG. 7 is a diagram showing cross-reference relationship data 1301 obtained by recompressing the cross-reference relationship data 1201 shown in FIG. The template replacement unit 111 starts from the vertex 1207 indicated in the matching part detection information in the cross-reference relationship data 1201 shown in FIG. 6, and the vertex group corresponding to the pattern information of the template 1205, that is, the vertex 1207, the vertex 1003, vertex 1004, and vertex 1005 are replaced using the template entity 1302. As a result of this replacement, as shown in FIG. 7, cross-reference relationship data 1301 including only a single vertex 1307 is obtained. In this embodiment, for example, the vertex 1307 holds the template entity ID 1303 of the template entity 1302 to indicate that the template entity 1302 is referred to, and has the same configuration as the vertex that is not replaced by another template.

なお、テンプレートの接続情報の数が頂点の有する接続情報の数（親参照、子参照、次兄弟参照、前兄弟参照をそれぞれ１つずつ）よりも多くなる場合が考えられるが、接続情報の数が多い頂点については、一致箇所検出手段１０３において、一致箇所検出の対象から除外してもよい。
以上のように、データ圧縮装置は、一致箇所検出手段１０３により検出された一致箇所をテンプレートで置換する際に、そのテンプレートによる置換箇所を示す指示付き頂点１２０７を設けてテンプレートによる再置換を可能とし、再帰的にテンプレートを適用できるようにしている。そのため、同一のテンプレートを複数回適用することができ、より少ないテンプレートで高い圧縮率を実現できるようになっている。 Note that the number of items of connection information in a template may exceed the number of references a vertex has (one each of parent, child, next sibling, and previous sibling reference). Vertices with that many items of connection information may be excluded from matching location detection by the matching location detection unit 103.
As described above, when the data compression apparatus replaces a matching location detected by the matching location detection unit 103 with a template, it provides the pointing vertex (vertex with pointer) 1207 that marks the location replaced by the template, which makes further replacement by templates possible and allows templates to be applied recursively. The same template can therefore be applied a plurality of times, and a high compression ratio can be achieved with fewer templates.

（第２の実施の形態）
上述のような再圧縮を行うことにより、型と値をそれぞれ有する複数の頂点と、各頂点間の参照情報とを有する相互参照関係データにおける頂点の一部を参照するときは、圧縮された相互参照関係データに対し、テンプレートを用いて繰返し展開処理を行い、復元処理を再帰的に行う必要がある。そのため、再圧縮の回数が増えるにしたがい、頂点の参照に重大な時間を要するおそれがある。したがって、頂点の参照に要する時間が可能な限り短縮できることが好ましい。
そのためには、入力データとして提供される相互参照関係データが根付木構造を有するときに、その木構造を有する元の相互参照関係データの最も浅い位置にある頂点（根頂点）から最も深い位置にある頂点（葉頂点）へと順次復元することが可能となれば、すなわち、テンプレートが復元された相互参照関係データにおける葉頂点に存在するようになれば、復元が必要な箇所のみを選択しながら復元することが可能となる。そのため、参照に要する時間を短縮できると考えられる。 (Second Embodiment)
When recompression is performed as described above, referring to some of the vertices in cross-reference relationship data that has a plurality of vertices, each with a type and a value, and reference information between them requires expanding the compressed cross-reference relationship data repeatedly with templates, that is, performing the restoration process recursively. Consequently, as the number of recompressions increases, referring to a vertex may take a significant amount of time. It is therefore preferable to shorten the time required to refer to a vertex as much as possible.
To that end, suppose the cross-reference relationship data provided as input data has a rooted tree structure. If the original tree-structured cross-reference relationship data can be restored step by step from the vertex at the shallowest position (the root vertex) toward the vertices at the deepest positions (the leaf vertices), that is, if the templates come to lie at the leaf vertices of the restored cross-reference relationship data, then restoration can proceed while selecting only the locations that need to be restored. The time required for reference can therefore be expected to decrease.

これは、例えば、一致箇所検出手段１０３が一致箇所の検出において、木構造の葉頂点を含む一致箇所のみを検出し、その一致箇所をテンプレート置換手段１１１がテンプレートに置換する処理を再帰的に行うことによって実現することができる。
ここで、例えば、図３に示した相互参照関係データ１０００が一致箇所検出手段１０３に入力され、テンプレート蓄積手段１０２に図５に示すテンプレート１１０５が蓄積されているとする。このとき、一致箇所検出手段１０３は一致箇所の検出で葉頂点を優先するため、葉頂点１００５から一致箇所を検出していく。そして、葉頂点１００５を含む一致箇所として、頂点１００２が検出されるので、一致箇所検出手段１０３は、一致箇所情報Ｖｄとして、頂点１００２をテンプレート置換手段１１１に入力する。 This can be realized, for example, by having the matching location detection unit 103 detect only matching locations that include a leaf vertex of the tree structure, and having the template replacement unit 111 replace each such matching location with a template, with this processing performed recursively.
Here, suppose for example that the cross-reference relationship data 1000 shown in FIG. 3 is input to the matching location detection unit 103 and that the template 1105 shown in FIG. 5 is stored in the template storage unit 102. Because the matching location detection unit 103 gives priority to leaf vertices, it searches for matching locations starting from the leaf vertex 1005. The vertex 1002 is detected as a matching location that includes the leaf vertex 1005, so the matching location detection unit 103 inputs the vertex 1002 to the template replacement unit 111 as the matching location information Vd.

そして、テンプレート置換手段１１１は、入力する一致箇所情報Ｖｄを用いて、頂点１００２を起点としてテンプレート１１０５のパターン情報１１０８に対応する頂点群、すなわち頂点１００２、頂点１００３、頂点１００４および頂点１００５を再圧縮可能となるようにテンプレートで置換する。
一方、切り替え手段１０４は、テンプレートで置換された圧縮後の相互参照関係データの出力または再圧縮を選択する。ここでは、再圧縮が選択されたとすると、切り替え手段１０４はテンプレートで置換された相互参照関係データを一致箇所検出手段１０３に入力する。 Then, the template replacement unit 111 recompresses the vertex group corresponding to the pattern information 1108 of the template 1105, that is, the vertex 1002, the vertex 1003, the vertex 1004, and the vertex 1005, with the vertex 1002 as the starting point, using the input matching part information Vd. Replace with templates as possible.
On the other hand, the switching unit 104 selects the output or recompression of the compressed cross-reference relationship data replaced with the template. Here, assuming that recompression is selected, the switching unit 104 inputs the cross-reference relationship data replaced with the template to the matching location detection unit 103.

図８は、上述の要領で葉頂点を優先する再圧縮（優先圧縮）を行ったときの相互参照関係データ１４０１を示す図である。この相互参照関係データ１４０１を、図７に示すようにして再圧縮の順序を考慮せずに再圧縮を行った相互参照関係データ１３０１と比較すると、以下の点で相違がある。後者の相互参照関係データ１３０１は、テンプレートを元の相互参照関係データへ完全に復元しなければ圧縮前の相互参照関係データ１０００の根頂点、すなわち頂点１００１を復元することができない。そのため、復元した箇所が元の相互参照関係データ１０００におけるどの位置に存在していたかが不明である。これに対して、前者の相互参照関係データ１４０１は、テンプレートを相互参照関係データへ１回復元するだけで（根）頂点１００１を復元することができるため、元の相互参照関係データ１０００における位置を確認しながら復元することが可能となる。したがって、上述の要領で優先圧縮を行うと、頂点の参照に要する時間を短縮できるようになる。 FIG. 8 shows the cross-reference relationship data 1401 obtained when recompression that gives priority to leaf vertices (priority compression) is performed in the manner described above. Compared with the cross-reference relationship data 1301, which was recompressed without regard to the order of recompression as shown in FIG. 7, it differs as follows. With the latter data 1301, the root vertex of the uncompressed cross-reference relationship data 1000, that is, the vertex 1001, cannot be restored unless the templates are restored completely to the original cross-reference relationship data. It is therefore unknown where in the original cross-reference relationship data 1000 a restored portion was located. With the former data 1401, by contrast, the (root) vertex 1001 can be restored by expanding a template just once, so restoration can proceed while the position in the original cross-reference relationship data 1000 is confirmed. Priority compression performed in this manner therefore shortens the time required to refer to a vertex.

（第３の実施の形態）
データ圧縮装置１０１から出力される第１の出力データ１０９は、テンプレートの適用を示すテンプレート実体を頂点として表現しているため、圧縮前の相互参照関係データと同様、一致箇所検出手段１０３に入力することができる。そのため、複数のデータ圧縮装置を直列に接続することにより、相互参照関係データの再圧縮を行うことができる。ここで、図１５は、複数のデータ圧縮手段を直列に接続したデータ圧縮装置２１０４の構成を示すブロック図である。図に示すように、データ圧縮装置２１０４は一致箇所検出手段２１０７およびテンプレート置換手段２１０８を有する第１の圧縮手段２１０１と、第２の圧縮手段２１０２と、・・・、第Ｎの圧縮手段２１０ｎを連続して直列に接続している。第１〜第Ｎまでの各圧縮手段２１０１〜２１０ｎは、いずれも同一のテンプレート蓄積手段２１０９を参照するようになっている。また、各圧縮手段２１０１〜２１０ｎは、それぞれ相互参照関係データを入力し、その相互参照関係データをテンプレートを用いて圧縮し、圧縮済みの相互参照関係データを出力データとして出力するようになっている。このような構成を有するデータ圧縮装置２１０４により、相互参照関係データの再圧縮が行える。 (Third embodiment)
Because the first output data 109 output from the data compression apparatus 101 expresses each template entity, which indicates the application of a template, as a vertex, it can be input to the matching location detection unit 103 just like uncompressed cross-reference relationship data. Cross-reference relationship data can therefore be recompressed by connecting a plurality of data compression devices in series. FIG. 15 is a block diagram showing the configuration of a data compression apparatus 2104 in which a plurality of data compression units are connected in series. As shown in the figure, the data compression apparatus 2104 connects in series a first compression unit 2101, which has a matching location detection unit 2107 and a template replacement unit 2108, a second compression unit 2102, ..., and an Nth compression unit 210n. The first to Nth compression units 2101 to 210n all refer to the same template storage unit 2109. Each of the compression units 2101 to 210n receives cross-reference relationship data, compresses it using templates, and outputs the compressed cross-reference relationship data as output data. With this configuration, the data compression apparatus 2104 can recompress cross-reference relationship data.

データ復元装置の実施の形態
図９は、本発明の実施の形態に係るデータ復元装置１５０１の構成を示すブロック図である。データ復元装置１５０１は、テンプレート蓄積手段１５０２と、テンプレート展開手段１５０５と、切り替え手段１５０３および合成手段１５０４を有している。このデータ復元装置１５０１は圧縮済みの相互参照関係データを第１の入力データ１５０６とし、型と値を有する頂点群のデータを第２の入力データ１５０７とし、その第１の入力データ１５０６から圧縮前の相互参照関係データを復元し、それを第２の入力データ１５０７として入力される頂点群のデータと合成して、出力データ１５０８を出力するようになっている。本実施の形態では、データ復元装置１５０１に対し、第１の入力データ１５０６として、図７に示す相互参照関係データ１３０１を入力し、第２の入力データ１５０７として、図３に示すテーブル９００を入力するとし、テンプレート蓄積手段１５０２に図５に示すテンプレート１１０５のみが蓄積されている場合を例にとって、データ復元装置１５０１について説明する。 Embodiment of Data Restoration Apparatus
FIG. 9 is a block diagram showing the configuration of a data restoration apparatus 1501 according to an embodiment of the present invention. The data restoration apparatus 1501 has a template storage unit 1502, a template expansion unit 1505, a switching unit 1503, and a synthesis unit 1504. The data restoration apparatus 1501 takes compressed cross-reference relationship data as first input data 1506 and vertex group data having types and values as second input data 1507, restores the uncompressed cross-reference relationship data from the first input data 1506, combines it with the vertex group data input as the second input data 1507, and outputs output data 1508. In the present embodiment, the data restoration apparatus 1501 is described for an example in which the cross-reference relationship data 1301 shown in FIG. 7 is input as the first input data 1506, the table 900 shown in FIG. 3 is input as the second input data 1507, and only the template 1105 shown in FIG. 5 is stored in the template storage unit 1502.

The template expansion unit 1505 expands the cross-reference relationship data 1301 given as the first input data 1506 (more precisely, its template application locations) using the template 1105 stored in the template storage unit 1502. The expansion can be performed by, for example, the replacement procedure shown in FIG. 13.
The replacement procedure is as follows.
After processing starts, the procedure proceeds to step 6, sets i to 0, and proceeds to step 7.
In step 7, steps 8 and 9 are repeated once for each matching location included in the template matching location information.
In step 8, one template entity is created with entity ID = i. This template entity is denoted Oi, and the following processing is performed.
The used-template ID is copied from the used-template ID of the matching location Mi.
The entity information is copied from the vertex correspondence information of Mi. For the entity connection information, the original references are assigned unchanged, based on the correspondence described in the entity information.
In step 9, i = i + 1 is computed.
After steps 8 and 9 have been repeated under step 7, in step 10 the created template entities are selected one at a time and the following processing is repeated. The selected template entity is denoted Oi.
The procedure then proceeds to step 11: if the destination vertex of a reference described in the entity connection information is contained in another template entity, the reference is replaced with a pair consisting of that template entity's ID and the template vertex.
Expanding the cross-reference relationship data 1301 by the restoration procedure shown in FIG. 14 yields the cross-reference relationship data 1201 shown in FIG. 6.
The restoration procedure is as follows.
In FIG. 14, in step 12 after the start, all template entities included in the compressed cross-reference relationship data are denoted X1, X2, ..., Xn, and the following is performed for each of them.
The selected template entity is denoted Xi.
In step 13, the reference information between the vertices of the template used by the template entity Xi is copied, and the vertex IDs described in the entity information of Xi are assigned to the copied vertices.
In step 14, if a vertex described in the entity connection information of the template entity Xi is a vertex contained in another template entity Xm, it is replaced with the vertex ID described in Xm.
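The restoration procedure of steps 12 to 14 can be sketched as follows. This is a minimal illustration under assumed data shapes, not the format used in the figures: `TemplateEntity`, `restore`, the `(source, kind, target)` reference triples, and the convention that a connection target inside another entity is written as an `(entity_id, template_vertex)` pair are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateEntity:
    template_id: str
    # Entity information: template vertex name -> vertex ID in the restored data.
    vertex_ids: dict
    # Entity connection information: (template vertex, reference kind, target).
    # A target is a plain vertex ID, or (entity_id, template_vertex) when it
    # points at a vertex held by another template entity.
    connections: list = field(default_factory=list)


def restore(templates, entities):
    """Expand every template entity into explicit vertex-to-vertex references."""
    references = []
    for entity in entities.values():                      # step 12: visit each Xi
        for src, kind, dst in templates[entity.template_id]:
            # Step 13: copy the template's internal references and relabel
            # them with the vertex IDs recorded in the entity information.
            references.append((entity.vertex_ids[src], kind, entity.vertex_ids[dst]))
        for src, kind, target in entity.connections:
            # Step 14: a target inside another entity Xm becomes Xm's vertex ID.
            if isinstance(target, tuple):
                other_id, template_vertex = target
                target = entities[other_id].vertex_ids[template_vertex]
            references.append((entity.vertex_ids[src], kind, target))
    return references


# Hypothetical data: one two-vertex template used by two entities.
templates = {"T": [("a", "child", "b")]}
entities = {
    0: TemplateEntity("T", {"a": 1, "b": 2}, [("b", "next_sibling", (1, "a"))]),
    1: TemplateEntity("T", {"a": 3, "b": 4}),
}
restored = restore(templates, entities)
```

In this toy input, the connection from entity 0 into entity 1 is resolved to vertex ID 3, so the output contains only plain vertex IDs.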

The switching unit 1503 determines whether the expanded cross-reference relationship data 1201 contains any further template application location (that is, whether a re-application location exists). If it determines that one exists, it feeds the expanded cross-reference relationship data back into the template expansion unit 1505 for further expansion. If it determines that none exists, it switches the output destination of the expanded cross-reference relationship data to the synthesis unit 1504. A re-application location can be detected, for example, by finding, among all vertices, a vertex that holds a reference to a template entity.
The template expansion unit 1505 and the switching unit 1503 repeat this expansion until no application location remains.
In this embodiment, the first expansion using the template 1105 yields the cross-reference relationship data 1201 shown in FIG. 6. Because this data still holds a reference to the template entity 1202 (the vertex 1207 with a pointer), it still contains a re-application location. The switching unit 1503 therefore feeds the cross-reference relationship data 1201 back into the template expansion unit 1505, which expands the re-application location. This restores the cross-reference relationship data 1000 shown in FIG. 3, which is input to the switching unit 1503. Since the cross-reference relationship data 1000 contains no re-application location, the switching unit 1503 switches the output destination to the synthesis unit 1504.
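The interplay between the template expansion unit and the switching unit amounts to an expand-then-check loop. The sketch below shows only that control flow on a deliberately simplified toy representation (a flat list in which a template name stands for a re-application location). The names `expand_until_done`, `expand_once`, `has_reapplication`, `TEMPLATES`, and the `max_rounds` guard are assumptions for illustration and do not appear in the specification.

```python
def expand_until_done(data, expand_once, has_reapplication, max_rounds=100):
    """Repeat expansion until no re-application location remains."""
    rounds = 0
    while True:
        data = expand_once(data)            # role of the template expansion unit
        rounds += 1
        if not has_reapplication(data):     # role of the switching unit
            return data, rounds             # hand over to synthesis
        if rounds >= max_rounds:            # safety guard, not in the source text
            raise RuntimeError("expansion did not terminate")


# Toy templates: T1 contains a nested use of T0, so two rounds are needed.
TEMPLATES = {"T1": ["a", "T0"], "T0": ["b", "c"]}


def expand_once(items):
    expanded = []
    for item in items:
        expanded.extend(TEMPLATES.get(item, [item]))
    return expanded


def has_reapplication(items):
    return any(item in TEMPLATES for item in items)


result, rounds = expand_until_done(["x", "T1"], expand_once, has_reapplication)
```

As in the embodiment, the first round leaves one nested template reference, and the second round removes it.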

The synthesis unit 1504 combines the input cross-reference relationship data 1000 with the table 900 input as the second input data 1507. In the table 900 shown in FIG. 4, a vertex ID 901 is assigned to each vertex. The synthesis unit 1504 therefore combines the two by attaching the information set in the type 902 and the value 903 to each vertex of the cross-reference relationship data 1000 whose vertex ID 901 matches. This synthesis restores the data structure 21, which comprises the data of a vertex group in which each vertex has a type and a value, together with the reference information between the vertices.
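The synthesis step is essentially a join on vertex ID. A minimal sketch follows, assuming the reference data is a mapping from vertex ID to its references and the table is a list of `(vertex ID, type, value)` rows. The function name `synthesize` and both data shapes are hypothetical.

```python
def synthesize(references, table):
    """Attach type and value from the table to each vertex of the reference data."""
    rows = {vertex_id: (vtype, value) for vertex_id, vtype, value in table}
    structure = {}
    for vertex_id, refs in references.items():
        vtype, value = rows[vertex_id]      # KeyError if the table lacks this ID
        structure[vertex_id] = {"type": vtype, "value": value, "refs": dict(refs)}
    return structure


# Hypothetical two-vertex example.
references = {0: {"child": 1}, 1: {"parent": 0}}
table = [(0, "element", "book"), (1, "element", "title")]
structure = synthesize(references, table)
```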

In FIG. 9, the data restoration apparatus 1501 is configured as a single apparatus in which the units are integrated, but it need not be realized as a single apparatus; it can also be realized by connecting a plurality of apparatuses through communication means (not shown). For example, the template storage unit 1502 may be separated from the data restoration apparatus 1501 and realized as a separate apparatus, with the two apparatuses connected by communication means (not shown). This also allows a plurality of data restoration apparatuses that have no template storage unit 1502 of their own to share one template storage unit 1502.
In this embodiment, the switching unit 1503 switches between input to the template expansion unit 1505 and output to the synthesis unit 1504, so that template expansion in the template expansion unit 1505 is executed repeatedly. Alternatively, a switching unit may be used that always feeds the expanded cross-reference relationship data into the template expansion unit 1505 and outputs it to the synthesis unit 1504 only once template expansion has finished.

Furthermore, the apparatus may be configured so that data such as the reference position in the cross-reference relationship data and the number of template expansions are supplied to the switching unit 1503 from outside the data restoration apparatus 1501. In that case, of the data structure obtainable by expanding the template application locations (the data of a vertex group in which each vertex has a type and a value, together with the reference information between the vertices), the template expansion unit 1505 can perform only the expansions whose templates are needed to reach the vertices that actually have to be referenced.

テンプレート生成装置の実施の形態
（第１の実施の形態）
図１０は、本実施の形態に係るテンプレート生成装置１６０１の構成を示すブロック図である。テンプレート生成装置１６０１はテンプレート候補検出手段１６０２と、テンプレート置換手段１６０３と、入力データ取得手段１６０４と、テンプレート候補蓄積手段１６０５と、テンプレート選択手段１６０６とを有している。このテンプレート生成装置１６０１は、入力データ群１６０７を入力し、出力データ（テンプレート群）１６０８を出力するようになっている。 Embodiment of template generation apparatus (first embodiment)
FIG. 10 is a block diagram showing a configuration of template generation apparatus 1601 according to the present embodiment. The template generation device 1601 includes a template candidate detection unit 1602, a template replacement unit 1603, an input data acquisition unit 1604, a template candidate storage unit 1605, and a template selection unit 1606. The template generation device 1601 receives an input data group 1607 and outputs output data (template group) 1608.

The input data group 1607 consists of a plurality of cross-reference relationship data, each having a rooted tree structure. In this embodiment, the template generation device 1601 is described using, as the input data group 1607, the two cross-reference relationship data 2900 and 3000 shown in FIGS. 23 and 24, which are separated from the XML document 30 shown in FIG. 21 and the XML document 40 shown in FIG. 22, respectively.
The cross-reference relationship data 2900 and 3000 can be separated from the XML documents 30 and 40 shown in FIGS. 21 and 22 in the same manner as in the first embodiment. That is, the separation is easily performed by assigning, in order, a vertex ID that uniquely identifies each vertex to generate the cross-reference relationship data 2900 and 3000, listing each assigned vertex ID together with the type and value the corresponding vertex originally had to generate a table, and using that table as the vertex group.
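The separation described above can be sketched with Python's standard `xml.etree.ElementTree` parser. This is a simplified illustration, not the procedure of the first embodiment: it handles element vertices only (text content is ignored), records each vertex with the assumed type `"element"` and the tag name as its value, and represents references as parent, first-child, and next-sibling links. The function name `separate` and the output shapes are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET


def separate(xml_text):
    """Split an XML document into (reference data, vertex table)."""
    ids = itertools.count()
    references, table = {}, []

    def visit(element, parent_id):
        vertex_id = next(ids)                       # IDs assigned in document order
        table.append((vertex_id, "element", element.tag))
        references[vertex_id] = {"parent": parent_id, "child": None, "next_sibling": None}
        previous = None
        for child in element:
            child_id = visit(child, vertex_id)
            if previous is None:
                references[vertex_id]["child"] = child_id       # first child
            else:
                references[previous]["next_sibling"] = child_id
            previous = child_id
        return vertex_id

    visit(ET.fromstring(xml_text), None)
    return references, table


references, table = separate("<book><title/><authors><author/></authors></book>")
```

The reference data then carries only structure, while the table carries only per-vertex content, so the two can be processed independently and rejoined by vertex ID.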

本発明を実現するための手順として複数の手順が考えられる。本実施の形態では、図２５、図２６、図２７に示すテンプレート生成の第１〜第３の手順を用いるものとする。
まず、図２５に示すテンプレート生成の第１の手順について説明する。テンプレート生成装置１６０１は、第１の手順の処理開始後、ステップ３０に進み、入力データ取得手段１６０４がテンプレート生成装置１６０１に与えられた入力データ群１６０７から相互参照関係データを１つ取り出し、その相互参照関係データをテンプレート候補検出手段１６０２に入力する。テンプレート候補検出手段１６０２は、入力する相互参照関係データを次の相互参照関係データが入力されるまで保持データＤに設定する。なお、テンプレート候補検出手段１６０２への相互参照関係データの選択順はいずれでもよいが、本実施の形態では、相互参照関係データ２９００，３０００の順に入力されるものとする。 A plurality of procedures are conceivable as procedures for realizing the present invention. In this embodiment, it is assumed that the first to third procedures of template generation shown in FIGS. 25, 26, and 27 are used.
First, a first procedure for generating a template shown in FIG. 25 will be described. After starting the processing of the first procedure, the template generation device 1601 proceeds to step 30, and the input data acquisition unit 1604 extracts one cross-reference relationship data from the input data group 1607 given to the template generation device 1601. Reference relationship data is input to the template candidate detection means 1602. The template candidate detecting unit 1602 sets the input cross-reference relationship data to the retained data D until the next cross-reference relationship data is input. The selection order of the cross-reference relationship data to the template candidate detection unit 1602 may be any order, but in this embodiment, the cross-reference relationship data 2900 and 3000 are input in this order.

(Intermediate procedure 1)
The process then proceeds to step 31, where the template generation device 1601 calls the second procedure with the root vertex of the retained data D as the argument. Here, the second procedure is called with the root vertex 2901 of the cross-reference relationship data 2900 as the argument. In the following step 32, it is determined whether any cross-reference relationship data remains in the input data group 1607; if so, the process returns to step 30, and if not, it proceeds to step 33. Through step 32, as long as cross-reference relationship data remains in the input data group 1607, the template generation device 1601 repeatedly calls the second procedure with (the root vertex of) each cross-reference relationship data as the argument.
When no cross-reference relationship data remains in the input data group 1607, the input data acquisition unit 1604 detects this and notifies the template selection unit 1606. On this notification, the template generation device 1601 proceeds to step 33, where the template selection unit 1606 selects templates from the template candidates stored in the template candidate storage unit 1605 and outputs the selected templates as the template group 1608.

次に、第２の手順について説明する。第２の手順では、保持データＤの頂点を第１の手順または第３の手順から取得して引数cursorに設定する。この引数cursorは、第２の手順において、保持データＤのどの位置を辿っているかを表している。
ここでは、上述の中間手順１に示すとおり、第２の手順を呼び出すときの引数に保持データＤの根頂点２９０１が設定されている。
図２６に示すように、第２の手順を開始すると、ステップ４０に進み、保持データＤにおいて、引数cursorを根とする木構造を形成する部分参照情報をＴに設定する。ここでは引数cursorに保持データＤの根頂点２９０１が設定されているので、部分参照情報Ｔは保持データＤ（ここでは相互参照関係データ２９００）と一致している。 Next, the second procedure will be described. In the second procedure, the vertex of the retained data D is acquired from the first procedure or the third procedure and set as the argument cursor. This argument “cursor” represents which position of the retained data D is traced in the second procedure.
Here, as shown in the intermediate procedure 1 described above, the root vertex 2901 of the retained data D is set as an argument when the second procedure is called.
As shown in FIG. 26, when the second procedure is started, the process proceeds to step 40, and in the retained data D, the partial reference information that forms the tree structure rooted in the argument cursor is set to T. Here, since the root vertex 2901 of the retained data D is set in the argument “cursor”, the partial reference information T matches the retained data D (here, the cross-reference relationship data 2900).

次に、ステップ４１に進み、引数cursorの値を変数Ｒに設定する。この変数Ｒは、保持データＤの部分参照情報Ｔの根頂点を示している。
続いてステップ４２に進み、変数Ｒの示す頂点を起点として、深さ優先探索で部分参照情報Ｔにおける木構造を形成する頂点を探索し、最初に遭遇する（葉）頂点を引数cursorに設定する。この段階で、引数cursorに頂点２９０４が設定される。変数Ｒは頂点２９０１を示す。 Next, proceeding to step 41, the value of the argument cursor is set to the variable R. This variable R indicates the root vertex of the partial reference information T of the retained data D.
Subsequently, the process proceeds to step 42, where a depth-first search starting from the vertex indicated by the variable R explores the vertices forming the tree structure of the partial reference information T, and the first (leaf) vertex encountered is set as the argument cursor. At this stage, the argument cursor is set to the vertex 2904, and the variable R indicates the vertex 2901.
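Finding the first leaf reached by a depth-first search, as in step 42, reduces to repeatedly descending into the first child. The following sketch assumes the tree is given as a mapping from a vertex to its ordered list of children. The function name `first_leaf` and the example tree are hypothetical and do not reproduce FIG. 23.

```python
def first_leaf(children, root):
    """Return the first leaf a depth-first search from `root` encounters."""
    vertex = root
    while children.get(vertex):
        vertex = children[vertex][0]    # always descend into the first child
    return vertex


# Hypothetical tree: r has children a and d; a has children b and c.
children = {"r": ["a", "d"], "a": ["b", "c"]}
leaf = first_leaf(children, "r")
```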

Next, in step 43, it is determined whether the vertex indicated by the argument cursor has a parent vertex. Here, the vertex 2904 indicated by the argument cursor has a parent reference to the vertex 2903, so the process proceeds to step 44. At this point, the template candidate detection unit 1602 of the template generation device 1601 sets up a candidate pattern consisting of the vertex indicated by the argument cursor and the vertex indicated by its parent reference, and detects whether this candidate pattern exists in the retained data D. Here, the pattern consisting of the vertex 2904 and the vertex 2903 is set as the candidate pattern.
In step 44, the template generation device 1601 then calls the third procedure, passing the candidate pattern composed of the argument cursor and the vertex indicated by its parent reference as the first argument, the variable R as the second argument, and the argument cursor as the third argument. Within the third procedure, the first argument is written C, and the second and third arguments are written R and cursor, respectively.

（中間手順２）
テンプレート生成装置１６０１は第３の手順を呼び出すと、図２７に示すステップ５０に進み、第３の引数cursorをデータＳに設定して保存する。なお、このステップ５０は必ずしも必要ではないが、本実施の形態では、データＳを後述するステップ５７におけるテンプレート候補検索の中断条件の判定に利用している。
そして、テンプレート生成装置１６０１はステップ５１に進み、第１の引数Ｃが示すパターン（候補パターンＣ）を有するテンプレート候補がテンプレート候補蓄積手段１６０５に蓄積されているか否かを判断する。ここでは、候補パターンＣを有するテンプレート候補はテンプレート候補蓄積手段１６０５に蓄積されていないので、テンプレート生成装置１６０１はステップ５３に進み、候補パターンＣを有するテンプレート候補を通過回数（出現回数）に“１”を設定してテンプレート候補蓄積手段１６０５に保存する。 (Intermediate procedure 2)
When the template generating apparatus 1601 calls the third procedure, the process proceeds to step 50 shown in FIG. 27, in which the third argument cursor is set in the data S and stored. Although this step 50 is not necessarily required, in the present embodiment, the data S is used for determining the interruption condition for the template candidate search in step 57 described later.
The template generation device 1601 then proceeds to step 51 and determines whether a template candidate having the pattern indicated by the first argument C (candidate pattern C) is stored in the template candidate accumulation unit 1605. Here, no such template candidate is stored yet, so the template generation device 1601 proceeds to step 53 and stores a template candidate having the candidate pattern C in the template candidate accumulation unit 1605 with its pass count (number of appearances) set to "1".

なお、テンプレート候補蓄積手段１６０５へのテンプレート候補の蓄積方法は、様々な方法があるが、本実施の形態では、図２８に示すような形式で木構造を構築しながら蓄積する。
ここで、テンプレート候補の構成について図２８を用いて説明する。図２８に示すテンプレート候補３４０１は、パターン情報３４０２と、各テンプレート候補を一意に識別するための識別ＩＤ３４０３と、通過回数３４０５と、置換回数３４０６とを有している。通過回数３４０５は、テンプレート生成装置１６０１に入力された相互参照関係データ内における置換可能なテンプレート候補を探索する過程でカウントされるテンプレート候補により置換可能な延べ回数を示している。この通過回数３４０５は、テンプレートを蓄積するときに“１”に初期化されている。置換回数３４０６は、実際にそのテンプレートを適用した回数を示している。この置換回数３４０６は、テンプレートを蓄積するときに“０”に初期化されている。 There are various methods for storing the template candidates in the template candidate storage unit 1605. In this embodiment, the template candidates are stored while building a tree structure in the format shown in FIG.
Here, the configuration of the template candidates will be described with reference to FIG. A template candidate 3401 shown in FIG. 28 has pattern information 3402, an identification ID 3403 for uniquely identifying each template candidate, a pass count 3405, and a replacement count 3406. The number of passes 3405 indicates the total number of times that can be replaced by the template candidates counted in the process of searching for replaceable template candidates in the cross-reference relationship data input to the template generation device 1601. This number of passes 3405 is initialized to “1” when the template is stored. The number of replacements 3406 indicates the number of times that the template is actually applied. The number of replacements 3406 is initialized to “0” when the template is stored.
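The template candidate record described above maps naturally onto a small data class. The sketch below is an assumed representation: the class name `TemplateCandidate`, the field names, and the tuple encoding of the pattern are hypothetical, while the initial values (pass count 1, replacement count 0) follow the text.

```python
from dataclasses import dataclass


@dataclass
class TemplateCandidate:
    candidate_id: int        # identification ID 3403
    pattern: tuple           # pattern information 3402 (encoding assumed)
    pass_count: int = 1      # pass count 3405, initialized to 1 when stored
    replace_count: int = 0   # replacement count 3406, initialized to 0 when stored


candidate = TemplateCandidate(candidate_id=0, pattern=("parent", "child"))
candidate.pass_count += 1    # the same pattern is encountered a second time
```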

テンプレート生成装置１６０１は、テンプレート候補蓄積手段１６０５内にテンプレート候補を複数蓄積している。本実施の形態では、図２９に示すように、複数のテンプレート候補を各テンプレート候補が頂点となる木構造で蓄積している。図２９において、各テンプレート候補間の矢印は、あるテンプレート候補に対し頂点を一つ加えることにより、その矢印が示すテンプレート候補と同じ構造になることを示している。矢印に添えられたラベルＮ（Ｍ）は頂点Ｍに対し、参照Ｎで頂点を加えることを意味している。参照Ｎは親参照ｐ，子参照ｃ，次兄弟参照ｎｓの３種類で記述している。なお、テンプレート候補の通過回数および置換回数は刻々と変化するため、図２９では図示を省略している。 The template generation device 1601 stores a plurality of template candidates in the template candidate storage unit 1605. In the present embodiment, as shown in FIG. 29, a plurality of template candidates are accumulated in a tree structure in which each template candidate is a vertex. In FIG. 29, the arrows between the template candidates indicate that, by adding one vertex to a certain template candidate, the same structure as the template candidate indicated by the arrow is obtained. A label N (M) attached to the arrow means that a vertex is added at the reference N to the vertex M. Reference N is described in three types: parent reference p, child reference c, and next sibling reference ns. Since the number of passes and the number of replacements of the template candidate change every moment, the illustration is omitted in FIG.

テンプレート生成装置１６０１は、初めて第３の手順を呼び出したときはステップ５３に進む。そして、図２９に示すように、候補パターンＣがテンプレート候補３５０１のパターン情報と一致することから、テンプレート候補３５０１をテンプレート候補蓄積手段１６０５に追加する。この時、候補パターンＣは第３の引数cursorが示す保持データＤの頂点において置換可能であることを示すため、テンプレート候補３５０１の通過回数は“１”に設定される。 When the template generation device 1601 calls the third procedure for the first time, it proceeds to step 53. Then, as shown in FIG. 29, since the candidate pattern C matches the pattern information of the template candidate 3501, the template candidate 3501 is added to the template candidate accumulation unit 1605. At this time, since the candidate pattern C can be replaced at the vertex of the retained data D indicated by the third argument “cursor”, the number of times the template candidate 3501 is passed is set to “1”.

On proceeding to step 54, the template generation device 1601 judges whether a stop condition holds for the template candidate having the candidate pattern C, and on that basis decides whether to generate any further template candidates. Several stop conditions are conceivable. In this embodiment, one stop condition is that every vertex of the partial reference information whose root is the vertex corresponding to the second argument R (vertex R) has been replaced by the template candidate. This can be judged by checking whether the third argument cursor is the last vertex of the subtree at and below vertex R (stop condition 1). In addition, because the storage capacity of the template candidate storage unit 1605 is limited, stopping the detection of template candidates at a given number of vertices in the candidate, tree height, or the like may also be used as a stop condition (stop condition 2). Stop condition 2 can be, for example, whether the number of vertices contained in the candidate pattern C is 5 or more, but it is not limited to this.
(Intermediate procedure 3)
Here, the candidate pattern C matches the pattern information of the template candidate 3501, and the number of vertices in the template candidate is "2". The second argument R indicates the vertex 2901, but the candidate pattern C does not contain all the vertices at and below the second argument R. Consequently, neither stop condition 2 (the candidate pattern C contains 5 or more vertices) nor stop condition 1 holds, and the template generation device 1601 proceeds to step 55.
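The two stop conditions of step 54 can be expressed as a single predicate. In this sketch, stop condition 1 is written directly as "the pattern covers every vertex of the subtree rooted at R" rather than through the last-vertex check mentioned in the text, and the threshold of 5 vertices is the example value given for stop condition 2. The function name `should_stop`, its parameters, and the subtree membership used in the example are assumptions.

```python
def should_stop(pattern, subtree, max_vertices=5):
    """Return True when either stop condition of step 54 holds."""
    covers_subtree = set(subtree) <= set(pattern)    # stop condition 1
    too_large = len(pattern) >= max_vertices         # stop condition 2
    return covers_subtree or too_large


# Assumed membership of the subtree rooted at R; the pattern holds two vertices.
subtree_of_r = [2901, 2903, 2904, 2905, 2906, 2907]
keep_going = not should_stop([2903, 2904], subtree_of_r)
```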

In step 55, the template generation device 1601 traces the retained data D by depth-first search from the third argument cursor, adds the next vertex encountered to the candidate pattern C, and sets the resulting pattern as the new candidate pattern C. In this way the template generation device 1601 adds one vertex to the candidate pattern C and performs detection of a new template candidate; this is the extended candidate detection of the present invention. Here, the third argument cursor indicates the vertex 2904. The candidate pattern C has the same shape as the pattern of the template candidate 3501 and corresponds to the vertices 2903 and 2904 in the retained data D. The depth-first search therefore adds the vertex 2905, the vertex following the one indicated by the third argument cursor (vertex 2904), to the candidate pattern C. The template candidate detection unit 1602 thus detects a structure identical to the pattern of the template candidate 3502 shown in FIG. 29, and the detected pattern becomes the new candidate pattern C.
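Growing the candidate pattern by one vertex in depth-first order (steps 55 and 56) can be sketched as follows. The helper names `preorder` and `extend_pattern` are hypothetical, and the tree layout is an assumption chosen to agree with the vertices named in the text (2904 followed by 2905); it is not taken from FIG. 23.

```python
def preorder(children, root):
    """List the vertices of the tree in depth-first (pre-order) sequence."""
    order, stack = [], [root]
    while stack:
        vertex = stack.pop()
        order.append(vertex)
        stack.extend(reversed(children.get(vertex, [])))
    return order


def extend_pattern(pattern, cursor, order):
    """Add the vertex that follows `cursor` in depth-first order to the pattern."""
    position = order.index(cursor)
    if position + 1 == len(order):
        return pattern, None                  # no further vertex to add
    next_vertex = order[position + 1]
    return pattern + [next_vertex], next_vertex   # new pattern C, new cursor


# Assumed layout: 2903 has children 2904, 2905, 2906; 2906 has child 2907.
children = {2903: [2904, 2905, 2906], 2906: [2907]}
order = preorder(children, 2903)
pattern, cursor = extend_pattern([2903, 2904], 2904, order)
```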

さらにステップ５６に進み、保持データＤにおける新たに追加された頂点を第３の引数cursorに設定する。ここでは、第３の引数cursorに頂点２９０５が設定される。
次に、ステップ５７に進み、候補パターンＣを有するテンプレート候補に対する中断条件の成立可否を判断する。中断条件が成立すればステップ５８に進み、未成立ならばステップ５１に戻る。このステップ５７の中断条件の判定と、ステップ５８の処理を加えることにより、テンプレートの大きさを制限し、出現頻度の低いテンプレートの生成を抑制することができる。中断条件としては、保持データＤにおける第３の引数cursorが示す頂点の高さと、データＳが示す頂点（頂点Ｓ）の保持データＤにおける高さの差がＮ以上であるとか、候補パターンＣに含まれる頂点の数がＮ以下である等、複数の方法やその組み合わせを用いることが可能である。 Further, the process proceeds to step 56, where the newly added vertex in the retained data D is set as the third argument cursor. Here, the vertex 2905 is set in the third argument “cursor”.
Next, in step 57, it is judged whether an interruption condition holds for the template candidate having the candidate pattern C. If it holds, the process proceeds to step 58; if not, it returns to step 51. The interruption test of step 57 together with the processing of step 58 limits the size of templates and suppresses the generation of templates that appear infrequently. Several interruption conditions, or combinations of them, can be used: for example, that the difference between the height in the retained data D of the vertex indicated by the third argument cursor and the height in the retained data D of the vertex indicated by the data S (vertex S) is N or more, or that the number of vertices contained in the candidate pattern C is N or less.

（中間手順４）
本実施の形態では、ステップ５７の中断条件は次のようにしている。すなわち、保持データＤにおける第３の引数cursorが示す頂点の高さと、データＳが示す頂点の高さとが同じであり、かつ第３の引数cursorが示す頂点が子頂点を有していることを中断条件に設定している。
この中断条件が成立するとステップ５８に進み、第３の引数cursorを引数に設定して上述した第２の手順を呼び出す。第２の手順が終了するとステップ５５に戻る。このように中断条件が成立するとステップ５８に進み、第３の引数cursorを起点として別なテンプレート候補（候補パターン）の検出を繰返し行う。これにより、テンプレート候補検出手段１６０２は、検出するテンプレート候補の大きさを制限している。 (Intermediate procedure 4)
In the present embodiment, the interruption condition in step 57 is as follows. That is, the height of the vertex indicated by the third argument “cursor” in the retained data D is the same as the height of the vertex indicated by the data S, and the vertex indicated by the third argument “cursor” has a child vertex. The interruption condition is set.
When this interruption condition is satisfied, the process proceeds to step 58, in which the third argument “cursor” is set as an argument and the above-described second procedure is called. When the second procedure ends, the process returns to step 55. When the interruption condition is satisfied in this way, the process proceeds to step 58, and another template candidate (candidate pattern) is repeatedly detected starting from the third argument cursor. As a result, the template candidate detection unit 1602 limits the size of the template candidates to be detected.
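The interruption condition used in this embodiment (same height as the saved vertex S, and the cursor has a child) can be sketched as below. "Height" is interpreted here as distance from the root, which is an assumption, as are the function names `depth` and `should_interrupt` and the tree layout (only the parent of 2904 and the existence of a child of 2906 are stated in the text).

```python
def depth(parent, vertex):
    """Distance from the root, following parent references."""
    steps = 0
    while parent.get(vertex) is not None:
        vertex = parent[vertex]
        steps += 1
    return steps


def should_interrupt(parent, children, cursor, saved):
    """Step 57: cursor is level with the saved vertex S and has a child vertex."""
    same_level = depth(parent, cursor) == depth(parent, saved)
    return same_level and bool(children.get(cursor))


# Assumed layout: 2904, 2905, 2906 share parent 2903; 2907 is a child of 2906.
parent = {2904: 2903, 2905: 2903, 2906: 2903, 2907: 2906}
children = {2903: [2904, 2905, 2906], 2906: [2907]}
at_2905 = should_interrupt(parent, children, cursor=2905, saved=2904)
at_2906 = should_interrupt(parent, children, cursor=2906, saved=2904)
```

With this layout the condition fails at vertex 2905, which has no child, and holds at vertex 2906, matching the walk-through in the text.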

ここでは、図２３に示すように、第３の引数cursorが頂点２９０５を示し、頂点２９０５は子頂点を有していない。そのため、中断条件は成立せず、テンプレート生成装置１６０１はステップ５７からステップ５１に戻る。この時点での候補パターンＣは図２９に示すテンプレート候補３５０２のパターンと一致している。また、テンプレート候補蓄積手段１６０５には、テンプレート候補３５０１のみが蓄積されている。そのため、テンプレート生成装置１６０１は、ステップ５１からステップ５３に進み、テンプレート候補検出手段１６０２が候補パターンＣを有するテンプレート候補３５０２を出現回数に“１”を設定して、テンプレート候補蓄積手段１６０５に追加する。また、テンプレート候補検出手段１６０２は、この追加の際、テンプレート候補３５０１とテンプレート候補３５０２の間に矢印を設け、頂点３６０２の次兄弟参照として、パターンに頂点を追加したことを表すラベルｎｓ（３６０２）を付する。 Here, as shown in FIG. 23, the third argument “cursor” indicates a vertex 2905, and the vertex 2905 has no child vertex. Therefore, the interruption condition is not satisfied, and the template generation device 1601 returns from step 57 to step 51. The candidate pattern C at this point matches the pattern of the template candidate 3502 shown in FIG. Further, only the template candidate 3501 is stored in the template candidate storage unit 1605. Therefore, the template generating apparatus 1601 proceeds from step 51 to step 53, and the template candidate detecting unit 1602 sets the template candidate 3502 having the candidate pattern C to “1” as the number of appearances and adds it to the template candidate accumulating unit 1605. . In addition, the template candidate detection unit 1602 provides an arrow between the template candidate 3501 and the template candidate 3502 at the time of this addition, and a label ns (3602) indicating that a vertex has been added to the pattern as the next sibling reference of the vertex 3602 Is attached.

そして、テンプレート生成装置１６０１はステップ５４からステップ５７までを上記同様に繰り返し実行する。このとき、テンプレート候補３５０３のパターンと同様の構造を候補パターンＣとして取得してステップ５１からステップ５３までを実行する。これにより、テンプレート候補蓄積手段１６０５にテンプレート候補３５０３が追加され、テンプレート候補３５０２とテンプレート候補３５０３の間に矢印が設定され、ラベルｎｓ（３６０３）が付される。 Then, the template generation device 1601 repeatedly executes step 54 to step 57 as described above. At this time, a structure similar to the pattern of the template candidate 3503 is acquired as the candidate pattern C, and steps 51 to 53 are executed. As a result, a template candidate 3503 is added to the template candidate storage unit 1605, an arrow is set between the template candidate 3502 and the template candidate 3503, and a label ns (3603) is attached.

At this stage, of the template candidates shown in FIG. 29, only the template candidates 3501, 3502, and 3503 are stored in the template candidate storage unit 1605, and the pass count of each of them is "1".
(Intermediate procedure 5)
When steps 54 to 56 are then executed and the process reaches step 57, the third argument cursor indicates the vertex 2906, and the interruption condition given in intermediate procedure 4 holds. The template generation device 1601 therefore interrupts template detection, proceeds to step 58, and calls the second procedure with the third argument cursor as the argument.

（中間手順６）
ここでは、引数に頂点２９０６が設定されて第２の手順が呼び出される。これにより、頂点２９０６を根とする保持データＤの部分木に対して、テンプレート候補の探索が行われる。
テンプレート生成装置１６０１は、第２の手順を呼び出すと、図２６に示すステップ４０からステップ４３までを上記同様に実行する。そして、ステップ４４に進み、テンプレート候補３５０１のパターンと同様の構造を第１の引数に設定し、頂点２９０６を第２の引数に設定し、頂点２９０７を第３の引数に設定して第３の手順を呼び出す。 (Intermediate procedure 6)
Here, the vertex 2906 is set as an argument, and the second procedure is called. Thereby, a template candidate is searched for the subtree of the retained data D having the vertex 2906 as a root.
When the template generation apparatus 1601 calls the second procedure, it executes steps 40 to 43 shown in FIG. 26 in the same manner as described above. Then, the process proceeds to step 44, in which the same structure as the pattern of the template candidate 3501 is set as the first argument, the vertex 2906 is set as the second argument, and the vertex 2907 is set as the third argument. Call the procedure.

そして、第３の手順の呼び出しにより、テンプレート生成装置１６０１は、候補パターンＣとしてテンプレート候補３５０１に含まれるパターンと同様のパターンを取得する。また、第２の引数Ｒとして頂点２９０６を取得し、第３の引数cursorとして、頂点２９０７を取得する。
ここで、候補パターンＣは既にテンプレート候補蓄積手段１６０５にテンプレート候補３５０１として蓄積されていることから、テンプレート生成装置１６０１は、図２７におけるステップ５１からステップ５２に進み、テンプレート候補３５０１の通過回数に“１”を加算して“２”にする。 Then, by calling the third procedure, the template generation device 1601 acquires a pattern similar to the pattern included in the template candidate 3501 as the candidate pattern C. Also, the vertex 2906 is acquired as the second argument R, and the vertex 2907 is acquired as the third argument cursor.
Here, since the candidate pattern C has already been stored as template candidate 3501 in the template candidate storage unit 1605, the template generation device 1601 proceeds from step 51 to step 52 in FIG. 27 and adds “1” to the pass count of template candidate 3501, making it “2”.
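
The bookkeeping in steps 51 to 53 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class name `CandidateStore`, the method `visit`, and the representation of a pattern as a nested tuple are all hypothetical, and the appearance count and the pass count are treated as one counter.

```python
class CandidateStore:
    """Hypothetical stand-in for the template candidate storage unit 1605."""

    def __init__(self):
        self.pass_count = {}      # pattern -> number of times it was reached
        self.replace_count = {}   # pattern -> number of times it was used for replacement

    def visit(self, pattern):
        """Steps 51 to 53: register a new pattern or count another pass."""
        if pattern in self.pass_count:       # step 51: is the pattern already stored?
            self.pass_count[pattern] += 1    # step 52: add 1 to its count
        else:
            self.pass_count[pattern] = 1     # step 53: store it with count 1
            self.replace_count[pattern] = 0
        return self.pass_count[pattern]


# A pattern is written as (label, children); the label is illustrative only.
single_vertex = ("v", ())
store = CandidateStore()
first = store.visit(single_vertex)    # first encounter: stored with count 1
second = store.visit(single_vertex)   # second encounter: count becomes 2
```

Keying the store by the pattern itself means that two occurrences of the same structure in different parts of the retained data D update the same entry, which is the behavior the walkthrough describes for template candidate 3501.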

また、テンプレート生成装置１６０１は、ステップ５４からステップ５７までの処理を実行するとステップ５１に戻るが、このとき、候補パターンＣはテンプレート候補３５０２のテンプレート候補と合致し、第３の引数cursorは頂点２９０８を示している。このとき、テンプレート生成装置１６０１は、テンプレート候補３５０２の通過回数に“１”を加算して、“２”にする。
この時点で、第３の引数cursorが頂点２９０８を示す。このことは、中間手順３に示した上述の停止条件１、すなわち引数cursorが頂点Ｒ以下の部分木の最後の頂点であることを満たしている。そのため、テンプレート生成装置１６０１の処理はステップ５４からステップ５９に進む。 After executing steps 54 to 57, the template generation device 1601 returns to step 51. At this time, the candidate pattern C matches template candidate 3502, and the third argument cursor points to vertex 2908. The template generation device 1601 then adds “1” to the pass count of template candidate 3502, making it “2”.
At this point, the third argument cursor points to vertex 2908. This satisfies stop condition 1 described in intermediate procedure 3, namely that the argument cursor is the last vertex of the subtree rooted at vertex R. The processing of the template generation device 1601 therefore proceeds from step 54 to step 59.

（中間手順７）
テンプレート生成装置１６０１は、ステップ５９に進むと、候補パターンＣを有するテンプレート候補による置換がそのまま可能であるか否かを判定する。置換可能と判定したときはステップ６０に進み、そうでなければステップ６２に進む。
ここで、候補パターンＣによる置換が可能であるか否かは、候補パターンＣにおける根頂点Ｘの子頂点の数と、候補パターンＣを保持データＤに合致させたとき、根頂点Ｘに合致する保持データＤにおける頂点Ｙの子頂点の数とが一致するか否かによって判定することができる（以下「置換判定基準」という）。ここでは、候補パターンＣは、テンプレート候補３５０２のパターンと同様の構造を有し、その根頂点３６０１の子頂点は、頂点３６０２と頂点３６０３の２つであり、保持データＤにおける頂点２９０６の子頂点は頂点２９０７と頂点２９０８の２つである。したがって、根頂点３６０１の子頂点数は、頂点２９０６における子頂点数と一致する。そのため、テンプレート生成装置１６０１はステップ５９からステップ６０に処理を進める。 (Intermediate procedure 7)
When the template generating apparatus 1601 proceeds to step 59, it determines whether or not replacement with a template candidate having the candidate pattern C is possible as it is. If it is determined that the replacement is possible, the process proceeds to step 60; otherwise, the process proceeds to step 62.
Whether replacement with the candidate pattern C is possible can be determined by checking whether the number of child vertices of the root vertex X of the candidate pattern C equals the number of child vertices of the vertex Y in the retained data D that matches the root vertex X when the candidate pattern C is matched against the retained data D (hereinafter referred to as the “replacement determination criterion”). Here, the candidate pattern C has the same structure as the pattern of template candidate 3502. Its root vertex 3601 has two child vertices, vertex 3602 and vertex 3603, and vertex 2906 in the retained data D has two child vertices, vertex 2907 and vertex 2908. The number of child vertices of root vertex 3601 therefore equals that of vertex 2906, and the template generation device 1601 advances from step 59 to step 60.
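
The replacement determination criterion amounts to a comparison of two child counts. The sketch below is illustrative only: the `Vertex` class and the function name `can_replace_directly` are assumptions, the vertex numbers are used merely as labels, and `data_extra` is an invented counterexample that does not appear in the figures.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    children: list = field(default_factory=list)


def can_replace_directly(pattern_root, data_vertex):
    """Replacement determination criterion: the root vertex X of the candidate
    pattern and the matched data vertex Y must have the same number of children."""
    return len(pattern_root.children) == len(data_vertex.children)


# Pattern shaped like template candidate 3502: root 3601 with children 3602 and 3603.
pattern = Vertex("3601", [Vertex("3602"), Vertex("3603")])
# Matched data vertex 2906 with children 2907 and 2908.
data_2906 = Vertex("2906", [Vertex("2907"), Vertex("2908")])
# Invented data vertex with a third child, for which the criterion fails.
data_extra = Vertex("Y", [Vertex("a"), Vertex("b"), Vertex("c")])

direct_ok = can_replace_directly(pattern, data_2906)      # step 59 -> step 60
direct_fail = can_replace_directly(pattern, data_extra)   # step 59 -> step 62
```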

（中間手順８）
テンプレート生成装置１６０１は、ステップ６０に進むと、保持データＤにおける該当箇所をテンプレート候補により置換して、その適用箇所を頂点で表現し、その結果得られるデータを入力データ取得手段１６０４に入力する。すなわち、テンプレート置換手段１６０３により、保持データＤの頂点を候補パターンＣと同様の構造を有するテンプレート候補で置換し、その置換した結果を入力データ取得手段１６０４に入力する。
テンプレート候補による置換は、第１の実施の形態におけるテンプレートによる置換と同様にテンプレート実体を用いて行うが、本実施の形態では、説明を簡単にするため、便宜上、「候補置換」と表現する。 (Intermediate procedure 8)
When the template generating apparatus 1601 proceeds to step 60, the corresponding part in the retained data D is replaced with the template candidate, the applied part is represented by a vertex, and the resulting data is input to the input data acquiring unit 1604. That is, the template replacement unit 1603 replaces the vertex of the retained data D with a template candidate having the same structure as the candidate pattern C, and inputs the replacement result to the input data acquisition unit 1604.
The replacement by the template candidate is performed using the template entity in the same manner as the replacement by the template in the first embodiment. However, in this embodiment, for the sake of simplicity, it is expressed as “candidate replacement”.

ステップ６２に進むと、テンプレート候補蓄積手段１６０５に蓄積されたテンプレート候補の中で、最新のテンプレート候補から木構造を上方向に辿り、最初に遭遇する親参照または子参照接続のテンプレート候補を用いて保持データＤを置換して頂点で表現し、その結果得られるデータを入力データ取得手段１６０４に入力する。
入力データ取得手段１６０４は置換された保持データＤが頂点一つで表されていない限り、さらにテンプレートの適用が可能と判断し、テンプレート候補検出手段１６０２に保持データＤを入力してテンプレート候補の検出を継続する。このとき、テンプレート候補蓄積手段１６０５に蓄積された候補パターンＣと合致するテンプレート候補、すなわち、テンプレート候補３５０２の置換回数に“１”を加算して“１”とする。 Proceeding to step 62, the template candidate stored in the template candidate storage means 1605 is traced upward from the latest template candidate, and the template candidate of the first parent reference or child reference connection encountered is used. The retained data D is replaced and expressed by vertices, and the data obtained as a result is input to the input data acquisition means 1604.
The input data acquisition unit 1604 determines that templates can still be applied as long as the retained data D after replacement is not represented by a single vertex, and inputs the retained data D to the template candidate detection unit 1602 to continue detecting template candidates. At this time, “1” is added to the replacement count of the template candidate stored in the template candidate storage unit 1605 that matches the candidate pattern C, that is, template candidate 3502, making it “1”.

置換結果は図２３に示す相互参照関係データ２９００から、頂点２９０７、頂点２９０８を除いて、第１の実施の形態と同様に、頂点２９０６をテンプレート候補３５０２による置換を表す頂点（指示付頂点）で表現することができる。また、テンプレート候補３５０２による置換のため、引数cursorが参照していた頂点２９０８は存在しなくなる。そのため、引数cursorは置換後の頂点２９０６を参照するように設定する。 The replacement result can be expressed by removing vertices 2907 and 2908 from the cross-reference relationship data 2900 shown in FIG. 23 and, as in the first embodiment, representing vertex 2906 as a vertex that denotes replacement by template candidate 3502 (a vertex with instructions). Because of the replacement by template candidate 3502, vertex 2908, which the argument cursor referred to, no longer exists. The argument cursor is therefore set to refer to vertex 2906 after the replacement.

ステップ６１では、第３の引数cursorと第２の引数Ｒとが異なるか否かを判定する。ここで、両者が一致していると第３の手順が終了し、異なればステップ６３に進む。上述の中間手順８に示すようにしてテンプレート候補により置換すると第２の引数Ｒ以下の頂点がすべて含まれる。そのため、第３の引数cursorと第２の引数Ｒとは、ともに頂点２９０６を参照し、一致することになる。よって、テンプレート生成装置１６０１はステップ６１からステップ６３を実行することなく第３の手順を終了する。
一方、第３の手順の呼び出しは、上述の中間手順６に示すように、第２の手順におけるステップ４４から再帰的に行われているが、第２の手順におけるステップ４４の処理が修了した後、その第２の手順の呼び出し元となる第３の手順のステップ５８、すなわち、上述の中間手順５で中断したテンプレート検出を再開することになる。
このとき、復帰後における値はそれぞれ候補パターンＣはテンプレート候補３５０３と一致し、第３の引数cursorは頂点２９０６を示し、第２の引数Ｒは頂点２９０１を示し、データＳは頂点２９０４を示している。 In step 61, it is determined whether the third argument cursor and the second argument R differ. If the two match, the third procedure ends; if they differ, the process proceeds to step 63. After replacement by the template candidate as described in intermediate procedure 8, all vertices at or below the second argument R are covered by the replacement. The third argument cursor and the second argument R therefore both refer to vertex 2906 and match. The template generation device 1601 thus ends the third procedure at step 61 without executing step 63.
Meanwhile, the third procedure was called recursively from step 44 of the second procedure, as described in intermediate procedure 6. Once step 44 of the second procedure completes, processing returns to step 58 of the third procedure that called the second procedure; that is, the template detection interrupted in intermediate procedure 5 resumes.
At this time, the values after the return are as follows: the candidate pattern C matches template candidate 3503, the third argument cursor points to vertex 2906, the second argument R points to vertex 2901, and the data S points to vertex 2904.

テンプレート生成装置１６０１は、ステップ５５に戻ると、テンプレート候補検出手段１６０２により、候補パターンＣに頂点を追加し、新たなテンプレート候補を検出しようとする。しかし、第３の引数cursorが示す頂点２９０６は、上述のとおり既にテンプレート候補による置換が行われ、子参照も、次兄弟参照も有していないことから、頂点の追加は第３の引数cursorにおける親頂点の更に親参照方向に向かって行われる。この頂点の追加により、候補パターンＣはテンプレート候補３５０５のテンプレート候補と同様の構造を取得し、ステップ５６により第３の引数cursorは頂点２９０２に移動する。 When the template generation device 1601 returns to step 55, the template candidate detection unit 1602 adds a vertex to the candidate pattern C and tries to detect a new template candidate. However, since the vertex 2906 indicated by the third argument “cursor” has already been replaced by the template candidate as described above, and has neither a child reference nor a next sibling reference, the addition of the vertex is performed in the third argument “cursor”. This is performed further toward the parent reference direction of the parent vertex. By adding this vertex, the candidate pattern C obtains the same structure as the template candidate of the template candidate 3505, and the third argument cursor is moved to the vertex 2902 in step 56.

第３の引数cursorが示す頂点２９０２は、上述の中間手順２で保存され、始点となるデータＳの示す頂点（頂点Ｓ）、すなわち、頂点２９０４と高さが異なるため、中間手順４に示した中断条件が成立しない。よって、テンプレート生成装置１６０１の処理は、ステップ５７からステップ５１に戻り、さらにステップ５３に進み、テンプレート候補３５０５をテンプレート候補蓄積手段１６０５に出現回数を“１”にして蓄積する。
ステップ５４に進むと、候補パターンＣがテンプレート候補３５０５のパターンと一致し、頂点数が“５”であり、中間手順３に示した停止条件が成立している。よって、テンプレート生成装置１６０１の処理はステップ５９に進む。 The vertex 2902 indicated by the third argument “cursor” is stored in the above-described intermediate procedure 2 and is shown in the intermediate procedure 4 because the vertex (vertex S) indicated by the data S as the starting point, that is, the vertex 2904 is different in height. The interruption condition is not satisfied. Therefore, the processing of the template generation device 1601 returns from step 57 to step 51, and further proceeds to step 53, where the template candidate 3505 is stored in the template candidate storage unit 1605 with the number of appearances set to “1”.
In step 54, the candidate pattern C matches the pattern of the template candidate 3505, the number of vertices is “5”, and the stop condition shown in the intermediate procedure 3 is satisfied. Therefore, the processing of the template generation device 1601 proceeds to step 59.

ステップ５９では、候補パターンＣを有するテンプレート候補による置換がそのまま可能か否かを判断する。ここでは、上述した中間手順７に示した置換判定基準を適用し、テンプレート候補３５０５により置換可能であると判断する。
さらに、テンプレート生成装置１６０１は続くステップ６０において、テンプレート置換手段１６０３を用いてテンプレート候補３５０５により保持データＤの頂点を置換し、その置換結果を入力データ取得手段１６０４に入力する。 In step 59, it is determined whether or not replacement with a template candidate having the candidate pattern C is possible as it is. Here, the replacement criterion shown in the intermediate procedure 7 described above is applied, and it is determined that the template candidate 3505 can be replaced.
Further, in the subsequent step 60, the template generation device 1601 replaces the vertex of the retained data D with the template candidate 3505 using the template replacement unit 1603, and inputs the replacement result to the input data acquisition unit 1604.

入力データ取得手段１６０４は置換された保持データＤが頂点１つで表されていない限り、更にテンプレートの適用が可能と判断し、テンプレート候補検出手段１６０２に保持データＤを入力し、テンプレート候補の検出を継続する。
この時点で、候補パターンＣはテンプレート候補３５０５のパターンと一致するため、テンプレート候補３５０５の通過回数に“１”を加算して、“１”とする。 The input data acquisition unit 1604 determines that the template can be further applied unless the replacement holding data D is represented by one vertex, and inputs the holding data D to the template candidate detection unit 1602 to detect the template candidate. Continue.
At this time, since the candidate pattern C matches the pattern of the template candidate 3505, “1” is added to the number of times the template candidate 3505 has passed to make “1”.

この時、第３の引数cursorは置換後の頂点２９０２を示すが、第２の引数Ｒは頂点２９０１を示す。そのため、ステップ６１において、第３の引数cursorと第２の引数Ｒとが異なると判断され、ステップ６３に進み、第２の引数Ｒを引数として第２の手順が呼び出される。これは既にテンプレート候補により置換された元の保持データＤに対し、テンプレート候補の検出を試みることを意味している。
本実施の形態のこれまでの説明の中で、保持データＤに対しては、既にテンプレート置換が２回行われており、この段階で保持データＤは、図２３におけるエリア２９１２および２９１５に属する頂点のみを有している。 At this time, the third argument cursor indicates the vertex 2902 after replacement, but the second argument R indicates the vertex 2901. Therefore, in step 61, it is determined that the third argument “cursor” and the second argument R are different, and the process proceeds to step 63 where the second procedure is called using the second argument R as an argument. This means that an attempt is made to detect a template candidate for the original retained data D that has already been replaced by the template candidate.
In the description of the present embodiment so far, template replacement has already been performed twice on the retained data D. At this stage, the retained data D contains only the vertices belonging to areas 2912 and 2915 in FIG. 23.

テンプレート生成装置１６０１はこれまでと同様、第２の手順と第３の手順を再帰的に呼び出しながら、保持データＤのテンプレート候補による置換を更に進め、最終的に第１の手順を終了し、保持データＤを一つの頂点で表す。入力データ取得手段１６０４は、保持データＤが一つの頂点で表されたことを検出すると、入力データ群１６０７から、次の相互参照関係データ３０００を取得する。
図２３に示す相互参照関係データ２９００について、テンプレート候補による置換が終了した段階におけるテンプレート候補蓄積手段１６０５の状況は図３０に示すとおりである。 As in the past, the template generation device 1601 recursively calls the second procedure and the third procedure, further advances the replacement of the retained data D with the template candidate, finally ends the first procedure, and retains it. Data D is represented by one vertex. When the input data acquisition unit 1604 detects that the retained data D is represented by one vertex, the input data acquisition unit 1604 acquires the next cross-reference relationship data 3000 from the input data group 1607.
For the cross-reference relationship data 2900 shown in FIG. 23, the status of the template candidate storage unit 1605 at the stage where the replacement with the template candidate is completed is as shown in FIG.

そして、入力データ取得手段１６０４は、図２３に示す保持データＤの入力が終了したため、図２４に示す相互参照関係データ３０００の入力を開始する。図２４に示す相互参照関係データ３０００についても、図２３に示す相互参照関係データ２９００と同様の手順を繰り返す。ここで、引数cursorは頂点３００６に一致し、変数Ｒが頂点３００８に一致し、候補パターンＣが図２９のテンプレート候補３５１０に一致するとき、候補パターンＣの頂点数が“５”であることから、上述の中間手順３に示す停止条件２が成立する。 Then, the input data acquisition unit 1604 starts inputting the cross-reference relationship data 3000 shown in FIG. 24 since the input of the holding data D shown in FIG. 23 is completed. For the cross-reference relationship data 3000 shown in FIG. 24, the same procedure as the cross-reference relationship data 2900 shown in FIG. 23 is repeated. Here, when the argument cursor matches the vertex 3006, the variable R matches the vertex 3008, and the candidate pattern C matches the template candidate 3510 of FIG. 29, the number of vertices of the candidate pattern C is “5”. The stop condition 2 shown in the intermediate procedure 3 is satisfied.

そのため、テンプレート生成装置１６０１は、処理をステップ５９に進め、テンプレート候補３５１０がそのまま適用可能であるか否かを判定する。図２４に示すように、頂点３００６には、次兄弟頂点３００７が存在するため、テンプレート候補３５１０による置換は不可能であり、ステップ６２に進む。
よって、テンプレート生成装置１６０１はテンプレート候補蓄積手段１６０５の中で、テンプレート候補３５１０の位置から木構造を上方向に辿り、最初に遭遇する親参照または子参照接続のテンプレート候補を発見する。この時、辿った経路上に存在する、テンプレート候補の通過回数から“１”を減算していく。 Therefore, the template generation device 1601 advances the processing to step 59 and determines whether or not the template candidate 3510 can be applied as it is. As shown in FIG. 24, since the next sibling vertex 3007 exists at the vertex 3006, the replacement with the template candidate 3510 is impossible, and the process proceeds to Step 62.
The template generation device 1601 therefore traces the tree structure upward from the position of template candidate 3510 in the template candidate storage unit 1605 and finds the first template candidate encountered that is connected by a parent reference or a child reference. While doing so, it subtracts “1” from the pass count of each template candidate on the traced path.

本実施の形態の場合、これは図２９に示すテンプレート候補３５０１となり、そのテンプレート候補３５０１により、保持データＤの頂点を置換する。これ以降の処理は、相互参照関係データ２９００にテンプレート置換を施した場合と同様にして進める。
相互参照関係データ３０００についても、最終的には頂点１つで表現することが可能である。頂点１つで表現された段階で、入力データ取得手段１６０４がこれ以上入力がないことを検知し、テンプレート選択手段１６０６にその旨を入力する。 In the present embodiment, this is template candidate 3501 shown in FIG. 29, and the vertices of the retained data D are replaced with template candidate 3501. Subsequent processing proceeds in the same way as when template replacement was applied to the cross-reference relationship data 2900.
The cross-reference relationship data 3000 can also be finally expressed by one vertex. At the stage expressed by one vertex, the input data acquisition unit 1604 detects that there is no more input, and inputs that fact to the template selection unit 1606.

図２３、図２４に示す相互参照関係データ２９００，３０００の入力が終了した時点におけるテンプレート候補蓄積手段１６０５の状況を図３１に示す。符号３７０１〜３７０９はパターンを示している。
テンプレート選択手段１６０６はテンプレート候補蓄積手段１６０５に蓄積されたテンプレート候補の中からテンプレートを選択して出力する。このとき、テンプレートの選択方法は複数通り考えられるが、本実施の形態では、テンプレート候補の置換回数が“０”でないものを選択するものとする。この時、テンプレート候補からパターンを抽出し、その抽出したパターンをテンプレートにするが、これは、第１の実施の形態に示すようにして、テンプレートを構成すれば容易に可能である。 FIG. 31 shows the state of the template candidate storage unit 1605 at the point when input of the cross-reference relationship data 2900 and 3000 shown in FIGS. 23 and 24 has finished. Reference numerals 3701 to 3709 denote patterns.
The template selection unit 1606 selects a template from the template candidates stored in the template candidate storage unit 1605 and outputs the template. At this time, a plurality of template selection methods are conceivable. In the present embodiment, a template candidate whose replacement count is not “0” is selected. At this time, a pattern is extracted from the template candidates, and the extracted pattern is used as a template. This is easily possible if a template is configured as shown in the first embodiment.
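
The selection rule of this embodiment (keep every template candidate whose replacement count is not “0”) can be sketched as below. The function name `select_templates` is hypothetical, and the counts in the example are invented for illustration; only the split between zero and non-zero counts is chosen to agree with the patterns that the description reports as extracted.

```python
def select_templates(replace_counts):
    """Return the candidates whose replacement count is not 0, in stored order."""
    return [pattern for pattern, count in replace_counts.items() if count != 0]


# Invented counts: the actual values held in the storage unit are not given in the text.
replace_counts = {
    "3701": 2, "3702": 1, "3703": 0, "3704": 1, "3705": 1,
    "3706": 0, "3707": 0, "3708": 0, "3709": 1,
}
selected = select_templates(replace_counts)
```

Other selection rules are possible, as the description notes; a threshold on the count or a weighting by pattern size would only change the condition inside the comprehension.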

このようにして、テンプレート生成装置１６０１は、テンプレート候補蓄積手段１６０５から、テンプレート選択手段１６０６を用いてパターン３７０１、パターン３７０２、パターン３７０４、パターン３７０５、パターン３７０９を抽出し、抽出した各パターンをそれぞれテンプレートに変換し、変換された各テンプレートを出力データ１６０８として出力する。
以上に示す手順により、テンプレート生成装置１６０１はテンプレートを生成することができる。 In this way, the template generation device 1601 extracts the pattern 3701, the pattern 3702, the pattern 3704, the pattern 3705, and the pattern 3709 from the template candidate storage unit 1605 by using the template selection unit 1606, and each extracted pattern is a template. And the converted templates are output as output data 1608.
The template generation apparatus 1601 can generate a template by the procedure described above.

（第２の実施の形態）
上述のとおり、本発明ではテンプレートの再帰的な適用によるデータ圧縮、すなわち、テンプレートを適用し、そのテンプレートを新たな頂点として表現することにより、再帰的にデータ圧縮を行っている。このことを考慮すると、テンプレートはそのテンプレート内の頂点がテンプレート外の頂点との間で保有される参照情報を接続情報として保持するが、その接続情報の種類および数はもとの相互参照関係データの頂点が有している参照情報の種類および数（木構造の頂点であれば、親参照１つ、子参照１つ、前兄弟参照１つ、次兄弟参照１つ）を超えないことが望ましい。 (Second Embodiment)
As described above, the present invention compresses data by applying templates recursively: a template is applied, the applied template is represented as a new vertex, and compression is repeated. In view of this, a template holds, as connection information, the reference information between vertices inside the template and vertices outside it, and it is desirable that the types and number of pieces of connection information not exceed the types and number of pieces of reference information held by a vertex of the original cross-reference relationship data (for a vertex of a tree structure: one parent reference, one child reference, one previous sibling reference, and one next sibling reference).

また、テンプレートは接続情報の種類および数が多くなるほど、それを用いた置換による圧縮の効率が悪化するため、テンプレートはできるだけ接続情報が少なくなるように設けられていることが望ましい。
そこで、本実施の形態では、次のようにして、できるだけ接続情報が少なくなるようして、テンプレートを生成している。すなわち、データ圧縮の対象として入力される相互参照関係データ（入力データ）が根付木構造を有するときに、テンプレート生成装置において、入力データからテンプレート候補を検出する際に、入力データのある頂点を根頂点として、その頂点が有する子孫頂点で形成される部分木構造を抽出すべきテンプレート候補のパターン情報に設定している。 In addition, the more types and the greater the number of pieces of connection information a template has, the worse the efficiency of compression by replacement with that template becomes. It is therefore desirable that a template be provided with as little connection information as possible.
In the present embodiment, templates are therefore generated as follows so that they have as little connection information as possible. When the cross-reference relationship data input as the compression target (the input data) has a rooted tree structure and the template generation device detects template candidates from the input data, it takes a certain vertex of the input data as a root vertex and sets the subtree formed by the descendant vertices of that vertex as the pattern information of the template candidate to be extracted.

このようにして生成されたテンプレートは、テンプレート外部と相互に接続している頂点が根頂点のみとなる根付木構造になるため、再圧縮を行ったときにテンプレートをひとつの頂点とみなしても、テンプレートの接続情報の種類および数が、もとの入力データにおける各頂点が有する参照情報の種類および数を越えることはない。しかも、テンプレート内の根頂点のみがテンプレートの外部と相互に接続しているため、入力データを再圧縮する際、入力データの木構造で最も深い位置にある頂点（葉頂点）から順にテンプレートに置換していくと、頂点として表現されたテンプレートが必ずそのテンプレートを適用した箇所の上位の頂点よりも深い位置の頂点（葉頂点）となる。そのため、テンプレート候補検出手段１６０２において検出されたテンプレート候補をすべてテンプレートとして利用したときは、最終的に入力データを１つの頂点で表現することが可能となる。 Since the template generated in this way has a rooted tree structure in which the vertex connected to the outside of the template is only the root vertex, even if the template is regarded as one vertex when recompressing, The type and number of connection information of the template does not exceed the type and number of reference information that each vertex has in the original input data. Moreover, since only the root vertices in the template are interconnected with the outside of the template, when recompressing the input data, it is replaced with the template in order from the deepest vertex (leaf vertex) in the tree structure of the input data As a result, the template expressed as a vertex always becomes a vertex (leaf vertex) at a position deeper than the upper vertex of the place where the template is applied. Therefore, when all the template candidates detected by the template candidate detection unit 1602 are used as templates, it is possible to finally represent input data with one vertex.

図１６はテンプレート候補検出手段１６０２における動作手順を示すフローチャートである。テンプレート候補検出手段１６０２は根付木構造を有する入力データを入力し、テンプレート候補として、その入力データの根頂点から数えて最も浅い位置にある葉頂点と同一の深さを有する頂点すべてを含む最小木構造の部分参照情報を検出する。
その葉頂点と同一の深さを有する頂点のうち、子孫頂点を有する頂点があるときは、その頂点を根頂点とし、その頂点が有する子孫頂点で形成される根付木構造の部分参照情報をテンプレート候補検出手段に再度入力することにより、別途テンプレート候補を検出する。 FIG. 16 is a flowchart showing an operation procedure in the template candidate detection unit 1602. The template candidate detection unit 1602 receives input data having a rooted tree structure, and as a template candidate, a minimum tree including all vertices having the same depth as the leaf vertex at the shallowest position counted from the root vertex of the input data Detect partial reference information of structure.
If any of the vertices at the same depth as that leaf vertex has descendant vertices, that vertex is taken as a root vertex, and the partial reference information of the rooted tree structure formed by its descendant vertices is input to the template candidate detection unit again, so that further template candidates are detected.

ここで、例えば、図１７に示す相互参照関係データ２００８からテンプレート候補を検出するとする。幅優先探索を行って最初に遭遇する葉頂点は頂点２００６である。テンプレート候補検出手段１６０２は、根頂点２００１から頂点２００６の深さまでに存在する頂点群で形成される部分参照情報、すなわち、頂点２００１、頂点２００２、頂点２００６、頂点２００７で形成される根付木構造の部分参照情報をテンプレート候補として抽出し、テンプレート候補蓄積手段１６０５に蓄積する。 Here, for example, a template candidate is detected from the cross-reference relationship data 2008 shown in FIG. The first leaf vertex encountered in the breadth-first search is vertex 2006. The template candidate detection unit 1602 has partial reference information formed by a group of vertices existing from the root vertex 2001 to the depth of the vertex 2006, that is, a rooted tree structure formed by the vertex 2001, the vertex 2002, the vertex 2006, and the vertex 2007. Partial reference information is extracted as a template candidate and stored in the template candidate storage unit 1605.

頂点２００２は根頂点２００１から頂点２００６と同一の深さに存在し、子孫頂点を有する頂点である。この場合は、頂点２００２を根頂点として、これとその子孫頂点とで形成される部分参照情報、すなわち、頂点２００２、頂点２００３、頂点２００４、頂点２００５で形成される根付木構造を有する部分参照情報を引数として、テンプレート候補検出手段１６０２を再帰的に呼び出す。その結果、得られるテンプレート候補は、図１８、図１９に示す第１のテンプレート候補２００９、第２のテンプレート候補２０１０のようになる。 Vertex 2002 lies at the same depth from the root vertex 2001 as vertex 2006 and has descendant vertices. In this case, the template candidate detection unit 1602 is called recursively, taking as its argument the partial reference information formed by vertex 2002 as the root vertex together with its descendant vertices, that is, the partial reference information having a rooted tree structure formed by vertices 2002, 2003, 2004, and 2005. The resulting template candidates are the first template candidate 2009 and the second template candidate 2010 shown in FIGS. 18 and 19.
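
The detection procedure of this embodiment can be sketched as a breadth-first search followed by a recursive call. The code is a sketch under assumptions rather than the flowchart of FIG. 16: the function names and the adjacency-dictionary representation are hypothetical, and the tree shape (2002, 2006, and 2007 as children of 2001; 2003, 2004, and 2005 as children of 2002) is inferred from the text, since FIG. 17 is not reproduced here.

```python
from collections import deque


def shallowest_leaf_depth(tree, root):
    """Depth of the first leaf met by a breadth-first search from root."""
    queue = deque([(root, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if not tree.get(vertex):           # no children: this is a leaf
            return depth
        queue.extend((child, depth + 1) for child in tree[vertex])


def detect_candidates(tree, root):
    """Return one candidate (a list of vertices) per detected subtree."""
    limit = shallowest_leaf_depth(tree, root)
    members, frontier = [], []
    queue = deque([(root, 0)])
    while queue:
        vertex, depth = queue.popleft()
        members.append(vertex)             # every vertex down to the limit depth
        if depth < limit:
            queue.extend((child, depth + 1) for child in tree.get(vertex, []))
        elif tree.get(vertex):             # at the limit depth and has descendants
            frontier.append(vertex)
    candidates = [members]
    for vertex in frontier:                # detect again below each such vertex
        candidates.extend(detect_candidates(tree, vertex))
    return candidates


# Assumed shape of the cross-reference relationship data 2008.
tree_2008 = {2001: [2002, 2006, 2007], 2002: [2003, 2004, 2005]}
candidates = detect_candidates(tree_2008, 2001)
```

Under the assumed shape, the first list corresponds to the first template candidate 2009 and the second to the second template candidate 2010. The recursion terminates because each recursive call starts at a vertex strictly deeper than the current root.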

本実施の形態では、相互参照関係データ２００８の根頂点から数えて最も浅い位置にある葉頂点を基準にしてテンプレート候補を抽出しているが、深さ優先探索で最初に遭遇する葉頂点を基準にしてその葉頂点より浅いか同じ深さにある頂点すべてを含む最小木構造の部分参照情報をテンプレート候補として検出しても良い。また、根頂点から数えた深さを予め設定しておき、その設定された深さにある頂点をすべて含む最小木構造の部分参照情報をテンプレート候補にしても良い。 In this embodiment, template candidates are extracted based on the leaf vertex at the shallowest position counted from the root vertex of the cross-reference relationship data 2008. However, the first leaf vertex encountered in the depth-first search is used as the reference. Alternatively, partial reference information of a minimum tree structure including all vertices that are shallower than or at the same depth as the leaf vertex may be detected as a template candidate. Further, the depth counted from the root vertex may be set in advance, and the partial reference information of the minimum tree structure including all the vertices at the set depth may be used as the template candidate.

また、相互参照関係データ２００８の木構造のうち最も深い位置にある頂点を基準の頂点とし、その頂点と同一の深さにある頂点すべてを含む最小木構造の部分参照情報をテンプレート候補として検出しても良い。この場合、テンプレート候補として検出された部分参照情報は、その部分参照情報の根頂点のみで表し、その根頂点の子孫頂点は基準となる頂点の探索範囲としない。また、最も深い位置にある頂点から数えた深さ（設定深さ）を予め設定しておき、最も深い位置に存在する頂点から、その設定深さの中に存在している頂点までに存在する頂点で形成される根付木構造の部分参照情報群をそれぞれテンプレート候補としてもよい。 Further, the vertex at the deepest position in the tree structure of the cross-reference relationship data 2008 is used as a reference vertex, and partial reference information of the minimum tree structure including all the vertices at the same depth as the vertex is detected as a template candidate. May be. In this case, the partial reference information detected as the template candidate is represented only by the root vertex of the partial reference information, and the descendant vertex of the root vertex is not set as the reference vertex search range. Also, the depth (set depth) counted from the vertex at the deepest position is set in advance, and it exists from the vertex existing at the deepest position to the vertex existing within the set depth A partial reference information group of a rooted tree structure formed by vertices may be used as a template candidate.

データ圧縮システムの実施の形態
図２０は、本発明の実施の形態に係るデータ圧縮システム２６０１のシステム構成図である。データ圧縮システム２６０１はテンプレート生成装置２６０２と、データ圧縮装置２６０３とを有し、第１の入力データ２６０５と第２の入力データ２６０４を入力してデータ圧縮を行う。そして、データ圧縮システム２６０１では、テンプレートにより置換されたテンプレート適用済みの相互参照関係データを第１の出力データ２６０６として出力し、入力する相互参照関係データから分離された複数の頂点を有する頂点群のデータを第２の出力データ２６０７として出力する。 Embodiment of Data Compression System
FIG. 20 is a system configuration diagram of a data compression system 2601 according to an embodiment of the present invention. The data compression system 2601 includes a template generation device 2602 and a data compression device 2603, and performs data compression by inputting first input data 2605 and second input data 2604. Then, the data compression system 2601 outputs the template-applied cross-reference relationship data replaced by the template as first output data 2606, and the vertex group having a plurality of vertices separated from the input cross-reference relationship data. The data is output as second output data 2607.

テンプレート生成装置２６０２は、型と値をそれぞれ有する複数の頂点と、各頂点間の参照関係が根付木構造を有する参照情報とを有する相互参照関係データを第２の入力データ２６０４として入力する。この第２の入力データ２６０４は、データ圧縮装置２６０３における圧縮に用いられるテンプレートの生成用として入力する。
また、テンプレート生成装置２６０２は、第２の入力データの各頂点間の参照情報に含まれる根付木構造の部分参照情報から、その部分参照情報をパターンに設定したときの出現頻度およびテンプレートにしたときの圧縮効果に応じた部分参照情報をテンプレートとして出力する。つまり、テンプレート生成装置２６０２は、出現頻度が高く、しかも圧縮効果が高い部分参照情報をテンプレートとして出力する。 The template generation device 2602 receives, as second input data 2604, cross-reference relationship data having a plurality of vertices each having a type and a value, and reference information in which the reference relationship between the vertices has a rooted tree structure. The second input data 2604 is input for generating a template used for compression in the data compression device 2603.
Further, the template generation device 2602 uses the partial reference information of the rooted tree structure included in the reference information between the vertices of the second input data as the appearance frequency and the template when the partial reference information is set as a pattern. The partial reference information corresponding to the compression effect is output as a template. That is, the template generation device 2602 outputs partial reference information having a high appearance frequency and a high compression effect as a template.

そして、例えば、図２（ｂ）に示したデータ構造２１を入力し、これを圧縮する際、第２の入力データ２６０４として、データ構造２１から分離手段１０７によって分離された相互参照関係データ２００８がテンプレート生成装置２６０２に入力される。ここで、出現頻度が高く圧縮効果が高い部分参照情報が、図１９に示した第２のテンプレート候補２０１０であったとすると、テンプレート生成装置２６０２は第２のテンプレート候補２０１０をテンプレートとして出力する。出力されたテンプレートはデータ圧縮装置２６０３のテンプレート蓄積手段１０２に蓄積される。このテンプレートは、他の相互参照関係データが入力された際に共通に利用されることになる。 Then, for example, when the data structure 21 shown in FIG. 2B is input and compressed, the cross-reference relationship data 2008 separated from the data structure 21 by the separating means 107 is used as the second input data 2604. The data is input to the template generation device 2602. Here, if the partial reference information having a high appearance frequency and a high compression effect is the second template candidate 2010 shown in FIG. 19, the template generation device 2602 outputs the second template candidate 2010 as a template. The output template is stored in the template storage unit 102 of the data compression device 2603. This template is commonly used when other cross-reference relationship data is input.

データ圧縮装置２６０３は、第１の入力データ２６０５として、第２の入力データ２６０４と同様の構造を有するデータを入力する。このデータ圧縮装置２６０３は、図１に示すデータ圧縮装置１０１と同様の構成を有している。すなわち、データ圧縮装置２６０３は、分離手段１０７と、テンプレート蓄積手段１０２と、一致箇所検出手段１０３と、テンプレート置換手段１１１および切り替え手段１０４を有している。テンプレート蓄積手段１０２はテンプレート生成装置２６０２から出力されるテンプレートを蓄積している。一致箇所検出手段１０３は入力される相互参照関係データから、木構造の葉頂点を含む一致箇所のみを検出し、テンプレートへ置換する再置換を行う。 The data compression device 2603 inputs data having the same structure as the second input data 2604 as the first input data 2605. The data compression device 2603 has the same configuration as the data compression device 101 shown in FIG. That is, the data compression apparatus 2603 includes a separating unit 107, a template accumulating unit 102, a matching location detecting unit 103, a template replacing unit 111, and a switching unit 104. The template storage unit 102 stores templates output from the template generation device 2602. The coincidence part detecting means 103 detects only the coincidence part including the leaf vertex of the tree structure from the input cross-reference relationship data, and performs re-replacement to replace the template.

The matching part detecting means 103 detects, among the vertices included in the cross-reference relationship data 2008 shown in FIG. 17, the matching parts that correspond to a template stored in the template storage means 102. If the matching part detecting means 103 detects only matching parts of the template that include a leaf vertex, it detects the partial reference information formed by vertex 2002, vertex 2003, vertex 2004, and vertex 2005 as a matching part.
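Restricting detection to matching parts that include a leaf vertex might look like the sketch below. It is an illustration under assumptions rather than the detection procedure of the matching part detecting means 103: trees are nested tuples (a leaf is `()`), a template pattern is a nested tuple in which the hypothetical wildcard `"*"` stands for a vertex that belongs to the match but whose descendants do not, and a matching part is reported by its path of child indices from the root.

```python
def match(tree, pattern):
    """Return the vertices covered by the pattern at this root, or None if it does not match."""
    if pattern == "*":
        return [tree]          # the vertex is covered; its descendants are not examined
    if len(tree) != len(pattern):
        return None
    covered = [tree]
    for subtree, subpattern in zip(tree, pattern):
        part = match(subtree, subpattern)
        if part is None:
            return None
        covered += part
    return covered


def find_leaf_matches(tree, pattern, path=()):
    """Collect paths of matching parts that cover at least one leaf vertex."""
    found = []
    covered = match(tree, pattern)
    if covered is not None and any(vertex == () for vertex in covered):
        found.append(path)
    for index, subtree in enumerate(tree):
        found += find_leaf_matches(subtree, pattern, path + (index,))
    return found


# The root has two non-leaf children, so the match at the root covers no leaf
# and is skipped; the two deeper matches cover leaves and are reported.
tree = (((), ()), (((), ()),))
print(find_leaf_matches(tree, ("*", "*")))  # [(0,), (1, 0)]
```

The pattern `("*", "*")` matches any vertex with exactly two children, so whether a given match is reported depends on where it sits in the tree, not on the pattern alone.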

The template replacing means 111 applies the template to the detected matching part. In doing so, it represents the template as a new vertex and outputs the cross-reference relationship data with the template applied. When the switching means 104 selects output, the template-applied cross-reference relationship data is output as the first output data 2606. When the switching means 104 selects recompression, the template-applied cross-reference relationship data is fed back recursively to the matching part detecting means 103 and the template is applied again. The first output data 2606 then looks like the cross-reference relationship data 1401 shown in FIG. 8.
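The replace-then-recompress loop can be sketched as follows. This is a simplified illustration, not the behavior of the template replacing means 111 or the switching means 104 as implemented: trees are assumed to be nested tuples (a leaf is `()`), each template is assumed to match a whole subtree, the new vertex that stands for an applied template is written as a string such as `"T0"`, and the loop simply keeps recompressing until nothing changes. The names `replace_once`, `compress`, and `expand` are hypothetical.

```python
def replace_once(tree, templates):
    """One pass: replace each subtree equal to a template body with the template's vertex."""
    if isinstance(tree, str):          # already a template vertex
        return tree
    for name, body in templates.items():
        if tree == body:
            return name
    return tuple(replace_once(child, templates) for child in tree)


def compress(tree, templates):
    """Feed the output back in (recompression) until a pass changes nothing."""
    while True:
        replaced = replace_once(tree, templates)
        if replaced == tree:
            return tree
        tree = replaced


def expand(tree, templates):
    """Inverse operation: expand template vertices until none remain."""
    if isinstance(tree, str):
        return expand(templates[tree], templates)
    return tuple(expand(child, templates) for child in tree)


# T1 is defined in terms of T0 vertices, so it can only match on a second pass.
templates = {"T0": ((), ()), "T1": ("T0", "T0")}
tree = (((), ()), ((), ()))
compressed = compress(tree, templates)
print(compressed)                              # T1
print(expand(compressed, templates) == tree)   # True
```

The first pass turns both two-leaf subtrees into `"T0"` vertices; only after that does the root match `"T1"`, which is why the already-replaced data has to be fed back for another pass.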

Configuring the data compression system 2601 in this way makes it possible to dynamically generate a template suited to compressing the given input data and to compress the input data using that template. The data compression system 2601 can therefore achieve data compression with a high compression effect.

FIG. 1 is a block diagram showing the configuration of a data compression apparatus according to an embodiment of the present invention.
FIG. 2(a) is a diagram showing an example of an XML document, and FIG. 2(b) is a diagram showing the data structure of the XML document of (a).
FIG. 3 is a diagram showing cross-reference relationship data before compression.
FIG. 4 is a diagram showing a table of a set of vertices.
FIG. 5(a) is a block diagram showing the configuration of a template, and FIG. 5(b) is a block diagram showing the configuration of a template entity.
FIG. 6 is a diagram showing cross-reference relationship data compressed using the template shown in FIG. 5(b).
FIG. 7 is a diagram showing cross-reference relationship data further compressed using the template shown in FIG. 5(b).
FIG. 8 is a diagram showing other cross-reference relationship data further compressed using the template shown in FIG. 5(b).
FIG. 9 is a block diagram showing the configuration of a data restoration apparatus according to an embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of a template generation apparatus according to an embodiment of the present invention.
FIG. 11(a) is a block diagram showing the configuration of matching part information, FIG. 11(b) is a block diagram showing the configuration of the matching part information corresponding to FIG. 6, and FIG. 11(c) is a block diagram showing the configuration of other matching part information.
FIG. 12 is a diagram showing an example of a matching part detection procedure.
FIG. 13 is a diagram showing an example of a replacement procedure.
FIG. 14 is a diagram showing an example of a restoration procedure.
FIG. 15 is a block diagram showing the configuration of another data compression apparatus.
FIG. 16 is a flowchart showing an example of the operation procedure of the template candidate detecting means.
FIG. 17 is a diagram showing other cross-reference relationship data before compression.
FIG. 18 is a diagram showing the configuration of a first template candidate.
FIG. 19 is a diagram showing the configuration of a second template candidate.
FIG. 20 is a system configuration diagram of a data compression system.
FIG. 21 is a diagram showing another XML document.
FIG. 22 is a diagram showing yet another XML document.
FIG. 23 is a diagram showing cross-reference relationship data separated from the XML document of FIG. 21.
FIG. 24 is a diagram showing cross-reference relationship data separated from the XML document of FIG. 22.
FIG. 25 is a flowchart showing a first procedure of template generation.
FIG. 26 is a flowchart showing a second procedure of template generation.
FIG. 27 is a flowchart showing a third procedure of template generation.
FIG. 28 is a block diagram showing the configuration of a template candidate.
FIG. 29 is a diagram showing the internal configuration of the template candidate storage means.
FIG. 30 is a diagram showing another internal configuration of the template candidate storage means.
FIG. 31 is a diagram showing yet another internal configuration of the template candidate storage means.
FIG. 32 is a diagram showing another example of an XML document.
FIG. 33 is a diagram showing the data structure of the XML document of FIG. 32.
FIG. 34(a) is a diagram showing cross-reference relationship data separated from the XML document of FIG. 33, and FIG. 34(b) is a diagram showing a table of a set of vertices.
FIG. 35 is a diagram showing a data structure separated from the XML document of FIG. 32.
FIG. 36 is a diagram showing element name information separated from the XML document of FIG. 32.
FIG. 37 is a diagram showing text information separated from the XML document of FIG. 32.

Explanation of symbols

20, 30, 40 ... XML document
21, 31 ... Data structure
101, 2104, 2603 ... Data compression apparatus
102, 1502 ... Template storage means
103, 2107 ... Matching part detecting means
107, 2106 ... Separating means
108, 2105 ... Input data
109, 2110, 2606 ... First output data
110, 2111, 2607 ... Second output data
111, 1603, 2108 ... Template replacing means
1000, 1201 ... Cross-reference relationship data
1301, 1401, 2008 ... Cross-reference relationship data
1001, 1002, 1003, 1004 ... Vertex
1005, 1006, 1007 ... Vertex
900 ... Table, 1105 ... Template
1109, 1202, 1302 ... Template entity
1501 ... Data restoration apparatus, 1504 ... Synthesizing means
1505 ... Template expansion means
1506, 2605 ... First input data
1507, 2604 ... Second input data
1508 ... Output data
1601, 2602 ... Template generation apparatus
1602 ... Template candidate detecting means
1604 ... Input data acquisition means
1605 ... Template candidate storage means
1606 ... Template selecting means
1607 ... Input data group, 1608 ... Output data
2009 ... First template candidate
2010 ... Second template candidate
2101 ... First compression means
2109 ... Template storage means
2601 ... Data compression system
2900, 3000 ... Cross-reference relationship data
3401 ... Template candidate

Claims

1. A data compression apparatus comprising:
separating means for separating input data, which has a plurality of vertices each having a type and a value and reference information between the vertices, into cross-reference relationship data having the reference information between the vertices and a vertex group consisting of the plurality of vertices having the type and the value, and for outputting data of the separated vertex group;
template storage means for storing a template representing a specific partial pattern of the reference information between the vertices;
template matching part detecting means for detecting, from the cross-reference relationship data separated by the separating means, a matching part corresponding to the template stored in the template storage means;
template replacing means for, when replacing with the template the matching part detected by the template matching part detecting means in the cross-reference relationship data separated by the separating means, providing at the position of the matching part an indicator vertex that indicates the part replaced by the template, so that the data consisting of the indicator vertex and the portion of the cross-reference relationship data other than the matching part can be replaced again by the template; and
output means for outputting, to a storage device, the cross-reference relationship data repeatedly replaced by the template replacing means, in a state in which the reference information between the vertices can be restored using the template.

2. The data compression apparatus according to claim 1, wherein, when the input data has a rooted tree structure, the template matching part detecting means, in detecting a matching part corresponding to the template stored in the template storage means, detects only a matching part that includes a leaf vertex, which is a vertex at the deepest position of the rooted tree structure.

3. A data restoration apparatus comprising:
template storage means for storing a template representing a specific partial pattern of reference information between a plurality of vertices;
expansion means for receiving compressed cross-reference relationship data that has the reference information between the vertices and has been replaced by the template, expanding the compressed cross-reference relationship data using the template, and, when the expanded cross-reference relationship data contains an indicator vertex, repeatedly re-expanding the expanded cross-reference relationship data using the template, thereby restoring the cross-reference relationship data before compression from the compressed cross-reference relationship data; and
synthesizing means for receiving data of a vertex group consisting of the plurality of vertices, synthesizing the data of the vertex group with the pre-compression cross-reference relationship data restored by the expansion means, and outputting the synthesized data.

4. The data restoration apparatus according to claim 3, wherein, when a vertex to be referenced is specified in the cross-reference relationship data to be expanded using the template, the expansion means performs the re-expansion using the template necessary for referring to that vertex.

5. A data compression system for compressing first input data having a plurality of vertices each having a type and a value and reference information between the vertices, the data compression system comprising:
a template generation apparatus that receives cross-reference relationship data having the reference information between the vertices, extracted from second input data having the same structure as the first input data, and generates a template representing a specific partial pattern of the reference information between the vertices;
the data compression apparatus according to claim 1, having template storage means that stores the template generated by the template generation apparatus; and
output means for outputting, as first output data, the cross-reference relationship data repeatedly replaced by the data compression apparatus using the template stored in the template storage means, in a state in which the reference information between the vertices can be restored using the template, and for outputting, as second output data, data of the plurality of vertices each having the type and the value separated from the first input data.