JP2004258773A

JP2004258773A - Document compressing device and document reconstructing device

Info

Publication number: JP2004258773A
Application number: JP2003046217A
Authority: JP
Inventors: Hironori Yamashita; 洋徳山下
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2003-02-24
Filing date: 2003-02-24
Publication date: 2004-09-16

Abstract

PROBLEM TO BE SOLVED: To provide a document compressing device that compresses an XML document in a form that can easily be reconstructed without a dedicated application.

SOLUTION: The document compressing device includes an XML document compressing part 13 that compresses an element specified by an XML document analyzing part 12 and creates a correspondence list indicating the relationship between the element before and after compression. Based on the correspondence list created by the XML document compressing part 13, the device generates a restoring XSLT 4 that is referred to when the compressed element is reconstructed.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a document compression apparatus for compressing a structured document such as an XML document and a document restoration apparatus for restoring the compressed structured document.
[0002]
[Prior art]
In recent years, with the spread of the Internet, opportunities to exchange various kinds of electronic data over the Internet have increased, and XML is increasingly adopted as the format for such data exchange.
XML is a document format in which data, called "elements", are enclosed between a start tag and an end tag. Because the content of an element can itself contain other elements, a nested hierarchical structure can be represented. Elements can also have "attributes":
<element-name>element content</element-name>
<element-name attribute-name="attribute value">element content</element-name>
[0003]
However, when long names are used for elements or attributes, the data becomes highly redundant and its size grows.
A document compression apparatus for compressing XML documents is therefore disclosed in Patent Document 1 below. However, an XML document compressed by such a conventional apparatus cannot be restored without a dedicated application.
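The cost of long names is easy to quantify: every element name appears twice, once in the start tag and once in the end tag, so a non-empty element costs 2 × (name length) + 5 bytes of markup. The Python sketch below measures this for the element names used in the worked example later in this document; the short names "a" and "b" are the ones that example eventually assigns.

```python
def markup_overhead(xml: str, content: str) -> int:
    """Bytes spent on markup rather than on the element content itself."""
    return len(xml.encode("utf-8")) - len(content.encode("utf-8"))


# Element names from the worked example, paired with the short names it assigns.
for original, short, text in [("name", "a", "ABC"), ("price", "b", "1000")]:
    before = f"<{original}>{text}</{original}>"
    after = f"<{short}>{text}</{short}>"
    # overhead = 2 * len(tag name) + 5 for the characters "<", ">", "</", ">"
    print(original, markup_overhead(before, text), "->", markup_overhead(after, text))
# name 13 -> 7
# price 15 -> 7
```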
[0004]
[Patent Document 1]
JP 2001-67348 A (paragraph numbers [0050] to [0065], FIG. 1)
[0005]
[Problems to be solved by the invention]
Because the conventional document compression apparatus is configured as described above, it can reduce the data size of an XML document, but the compressed document cannot be restored without a dedicated application. This makes it unsuitable for exchanging XML documents over the Internet or similar networks.
[0006]
The present invention has been made to solve the above problem. An object of the invention is to provide a document compression apparatus that, when compressing an XML document, produces a compressed form that can easily be restored without a dedicated application.
Another object of the present invention is to provide a document restoring apparatus capable of easily restoring an XML document without a dedicated application.
[0007]
[Means for Solving the Problems]
The document compression apparatus according to the present invention includes element compression means that compresses the element specified by element specifying means and creates a correspondence table indicating the relationship between the element before and after compression. Based on that correspondence table, the apparatus generates a template that is referred to when the compressed element is restored.
[0008]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, an embodiment of the present invention will be described.
Embodiment 1.
FIG. 1 is a system configuration diagram showing a system to which the document compression apparatus and the document restoring apparatus according to Embodiment 1 of the present invention are applied. In the figure, the document compression apparatus 2 receives the original 1 of an XML document, compresses it, and generates a restoring XSLT 4 that is referred to when the compressed XML document 3 is restored.
Upon receiving the compressed XML document 3 and the restoring XSLT 4 from the document compression device 2 via the network 5, the document restoring device 6 refers to the restoring XSLT 4 to restore the original document 1 of the XML document.
[0009]
FIG. 2 is a configuration diagram showing the document compression apparatus 2 according to the first embodiment of the present invention. In the figure, an XML document input unit 11 inputs an XML document that is a structured document. The XML document analysis unit 12 analyzes the structure of the XML document input by the XML document input unit 11 and specifies an element to be compressed. Note that the XML document analysis unit 12 constitutes an element specifying unit.
The XML document compression unit 13 compresses the element specified by the XML document analysis unit 12 and creates a correspondence table indicating the correspondence of the element before and after compression. Note that the XML document compression unit 13 constitutes element compression means. The XSLT generating unit 14 generates a restoring XSLT 4 that is referred to when restoring the compressed element based on the correspondence table created by the XML document compressing unit 13. Note that the XSLT generation unit 14 forms a template generation unit.
The XML document transmitting unit 15 transmits the compressed XML document 3 to the document restoring device 6 via the network 5, and the XSLT transmitting unit 16 transmits the restoring XSLT 4 generated by the XSLT generating unit 14 to the document restoring device 6 via the network 5.
[0010]
FIG. 3 is a block diagram showing the document restoring device 6 according to the first embodiment of the present invention. In the figure, the XML document receiving unit 21 receives the compressed XML document 3 transmitted from the document compression device 2, and the XSLT receiving unit 22 receives the restoring XSLT 4 transmitted from the document compression device 2. Together, the XML document receiving unit 21 and the XSLT receiving unit 22 constitute input means.
The XML document restoring unit 23 refers to the restoring XSLT 4 received by the XSLT receiving unit 22 and restores the compressed XML document 3 received by the XML document receiving unit 21. Note that the XML document restoring unit 23 constitutes an element restoring unit.
The XML document output unit 24 outputs the XML document restored by the XML document restoration unit 23.
FIG. 4 is a flowchart showing the processing contents of the XML document analysis unit 12 and the XML document compression unit 13, and FIG. 9 is a flowchart showing the processing contents of the XSLT generation unit 14.
[0011]
Next, the operation will be described.
First, when the XML document input unit 11 of the document compression apparatus 2 receives the original 1 of an XML document, the XML document analysis unit 12 analyzes the structure of that document and identifies the elements to be compressed.
The XML document compression unit 13 compresses the element specified by the XML document analysis unit 12 and creates a correspondence table indicating the correspondence of the element before and after compression.
[0012]
Specifically, it is as follows.
First, the structure and contents of the XML document are loaded into memory as a DOM tree (step ST1), and a pointer indicating the current processing target is set to the root element of the XML document (step ST2).
For example, when the XML document input unit 11 receives the XML document shown in FIG. 5A, the processing target pointer is set to the root element <products>.
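Steps ST1 and ST2 can be sketched with Python's `xml.etree.ElementTree`, whose in-memory element tree stands in here for the DOM tree. FIG. 5A itself is not reproduced in this text, so the input below is an assumed reconstruction pieced together from the walkthrough in paragraphs [0013] to [0020]; the actual figure may contain more records.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the FIG. 5A input (a single <product> record).
SOURCE = ('<products><product id="0001">'
          '<name>ABC</name><price>1000</price>'
          '</product></products>')

root = ET.fromstring(SOURCE)   # step ST1: parse the document into an in-memory tree
pointer = root                 # step ST2: the processing pointer starts at the root

print(pointer.tag)                        # products
print([child.tag for child in pointer])   # ['product']
```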
[0013]
Next, the XML document analysis unit 12 checks whether the root element <products> has an unprocessed child element (step ST3). If it does, the processing target pointer is moved from <products> to that child element (step ST4).
In this example, the root element <products> has <product id="0001"> as an unprocessed child element, so the processing target pointer is moved to <product id="0001">, as shown in FIG. 5B.
[0014]
The XML document analysis unit 12 then checks whether the element now being processed, <product id="0001">, has an unprocessed child element (step ST3). If it does, the processing target pointer is moved from <product id="0001"> to that child element (step ST4).
In this example, <product id="0001"> has <name> as an unprocessed child element, so the processing target pointer is moved to <name>, as shown in FIG. 5C.
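The pointer movement in steps ST3, ST4 and ST13 amounts to a post-order depth-first walk: an element is handled only after all of its children have been handled. The sketch below follows the flowchart's explicit pointer rather than recursion. ElementTree elements do not record their parent, so a child-to-parent dictionary (a common ElementTree idiom) stands in for the parent link; the input is the same assumed reconstruction of FIG. 5A, and `processing_order` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SOURCE = ('<products><product id="0001">'
          '<name>ABC</name><price>1000</price>'
          '</product></products>')


def processing_order(root):
    """Return element tags in the order the pointer finishes with them."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    done = set()
    order = []
    pointer = root
    while pointer is not None:
        pending = [c for c in pointer if c not in done]   # ST3: unprocessed children?
        if pending:
            pointer = pending[0]                           # ST4: descend to the first one
            continue
        order.append(pointer.tag)                          # element would be compressed here
        done.add(pointer)
        pointer = parent_of.get(pointer)                   # ST13: back to parent (None at root)
    return order


print(processing_order(ET.fromstring(SOURCE)))
# ['name', 'price', 'product', 'products']
```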
[0015]
Next, because the element <name> has no unprocessed child element, the XML document analysis unit 12 checks whether <name> has an unprocessed attribute (step ST5).
In this example, <name> has no unprocessed attribute, so the XML document compression unit 13 shortens its element name, as shown in FIG. 5D (step ST6); here "name" is shortened to "a".
Having shortened the element name, the XML document compression unit 13 creates a correspondence table indicating the element names before and after shortening, as shown in FIG. 6.
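The text shows only the resulting short names a, b, c, d and e, handed out in processing order across FIGS. 6 to 8. One way to reproduce that, sketched here as an assumption, is a single allocator shared by all correspondence tables that yields a, b, ..., z, aa, ab, ... (all valid XML names). Sharing it guarantees that a short name never stands for two different originals. The `NameTable` class and `short_names` generator are hypothetical.

```python
from itertools import count
from string import ascii_lowercase


def short_names():
    """Yield a, b, ..., z, aa, ab, ... (bijective base-26 over lowercase letters)."""
    for n in count(1):
        name = ""
        while n:
            n, rem = divmod(n - 1, 26)
            name = ascii_lowercase[rem] + name
        yield name


class NameTable:
    """Correspondence table: original name -> short name (cf. FIGS. 6 to 8)."""

    def __init__(self, allocator):
        self.allocator = allocator   # shared with the other tables
        self.mapping = {}

    def shorten(self, original):
        if original not in self.mapping:          # reuse an existing entry if present
            self.mapping[original] = next(self.allocator)
        return self.mapping[original]


alloc = short_names()
leaf_table = NameTable(alloc)   # FIG. 6: leaf element names
print(leaf_table.shorten("name"), leaf_table.shorten("price"), leaf_table.shorten("name"))
# a b a
print(leaf_table.mapping)       # {'name': 'a', 'price': 'b'}
```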
[0016]
Next, the XML document analysis unit 12 checks whether <name> has a parent element (step ST7), that is, whether <name> is the root element. Since <name> is not the root element, the unit checks whether <name> has an attribute or a child element (step ST8).
In this example, <name> has neither an attribute nor a child element, so the unit checks whether its parent element <product id="0001"> has content of its own (step ST9).
Here the parent element <product id="0001"> has no content, so the XML document compression unit 13 deletes the start tag and end tag of <name>, as shown in FIG. 5E, and adds the element's content "ABC" to the parent element as an attribute (step ST10). That is, <name> is removed and the parent element becomes <product id="0001" a="ABC">.
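Steps ST8 to ST10 can be sketched as a small ElementTree helper. The function name `fold_leaf` is hypothetical. The check that the parent does not already carry an attribute with the same short name is an added safeguard and does not come from the source. The leaf is assumed to have been renamed already (step ST6), so its short name is passed in.

```python
import xml.etree.ElementTree as ET


def fold_leaf(parent, leaf, short_name):
    """Replace <leaf>text</leaf> with the attribute short_name="text" on parent.

    Returns False and leaves the tree unchanged when the leaf has attributes or
    children (step ST8), when the parent has text content of its own (step ST9),
    or when the attribute name is already taken (an added safeguard).
    """
    parent_has_text = bool((parent.text or "").strip()) or any(
        (child.tail or "").strip() for child in parent)
    if len(leaf) or leaf.attrib or parent_has_text or short_name in parent.attrib:
        return False
    parent.set(short_name, leaf.text or "")   # step ST10: content becomes an attribute
    parent.remove(leaf)                       # ...and the leaf's tags disappear
    return True


product = ET.fromstring('<product id="0001"><name>ABC</name><price>1000</price></product>')
fold_leaf(product, product.find("name"), "a")
print(ET.tostring(product, encoding="unicode"))
# <product id="0001" a="ABC"><price>1000</price></product>
```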
[0017]
Once the XML document compression unit 13 has compressed the element and updated the correspondence table in this way (step ST11 or ST12), the XML document analysis unit 12 moves the processing target pointer to the parent element (step ST13) and returns to step ST3, repeating the same processing.
In this example, the element name of <price> is therefore shortened from "price" to "b", the start tag and end tag of <price> are deleted, and its content "1000" is added as an attribute of the parent element.
As a result, the parent element no longer has any child elements, so its end tag </product> is replaced by the empty-element form "/", as shown in FIG. 5F. That is, the element becomes <product id="0001" a="ABC" b="1000"/>.
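ElementTree makes the same choice automatically. An element with neither children nor text is serialized as an empty-element tag unless `short_empty_elements=False` is passed to `tostring`. The comparison below shows the 8 characters saved for this record. ElementTree writes a space before "/>", which is still well-formed XML.

```python
import xml.etree.ElementTree as ET

# The <product> record after both leaves were folded into attributes (FIG. 5F).
product = ET.fromstring('<product id="0001" a="ABC" b="1000"></product>')

short_form = ET.tostring(product, encoding="unicode")   # empty-element tag (default)
long_form = ET.tostring(product, encoding="unicode", short_empty_elements=False)

print(short_form)                          # <product id="0001" a="ABC" b="1000" />
print(long_form)                           # <product id="0001" a="ABC" b="1000"></product>
print(len(long_form) - len(short_form))    # 8
```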
[0018]
Thereafter, when the processing target pointer moves to the parent element <product id="0001" a="ABC" b="1000"/> (step ST13), that element has no unprocessed child element (step ST3), so the XML document analysis unit 12 checks whether it has an unprocessed attribute (step ST5).
Because <product id="0001" a="ABC" b="1000"/> has the unprocessed attribute "id", the XML document compression unit 13 shortens that attribute name, as shown in FIG. 5G (step ST14); here "id" is shortened to "c".
Having shortened the attribute name, the XML document compression unit 13 creates a correspondence table indicating the attribute names before and after shortening, as shown in FIG. 7.
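Step ST14 can be sketched as follows. Attributes that came from folded leaf elements already carry short names and are skipped. The remaining original attributes are renamed in place with their order preserved, which yields <product c="0001" a="ABC" b="1000"/> as in FIG. 5G. The function name is hypothetical, and `iter("cdefgh")` merely stands in for a shared short-name allocator that has already issued "a" and "b".

```python
import xml.etree.ElementTree as ET


def shorten_attributes(elem, table, already_short, allocator):
    """Rename elem's original attributes to short names, recording them in table.

    table maps original attribute name -> short name (cf. FIG. 7); names in
    already_short (created by folding leaves in step ST10) are left as they are.
    """
    renamed = {}
    for name, value in elem.attrib.items():
        if name in already_short:
            renamed[name] = value
            continue
        if name not in table:             # allocate only for names not seen before
            table[name] = next(allocator)
        renamed[table[name]] = value
    elem.attrib.clear()                   # rewrite the attributes, keeping their order
    elem.attrib.update(renamed)


product = ET.fromstring('<product id="0001" a="ABC" b="1000"/>')
attribute_table = {}
shorten_attributes(product, attribute_table, {"a", "b"}, iter("cdefgh"))
print(ET.tostring(product, encoding="unicode"))   # <product c="0001" a="ABC" b="1000" />
print(attribute_table)                            # {'id': 'c'}
```

Reusing the same table for a second `<product>` maps its `id` to "c" again without drawing a new name from the allocator.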
[0019]
After that, when <product c="0001" a="ABC" b="1000"/> has no unprocessed attributes left, the XML document compression unit 13 shortens its element name, as shown in FIG. 5H (step ST6); here "product" is shortened to "d".
Having shortened the element name, the XML document compression unit 13 creates a correspondence table indicating the element names before and after shortening, as shown in FIG. 8.
[0020]
Thereafter, when the processing target pointer moves to the root element <products> (step ST13), the XML document compression unit 13 shortens the element name of <products>, as shown in FIG. 5I; here "products" is shortened to "e".
Having shortened the element name, the XML document compression unit 13 records the element names before and after shortening in the correspondence table shown in FIG. 8 (step ST16).
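Putting steps ST1 to ST16 together, the whole compression pass can be sketched in Python as below. This is a sketch under stated assumptions, not the patented implementation. It assumes no namespaces, no mixed content, and at most one occurrence of each leaf name per parent; other shapes raise `ValueError`. Besides the three correspondence tables, it returns hypothetical ownership lists recording which short attributes and folded leaves belong to each element. The XSLT generation steps ST24 and ST25 need that association, but the source does not say how FIGS. 6 to 8 store it.

```python
import xml.etree.ElementTree as ET
from itertools import count
from string import ascii_lowercase

SOURCE = ('<products><product id="0001">'
          '<name>ABC</name><price>1000</price>'
          '</product></products>')


def short_names():
    """Yield a, b, ..., z, aa, ab, ...; one allocator is shared by all tables."""
    for n in count(1):
        name = ""
        while n:
            n, rem = divmod(n - 1, 26)
            name = ascii_lowercase[rem] + name
        yield name


def compress(xml_text):
    root = ET.fromstring(xml_text)                         # ST1: tree in memory
    parent_of = {c: p for p in root.iter() for c in p}
    alloc = short_names()
    leaves, attributes, elements = {}, {}, {}              # FIGS. 6, 7 and 8
    owned_attrs, owned_leaves = {}, {}                     # short tag -> short names
    folded_into = {}                                       # element -> folded names

    def shorten(table, name):
        if name not in table:
            table[name] = next(alloc)
        return table[name]

    def merge(store, tag, names):
        bucket = store.setdefault(tag, [])
        bucket.extend(n for n in names if n not in bucket)

    def visit(elem):                                       # post-order (ST3/ST4/ST13)
        for child in list(elem):                           # copy: children get removed
            visit(child)
        parent = parent_of.get(elem)
        if parent is not None and len(elem) == 0 and not elem.attrib:
            short = shorten(leaves, elem.tag)                          # ST6
            if short in parent.attrib or (parent.text or "").strip():  # ST9
                raise ValueError("document shape not covered by this sketch")
            parent.set(short, elem.text or "")                         # ST10
            parent.remove(elem)
            folded_into.setdefault(parent, []).append(short)
            return
        folded = folded_into.get(elem, [])
        renamed = {}
        for attr, value in elem.attrib.items():                        # ST14
            renamed[attr if attr in folded else shorten(attributes, attr)] = value
        elem.attrib.clear()
        elem.attrib.update(renamed)
        elem.tag = shorten(elements, elem.tag)                         # ST6
        merge(owned_attrs, elem.tag, [a for a in renamed if a not in folded])
        merge(owned_leaves, elem.tag, folded)

    visit(root)
    return (ET.tostring(root, encoding="unicode"),
            leaves, attributes, elements, owned_attrs, owned_leaves)


compressed, leaves, attributes, elements, owned_attrs, owned_leaves = compress(SOURCE)
print(compressed)                       # <e><d c="0001" a="ABC" b="1000" /></e>
print(leaves, attributes, elements)
# {'name': 'a', 'price': 'b'} {'id': 'c'} {'product': 'd', 'products': 'e'}
```

A leaf that has attributes, such as `<price currency="JPY">1000</price>`, is not folded. As in step ST8, its attribute names and element name are shortened and its text is kept.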
[0021]
When the XML document compression unit 13 has finished compressing the XML document as described above, the XSLT generation unit 14 generates, based on the correspondence tables of FIGS. 6 to 8 created during that process, the restoring XSLT 4 that is referred to when the compressed elements are restored (step ST17).
Specifically, leaf elements (elements with neither attributes nor child elements) that were converted into attributes, as well as the original attributes, are each described as an XSLT template, and each element that serves as a parent element is described as a template that applies templates to its child elements. The details are as follows.
[0022]
First, the XSLT generation unit 14 converts each entry in the correspondence table of FIG. 6 into an XSLT template (step ST21).
Next, the XSLT generation unit 14 converts each entry in the correspondence table of FIG. 7 into an XSLT template (step ST22).
Next, the XSLT generation unit 14 converts each entry in the correspondence table of FIG. 8 into an XSLT template (step ST23).
[0023]
Next, the XSLT generation unit 14 searches for the attribute belonging to the corresponding element from the correspondence table in FIG. 7, and applies the template (step ST24).
Next, the XSLT generation unit 14 searches for a leaf element belonging to the corresponding element from the correspondence table in FIG. 6, and applies the template (step ST25).
The XSLT generation unit 14 repeatedly executes the processing of steps ST24 and ST25 until a template is created for all elements in the correspondence table of FIG. 8 (step ST26).
FIG. 10 shows an example of the restoration XSLT 4 generated by the XSLT generation unit 14.
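A possible shape for the generation step (ST21 to ST26) is sketched below. FIG. 10 is not reproduced here, so the stylesheet layout is an assumption. Each folded leaf becomes a template that turns an attribute such as `@a` back into an element. Each shortened attribute becomes a template that re-creates the original attribute with `xsl:attribute`. Each parent element becomes a template that re-creates the original tag and applies templates to its own attributes, then its folded leaves, then its child nodes. The tables are the FIG. 5 results, and the ownership lists are a hypothetical addition, because the source does not say how the element-to-attribute association is stored.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"

# Correspondence tables for the FIG. 5 example (FIGS. 6, 7 and 8).
LEAVES = {"name": "a", "price": "b"}
ATTRIBUTES = {"id": "c"}
ELEMENTS = {"product": "d", "products": "e"}
# Hypothetical ownership lists (short element name -> short names it carries).
OWNED_ATTRS = {"d": ["c"], "e": []}
OWNED_LEAVES = {"d": ["a", "b"], "e": []}


def build_restoring_xslt(leaves, attributes, elements, owned_attrs, owned_leaves):
    """Return the text of an XSLT 1.0 stylesheet that undoes the compression."""
    parts = [f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL}">']
    for original, short in leaves.items():        # ST21: @a -> <name>value</name>
        parts.append(f'<xsl:template match="@{short}"><{original}>'
                     f'<xsl:value-of select="."/></{original}></xsl:template>')
    for original, short in attributes.items():    # ST22: @c -> id="value"
        parts.append(f'<xsl:template match="@{short}"><xsl:attribute name="{original}">'
                     f'<xsl:value-of select="."/></xsl:attribute></xsl:template>')
    for original, short in elements.items():      # ST23-ST26: d -> <product>...</product>
        body = [f'<xsl:apply-templates select="@{name}"/>'
                for name in owned_attrs.get(short, []) + owned_leaves.get(short, [])]
        body.append('<xsl:apply-templates select="node()"/>')   # child elements, text
        parts.append(f'<xsl:template match="{short}"><{original}>'
                     + "".join(body) + f'</{original}></xsl:template>')
    parts.append("</xsl:stylesheet>")
    return "\n".join(parts)


stylesheet = build_restoring_xslt(LEAVES, ATTRIBUTES, ELEMENTS, OWNED_ATTRS, OWNED_LEAVES)
print(stylesheet)
```

Real attributes are applied before folded leaves because XSLT requires attributes to be added to an element before any child nodes. Each leaf gets its own `xsl:apply-templates` because XPath does not guarantee the order of attribute nodes. Any standard XSLT 1.0 processor, for example `xsltproc restore.xsl compressed.xml` from libxslt, could then apply the stylesheet.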
[0024]
The XML document transmitting unit 15 transmits the compressed XML document 3 to the document restoring device 6 via the network 5, and the XSLT transmitting unit 16 transmits the restoring XSLT 4 generated by the XSLT generating unit 14 to the document restoring device 6 via the network 5.
[0025]
Meanwhile, in the document restoring device 6, when the XML document receiving unit 21 receives the compressed XML document 3 transmitted from the document compression device 2 and the XSLT receiving unit 22 receives the restoring XSLT 4 transmitted from the document compression device 2, the XML document restoring unit 23 restores the compressed XML document 3 by referring to the restoring XSLT 4.
That is, restoration is performed by sequentially executing the instructions of the restoring XSLT 4, which is itself written in XML format. FIG. 11 shows the restoration process, and FIG. 11(k) corresponds to the restored XML document.
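Python's standard library has no XSLT processor. The sketch below therefore applies the inverse mapping that a restoring stylesheet of this kind encodes directly with ElementTree: attributes first, then folded leaves, then child elements. Its only purpose is to check, on the FIG. 5 example, that the correspondence tables carry enough information for a lossless round trip. The ownership lists are a hypothetical addition, because the source does not say how the element-to-attribute association is stored, and `restore` is an illustrative name.

```python
import xml.etree.ElementTree as ET

COMPRESSED = '<e><d c="0001" a="ABC" b="1000" /></e>'
LEAVES = {"name": "a", "price": "b"}              # FIG. 6
ATTRIBUTES = {"id": "c"}                          # FIG. 7
ELEMENTS = {"product": "d", "products": "e"}      # FIG. 8
OWNED_ATTRS = {"d": ["c"], "e": []}               # hypothetical ownership lists
OWNED_LEAVES = {"d": ["a", "b"], "e": []}


def restore(text):
    """Undo the compression in plain Python, mirroring the restoring templates."""
    leaf_name = {s: o for o, s in LEAVES.items()}
    attr_name = {s: o for o, s in ATTRIBUTES.items()}
    elem_name = {s: o for o, s in ELEMENTS.items()}

    def rebuild(src):
        out = ET.Element(elem_name[src.tag])
        out.text = src.text
        for short in OWNED_ATTRS.get(src.tag, []):     # short attribute -> original name
            if short in src.attrib:
                out.set(attr_name[short], src.get(short))
        for short in OWNED_LEAVES.get(src.tag, []):    # folded attribute -> leaf element
            if short in src.attrib:
                ET.SubElement(out, leaf_name[short]).text = src.get(short)
        for child in src:                              # remaining child elements
            rebuilt = rebuild(child)
            rebuilt.tail = child.tail
            out.append(rebuilt)
        return out

    return ET.tostring(rebuild(ET.fromstring(text)), encoding="unicode")


print(restore(COMPRESSED))
# <products><product id="0001"><name>ABC</name><price>1000</price></product></products>
```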
[0026]
As is clear from the above, the first embodiment provides the XML document compression unit 13, which compresses the elements specified by the XML document analysis unit 12 and creates correspondence tables indicating the relationship between the elements before and after compression, and generates from those tables the restoring XSLT 4 that is referred to when the compressed elements are restored. As a result, an XML document can be compressed in a form that can easily be restored without a dedicated application.
Further, according to the first embodiment, the elements of an XML document compressed by the document compression device are restored by referring to the restoring XSLT 4 generated by that device, so the XML document can easily be restored without a dedicated application.
[0027]
According to the first embodiment, when an element specified by the XML document analysis unit 12 has neither an attribute nor a child element, its start tag and end tag are deleted and its content is added as an attribute of the parent element, which reduces the size of the XML document.
According to the first embodiment, when an element specified by the XML document analysis unit 12 has no child element but has an attribute, the attribute name is shortened and a correspondence table indicating the attribute names before and after shortening is created. This reduces the size of the XML document while keeping it easy to restore.
[0028]
Further, according to the first embodiment, when deleting the start and end tags of elements and adding their content as attributes of the parent element leaves the parent element with no child elements, the parent element's end tag is replaced by an empty-element tag, which reduces the size of the XML document.
According to the first embodiment, the element names and attribute names of parent elements are shortened, and correspondence tables indicating those names before and after shortening are created. This reduces the size of the XML document while keeping it easy to restore.
[0029]
【The invention's effect】
As described above, according to the present invention, element compression means compresses the element specified by element specifying means and creates a correspondence table indicating the relationship between the element before and after compression, and a template referred to when restoring the compressed element is generated from that correspondence table. As a result, an XML document can be compressed in a form that can easily be restored without a dedicated application.
[Brief description of the drawings]
FIG. 1 is a system configuration diagram showing a system to which a document compression device and a document restoring device according to Embodiment 1 of the present invention are applied.
FIG. 2 is a configuration diagram showing the document compression apparatus according to Embodiment 1 of the present invention.
FIG. 3 is a configuration diagram showing the document restoring device according to Embodiment 1 of the present invention.
FIG. 4 is a flowchart showing the processing performed by the XML document analysis unit and the XML document compression unit.
FIG. 5 is an explanatory diagram showing the compression processing.
FIG. 6 is an explanatory diagram showing a correspondence table between leaf element names and short character strings.
FIG. 7 is an explanatory diagram showing a correspondence table between attribute names and short character strings.
FIG. 8 is an explanatory diagram showing a correspondence table between element names and short character strings.
FIG. 9 is a flowchart showing the processing performed by the XSLT generation unit.
FIG. 10 is an explanatory diagram showing a restoring XSLT.
FIG. 11 is an explanatory diagram showing the restoration processing.
[Explanation of symbols]
1 original XML document, 2 document compression device, 3 compressed XML document, 4 restoring XSLT, 5 network, 6 document restoring device, 11 XML document input unit, 12 XML document analysis unit (element specifying means), 13 XML document compression unit (element compression means), 14 XSLT generation unit (template generation means), 15 XML document transmitting unit, 16 XSLT transmitting unit, 21 XML document receiving unit (input means), 22 XSLT receiving unit (input means), 23 XML document restoring unit (element restoring means), 24 XML document output unit.

Claims

A document compression apparatus comprising: element specifying means for analyzing the structure of a structured document and specifying an element to be compressed; element compression means for compressing the element specified by the element specifying means and creating a correspondence table indicating the relationship between the element before and after compression; and template generation means for generating, based on the correspondence table created by the element compression means, a template that is referred to when restoring the compressed element.

The document compression apparatus according to claim 1, wherein, when the element specified by the element specifying means has neither an attribute nor a child element, the element compression means deletes the start tag and end tag of the element and adds the content of the element as an attribute of its parent element.

The document compression apparatus according to claim 1, wherein, when the element specified by the element specifying means has no child element but has an attribute, the element compression means shortens the attribute name and creates a correspondence table indicating the attribute names before and after shortening.

The document compression apparatus according to claim 2, wherein, when deleting the start tag and end tag of the element and adding the content of the element as an attribute of the parent element leaves the parent element with no child element, the element compression means changes the end tag of the parent element to an empty-element tag.

The document compression apparatus according to claim 2, wherein the element compression means shortens the element name and attribute names of the parent element and creates a correspondence table indicating the element name and attribute names before and after shortening.

A document restoring device comprising: input means for inputting a structured document whose elements have been compressed by a document compression apparatus, together with a template generated by that document compression apparatus; and element restoring means for restoring the compressed elements by referring to the template input by the input means.