JP2007537642A

JP2007537642A - Method and apparatus for compression and decompression of structured block unit of XML data

Info

Publication number: JP2007537642A
Application number: JP2007512605A
Authority: JP
Inventors: モレル，アントニー
Original assignee: Koninklijke Philips NV; Koninklijke Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 2004-05-13
Filing date: 2005-04-01
Publication date: 2007-12-20
Also published as: WO2005112270A1; CN1697327A; EP1751873A1; CN1951017A; KR20070011490A

Abstract

The present invention provides a method and apparatus for XML compression and decompression. An XML document is assumed to consist of similar data blocks, each of which can be decomposed into a single metadata template and the content data it contains. The template is analysed, and LZ77-based data compression (zlib) of the content data is performed using references to positions in the template. The template itself is compressed separately. The content data of the data blocks is compressed block by block (referred to herein as "sequentially"), so that a selected data block can be decompressed without decompressing the entire file.

Description

The present invention relates to a method and apparatus for compressing/decompressing data, and more particularly to a method and apparatus for sequentially compressing/decompressing data.

Sequential compression/decompression is currently the usual approach to data compression. It uses data that has already been processed as a reference for compressing or decompressing subsequent data, thereby reducing redundancy. Typical sequential compression schemes are LZ77 and zlib; zlib is a sequential compression scheme based on LZ77 and Huffman coding.
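To make "using already-processed data as a reference" concrete, here is a toy LZ77 coder. It is an illustrative sketch only (brute-force window search, no Huffman stage) and not zlib's implementation; the function names and the `window`/`min_len` parameters are assumptions.

```python
from typing import List, Tuple, Union

# A token is either a literal byte (int) or a back-reference (offset, length)
# pointing into data that has already been processed.
Token = Union[int, Tuple[int, int]]


def lz77_encode(data: bytes, window: int = 4096, min_len: int = 3) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of the sliding window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))  # reuse earlier data
            i += best_len
        else:
            tokens.append(data[i])  # nothing worth referencing: emit a literal
            i += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> bytes:
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            # Copy byte by byte so overlapping matches (offset < length) work.
            for _ in range(length):
                out.append(out[-offset])
        else:
            out.append(tok)
    return bytes(out)


data = b"<title>News</title><title>Talk</title>"
tokens = lz77_encode(data)
```

The repeated tag text in `data` is emitted as `(offset, length)` pairs rather than literals. The same mechanism lets a template's tags be referenced later in the description.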

FIG. 1 is a schematic structural block diagram of a prior-art apparatus for sequentially compressing/decompressing data. As shown in FIG. 1, the apparatus comprises a sequential encoder 120, a memory 130 and a sequential decoder 140. The sequential encoder 120 sequentially encodes and compresses the N input data segments as a whole. The memory 130 stores the compressed data. The sequential decoder 140 sequentially decodes the compressed data and separates the N data segments from one another using data properties or an index table. This compression method can achieve maximum compression efficiency and can exploit the redundancy between data segments. Data redundancy after compression can be greatly reduced, especially when the N data segments have the same or a similar data structure.

However, the apparatus shown in FIG. 1 cannot randomly access the individual compressed data segments. In the compression process of FIG. 1, compressing data segment K (where K is a number from 1 to N) depends on all previously compressed data segments, i.e., data segments 1 to K-1. During decompression, therefore, all data segments preceding segment K (data segments 1 to K-1) must be decompressed first, and only then can the compressed data segment K be decompressed. In the extreme case, the last data segment N cannot be obtained from the compressed data until all compressed data segments before it have been decompressed in sequence and discarded.
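The dependency can be observed with Python's standard `zlib` module. This sketch models the FIG. 1 arrangement: all segments go through one zlib stream, and a list of uncompressed segment lengths plays the role of the index table. The function names and sample segments are hypothetical, and `k` is 0-based here, whereas the text numbers segments 1 to N.

```python
import zlib
from typing import List, Tuple


def compress_jointly(segments: List[bytes]) -> Tuple[bytes, List[int]]:
    """FIG. 1 style: one zlib stream for all segments, plus an index table
    holding the uncompressed length of every segment."""
    return zlib.compress(b"".join(segments)), [len(s) for s in segments]


def get_segment(stream: bytes, lengths: List[int], k: int) -> Tuple[bytes, int]:
    """Return segment k (0-based) and how many bytes had to be decompressed.

    Output can only be produced from the start of the stream, so every
    segment before k is decompressed and then thrown away."""
    start = sum(lengths[:k])
    end = start + lengths[k]
    d = zlib.decompressobj()
    produced = d.decompress(stream, end)  # stop once segment k is complete
    return produced[start:end], len(produced)


segments = [b"<program><title>Show %d</title></program>" % i for i in range(20)]
stream, index = compress_jointly(segments)
last, work = get_segment(stream, index, 19)
```

To reach the last segment, `work` equals the total size of all 20 segments. This is the "extreme case" described above.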

FIG. 2 is a schematic structural block diagram of another prior-art apparatus for sequentially compressing/decompressing data. It differs from the apparatus above in that the data segments are compressed and decompressed independently of one another. As shown in FIG. 2, the apparatus comprises three components:

- a sequential encoder 120, which sequentially compresses, on its own, a data segment K that may be any one of data segments 1 to N;
- a memory 130, which stores the compressed data segment K;
- a sequential decoder 140, which decompresses the compressed data segment K and restores data segment K.

In this process the N data segments are compressed and decompressed separately. Each segment is handled independently, and the sequential compression/decompression of a later segment does not depend on previously compressed segments. To obtain data segment K, it suffices to locate the compressed data segment K directly in memory 130 and decompress it sequentially, without first decompressing any earlier segment. Because the apparatus compresses each data segment independently, the segments can be accessed randomly.
However, since each data segment is compressed independently, the redundancy between segments cannot be exploited, which results in a low overall compression ratio.
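For comparison, here is the FIG. 2 arrangement in the same hypothetical `zlib` setting. Each segment is its own stream, so any one can be decompressed directly, but the tags shared by all segments are compressed again each time. The segment contents are invented for illustration.

```python
import zlib
from typing import List


def compress_independently(segments: List[bytes]) -> List[bytes]:
    """FIG. 2 style: every segment is a separate zlib stream."""
    return [zlib.compress(s) for s in segments]


segments = [
    b"<program><title>Show %d</title><day>Friday</day><time>21:00</time></program>" % i
    for i in range(20)
]
independent = compress_independently(segments)
joint = zlib.compress(b"".join(segments))

# Random access: segment 7 is restored without touching any other segment...
restored_7 = zlib.decompress(independent[7])
# ...but the redundancy across segments is lost, so the total is larger.
print("independent:", sum(len(c) for c in independent), "joint:", len(joint))
```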

Accordingly, there is a need for a new method and apparatus for sequentially compressing/decompressing data so that the data is compressed efficiently and remains randomly accessible.

An object of the present invention is to overcome the shortcomings of current sequential compression/decompression schemes by providing a new method and apparatus for sequentially compressing/decompressing data. With this method and apparatus, data can be compressed sequentially and efficiently and can also be accessed randomly.

The present invention provides a method for sequentially compressing a data segment having a given data structure. First, a plurality of compression parameters is obtained. Second, the data segment is sequentially compressed according to the obtained parameters, yielding a compressed data segment. The data segment may be obtained by preprocessing data. The compression parameters may be obtained from a storage device, or alternatively by compressing a template having the same data structure. Compressing that template yields a compressed template in addition to the compression parameters. The compressed template is either stored separately from the compressed data segment or discarded.

The present invention also provides a method for sequentially decompressing a compressed data segment having a given data structure. First, a plurality of decompression parameters is obtained. Second, the compressed data segment is sequentially decompressed according to the obtained decompression parameters, yielding a decompressed data segment. The decompression parameters may be obtained from a storage device, or alternatively by sequentially decompressing a compressed template having the same data structure. Sequentially decompressing the compressed template yields a decompressed template in addition to the decompression parameters. The decompressed template is discarded.

The present invention also provides an apparatus for sequentially compressing a data segment having a given data structure. The apparatus comprises an acquisition device and a compression device, and may optionally further comprise a preprocessing device, a storage device and a discarding device. The acquisition device obtains a plurality of compression parameters. The compression device sequentially compresses the data segment according to the obtained compression parameters, yielding a compressed data segment. The compression device is also used to sequentially compress a template having the same data structure, yielding the compression parameters and a compressed template. The preprocessing device preprocesses data to produce the data segment. The storage device stores the compression parameters, and the discarding device discards the compressed template.

The present invention further provides an apparatus for sequentially decompressing a compressed data segment. The apparatus comprises an acquisition device and a decompression device, and may optionally further comprise a storage device and a discarding device. The acquisition device obtains a plurality of decompression parameters. The decompression device sequentially decompresses the compressed data segment according to the obtained decompression parameters, yielding a decompressed data segment having the given data structure. The decompression device is also used to decompress a compressed template, yielding the decompression parameters and a decompressed template having that data structure. The storage device stores the decompression parameters, and the discarding device discards the decompressed template.

The method and apparatus provided by the present invention improve the data compression ratio because the data-structure part (the template) can be filtered out after a data segment having the given data structure is compressed. When several data segments with that data structure are compressed individually, the data-structure part of each segment is filtered out, and only one compressed template needs to be kept (or all compressed templates can be discarded). The compression ratio is thereby greatly improved, while each data segment can still be processed randomly.

Other objects and advantages of the present invention will become apparent, and a fuller understanding of the invention will be obtained, by reference to the following description taken in conjunction with the drawings and the claims.

FIG. 3 is a schematic structural block diagram of an apparatus for sequentially compressing a data segment having a given data structure according to an embodiment of the present invention. The apparatus 300 comprises an acquisition device 310 and a compression device 320, and may further comprise a preprocessing device 330, a storage device 340 and a discarding device 350.

The acquisition device 310 obtains a plurality of compression parameters. These parameters represent the encoded compression information of the template having the data structure, including information such as the template's data structure, storage location and compression mode. The compression parameters correspond to the internal state of the compression device 320 after it has compressed that template.

The compression device 320 sequentially compresses the data segment according to the obtained compression parameters, yielding a compressed data segment. It is also used to sequentially compress the template having the data structure, thereby obtaining the compression parameters. The data to be compressed is input from the preprocessing device 330, but it may alternatively come from other data sources (not shown).

The preprocessing device 330 preprocesses input data to generate a data segment conforming to the data structure. The data segment is dynamic data, varying with the input. One suitable example of the preprocessing device is a unit that publishes database records as XML (eXtensible Markup Language). If the input data is blank, i.e. has no content, the preprocessing device 330 generates a template having the data structure. The template describes the data structure of the data segments but contains no dynamic content data.
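Here is a minimal sketch of such a publishing step. It assumes a TV-program record with the hypothetical fields `title`, `day`, `time` and `description`, since the patent's own listing is not reproduced in this copy. Passing an empty record produces the template, which has the same tags and no content.

```python
from typing import Dict
from xml.sax.saxutils import escape

# Hypothetical field layout of a TV-program record.
FIELDS = ("title", "day", "time", "description")


def publish_record(record: Dict[str, str]) -> bytes:
    """Publish one database record as an XML data segment.
    Missing fields become empty elements, so {} yields the bare template."""
    parts = ["<program>"]
    for field in FIELDS:
        value = escape(record.get(field, ""))  # escape &, < and >
        parts.append("<%s>%s</%s>" % (field, value, field))
    parts.append("</program>")
    return "".join(parts).encode("utf-8")


template = publish_record({})
segment_1 = publish_record({"title": "New Talk Show", "day": "Friday",
                            "time": "21:00", "description": "Talk & music"})
```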

The preprocessing device 330 can output data segments of any category, depending on the circumstances. For example, it can generate different data segments for different types of data input from a database. In that case it can generate a template with the corresponding data structure for each type of data segment, and use that template to compress the corresponding data segments. Alternatively, a single generic template can be built for use by all types of data segments during compression, but this scheme yields only a sub-optimal compression ratio.

The storage device 340 stores the compressed data segments and the compression parameters, and also stores the compressed template of the data structure. The storage device 340 may be a hard disk, a USB (Universal Serial Bus) disk, a cache, or the like. Note, however, that the compressed data segments and the compressed template should be stored separately, for example under different file names. The compressed template stored in the storage device 340 is used when the data segments are decompressed.

The discarding device 350 discards the compressed template output by the compression device 320. Discarding the compressed template prevents it from occupying excessive space in the storage device.

The operation of the apparatus 300 is described in detail below with reference to FIG. 4.

FIG. 4 is a flowchart of sequentially compressing a data segment having a given data structure according to an embodiment of the present invention. First, a template having the data structure is obtained (step S410). The template can be obtained by preprocessing data that has no valid content. During preprocessing, a tag such as an end-of-file tag can be appended to the end of the template to mark where the template ends.

The data structure of the television program data is as follows:

The above data structure is the content of the template.

Next, the template is compressed sequentially (step S420). Specifically, the encoding state is first initialized, and the template is then compressed sequentially using, for example, the zlib sequential compression algorithm. This yields the compressed template and a plurality of compression parameters. The compression parameters represent the encoded compression information of the template and its storage location in the compression device, including information such as the compression mode, data structure and Huffman table. Note that zlib is not the only option; implementers can choose a different compression algorithm according to their needs.
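Steps S410 and S420 can be sketched with Python's `zlib` module. In this sketch the live compressor object stands in for the "compression parameters". After the template has passed through it, its internal state (history window and stream bookkeeping) is what later segments can reference. `Z_SYNC_FLUSH` emits all template output on a byte boundary without ending the stream. The template text and function name are assumptions for illustration.

```python
import zlib
from typing import Any, Tuple

# Hypothetical template; the patent's own TV-program listing is not reproduced here.
TEMPLATE = (b"<program><title></title><day></day>"
            b"<time></time><description></description></program>")


def compress_template(template: bytes, level: int = 9) -> Tuple[bytes, Any]:
    """Step S420: compress the template starting from a freshly initialized
    coding state. Returns (compressed_template, compressor_state)."""
    comp = zlib.compressobj(level)
    compressed = comp.compress(template)
    # Z_SYNC_FLUSH emits all pending template output on a byte boundary but
    # leaves the stream open, so later segments can continue from this state.
    compressed += comp.flush(zlib.Z_SYNC_FLUSH)
    return compressed, comp


compressed_template, params = compress_template(TEMPLATE)
```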

The compressed template is then discarded (step S430). Discarding it does not affect the overall compression ratio, and the system can do so to free storage space. Because a tag such as an end-of-file tag marks the end of the template, the system can discard the compressed template automatically after reading that tag. Alternatively, the compressed template can be kept in a dedicated location in local storage for later decompression. In particular, when several data segments with the same data structure are compressed one after another using the method of the invention, one compressed template can be kept and the others discarded.

Steps S410 to S430 sequentially compress a template having the data structure in order to obtain the compression parameters. The compression parameters can be stored, for example in local storage, and used to compress other data segments with the same or a similar data structure. If the compression parameters are already in local storage, steps S410 to S430 can be skipped when compressing other data segments with the same data structure. The compression parameters are then read directly from storage without compressing the template. Before the compression parameters are read from storage, the encoding state should first be initialized. This ensures that the current process is not affected by the compression parameters left over from the end of the previous process.
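A live `zlib` compressor object cannot be written to disk. A practical way to store reusable parameters is therefore a different but closely related zlib feature, the preset dictionary (`zdict`). In this sketch the stored "parameters" are simply the template bytes, and creating a new compressor for every segment serves as the initialization step. This substitutes zlib's dictionary mechanism for the patent's saved encoder state, and the helper names are hypothetical.

```python
import zlib

# The bytes that would be kept in local storage as the reusable parameters.
TEMPLATE = (b"<program><title></title><day></day>"
            b"<time></time><description></description></program>")


def compress_segment(segment: bytes, zdict: bytes) -> bytes:
    # A brand-new compressor per segment is the "initialized coding state":
    # nothing left over from the previous segment can leak into this one.
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(segment) + comp.flush()


def decompress_segment(blob: bytes, zdict: bytes) -> bytes:
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


segment = (b"<program><title>New Talk Show</title><day>Friday</day>"
           b"<time>21:00</time><description>Weekly talk show</description></program>")
blob = compress_segment(segment, TEMPLATE)
```

The dictionary variant also makes the decoder independent of the compressed template's bytes. Only the template text itself must be available on both sides.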

Next, a data segment having the data structure is obtained (step S440). The data segment is obtained by preprocessing data, for example from a unit that publishes database records in XML (eXtensible Markup Language) format. Alternatively, it can be input from other data sources.

The relationship between the template and the data segments can be explained in more detail with the following example. After format preprocessing, two data segments of television program data are obtained. Data segment 1 is:

Data segment 2 is:

Both data segments share the data structure of the template above. They differ only in the valid content each contains. Data segment 1 describes a new TV talk show broadcast every Friday, such as "New Talk Show". Data segment 2 describes a nightly news report, such as "News Report".

Next, the data segment is compressed sequentially according to the obtained compression parameters (step S450). The compression parameters contain the compression information of the template with the same data structure, including storage addresses and lengths. Sequential compression according to these parameters therefore automatically searches for character strings that have already appeared, and outputs the position and length at which the matching template string occurs. By this principle of sequential coding, each part that the data segment shares with the template is automatically replaced by a (storage address, length) pair pointing into the compressed template. The data segment after replacement is therefore much smaller than the original. Because the data segment is compressed using the parameters produced while compressing the template, and the compressed template may be discarded, the compression ratio of the data segment is greatly improved.
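The effect of step S450 can be reproduced with `zlib`. The sketch continues from a copy of the compressor state left by the template (`Compress.copy()`) and keeps only the bytes produced for the segment. Those bytes are what step S460 would store, and the template's own output is discarded as in step S430. The template, segment and helper names are assumptions.

```python
import zlib
from typing import Any, Tuple

TEMPLATE = (b"<program><title></title><day></day>"
            b"<time></time><description></description></program>")
SEGMENT_1 = (b"<program><title>New Talk Show</title><day>Friday</day>"
             b"<time>21:00</time><description>Weekly talk show</description></program>")


def prime(template: bytes) -> Tuple[bytes, Any]:
    """Steps S410-S420: run the template through a fresh compressor."""
    comp = zlib.compressobj(9)
    head = comp.compress(template) + comp.flush(zlib.Z_SYNC_FLUSH)
    return head, comp


def compress_segment(primed: Any, segment: bytes) -> bytes:
    """Step S450: continue from a copy of the post-template state, so tag
    strings already seen in the template become (distance, length) references."""
    comp = primed.copy()
    return comp.compress(segment) + comp.flush()


compressed_template, primed = prime(TEMPLATE)
body = compress_segment(primed, SEGMENT_1)  # what step S460 would store
standalone = zlib.compress(SEGMENT_1, 9)    # FIG. 2 style, for comparison
print(len(body), len(standalone))
```

To restore the segment, a decoder must first consume the compressed template, then `body`. If the compressed template was discarded, it can be regenerated by recompressing the template, assuming the same zlib version and settings reproduce identical output. That assumption is worth verifying in a real deployment.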

Finally, the compressed data segment is stored (step S460). The compressed data segment and the compressed template are stored in different locations, for example under different file names, so that the template does not affect the compression ratio.

The above embodiment completes the compression of one data segment. Often, however, several data segments sharing the same template must be compressed sequentially. An example is 20 television program records (i.e., 20 data segments) that have the same or a similar data structure after preprocessing. After one data segment has been compressed through step S460, the next data segment can be compressed by the same procedure. Note that the encoding state should be initialized at the start of each pass, for example by clearing all compression parameters in the compression device. At the end of the current pass, the parameters already include information derived from the valid content of the data segment just compressed. Initialization ensures that this information does not affect the next pass.
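In a `zlib`-based sketch, "initializing the coding state" for each of, say, 20 segments means taking a fresh `copy()` of the post-template compressor. The compressor that just finished the previous segment is not reused. Each stored segment then depends only on the template, which is what keeps random access possible. The segment contents are invented for illustration.

```python
import zlib

TEMPLATE = b"<program><title></title><day></day><time></time></program>"
segments = [
    b"<program><title>Show %d</title><day>Friday</day><time>2%d:00</time></program>"
    % (i, i % 4)
    for i in range(20)
]

primed = zlib.compressobj(9)
compressed_template = primed.compress(TEMPLATE) + primed.flush(zlib.Z_SYNC_FLUSH)

compressed = []
for seg in segments:
    comp = primed.copy()  # re-initialize: start every segment from the template state
    compressed.append(comp.compress(seg) + comp.flush())


def restore(k: int) -> bytes:
    """Decode segment k using only the compressed template and that one segment."""
    d = zlib.decompressobj()
    d.decompress(compressed_template)  # template output is produced, then dropped
    return d.decompress(compressed[k]) + d.flush()
```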

FIG. 5 is a schematic structural block diagram of an apparatus for sequentially decompressing a compressed data segment according to an embodiment of the present invention. The compressed data segment is obtained from the apparatus 300 and has the data structure described above. The apparatus 500 comprises an acquisition device 510 and a decompression device 520, and may further comprise a storage device 540 and a discarding device 550.

The acquisition device 510 obtains a plurality of decompression parameters. These parameters represent the decompression information of the compressed template, including information such as the template's data structure, storage location and decompression mode. The decompression parameters may be obtained from the storage device 540. They correspond to the internal state of the decompression device 520 after it has decompressed the compressed template.

The decompression device 520 sequentially decompresses the compressed data segment according to the obtained decompression parameters, yielding the decompressed data segment. If the decompression parameters cannot be obtained directly through the acquisition device 510, the decompression device 520 can instead sequentially decompress the compressed template, which has the data structure, to obtain the decompressed template and the decompression parameters.

The storage device 540 stores the decompressed data segments and the decompression parameters.

The discarding device 550 discards the decompressed template.

The operation of the apparatus 500 is described below with reference to FIG. 6.

FIG. 6 is a flowchart of sequentially decompressing a compressed data segment according to an embodiment of the present invention. The compressed data segment is produced by the process of FIG. 4 and has the data structure described above.

First, a compressed template having the data structure is obtained (step S610). The compressed template can be obtained from a storage device, over a network or from another device. It can also be obtained by sequentially compressing a template having the data structure.

The compressed template is then decompressed sequentially (step S620). More specifically, after the decoding state is initialized, the compressed template is decompressed sequentially, yielding the decompressed template and a plurality of decompression parameters. These parameters can be used to decompress subsequent data segments. They comprise the decompression information of the compressed template, including information such as its decompression mode, addresses and data structure. The sequential decompression can use, for example, the zlib sequential decompression algorithm. To save storage space, the decompressed template can be discarded (step S630); the template's end-of-file tag makes this possible.
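The same idea applies on the decoding side. In this `zlib` sketch, the decompressor object that has consumed the compressed template is the set of "decompression parameters", and the template text it emits is dropped (step S630). The compressed template is obtained by compressing a hypothetical template, which step S610 permits. The names are illustrative.

```python
import zlib
from typing import Any

TEMPLATE = (b"<program><title></title><day></day>"
            b"<time></time><description></description></program>")

# Step S610: obtain the compressed template (here, by compressing the template).
encoder = zlib.compressobj(9)
COMPRESSED_TEMPLATE = encoder.compress(TEMPLATE) + encoder.flush(zlib.Z_SYNC_FLUSH)


def prime_decoder(compressed_template: bytes) -> Any:
    """Steps S620-S630: decode the template from a freshly initialized state,
    drop the decoded text, and return the decoder whose state is the parameters."""
    d = zlib.decompressobj()
    d.decompress(compressed_template)  # S630: decompressed template is not kept
    return d


decoder_params = prime_decoder(COMPRESSED_TEMPLATE)
```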

Steps S610 to S630 sequentially decompress the compressed template in order to obtain the decompression parameters. These parameters can be stored, for example in local storage, and used to sequentially decompress other compressed data segments with the same or a similar data structure. If the decompression parameters are already in local storage, steps S610 to S630 can be skipped when decompressing other compressed data segments. The decompression parameters are then read directly from storage without decompressing the compressed template. Before the decompression parameters are read from storage, the decoding state should first be initialized. This ensures that the current process is not affected by the decompression parameters left over from the end of the previous process.

Next, a compressed data segment is obtained (step S640). It can be read from a storage device or received over a network or from another device.

Thereafter, the compressed data segment is decompressed sequentially (step S650), again using the Zlib sequential decompression principle. According to the storage address of the compressed data segment and the obtained decompression parameters, the corresponding replacement information is located and substituted into the data segment being decompressed, which yields the complete decompressed data segment. The decompressed data segment has the shared data structure and contains its own specific valid content.
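Step S650 can be sketched as decoding one compressed segment from a decoder that has already been primed with the template. The interpretation is ours, not the source's: the "replacement information" is read as the LZ77 back-references that point into the template's history window, which zlib resolves internally. `decompress_segment` is a hypothetical name, and the primed decoder is assumed to come from decompressing a sync-flushed compressed template.

```python
import zlib


def decompress_segment(primed_decoder, compressed_segment: bytes) -> bytes:
    """Sketch of S650: decode one segment, starting from the template state."""
    decoder = primed_decoder.copy()   # leave the shared primed state untouched
    segment = decoder.decompress(compressed_segment)
    if not decoder.eof:               # each segment is expected to be a terminated stream
        raise zlib.error("compressed segment is truncated")
    return segment
```

Because the primed decoder is copied rather than advanced, the same call can be repeated for any number of segments in any order.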

Finally, the decompressed data segment is stored (step S660).

This completes the decompression of one compressed data segment. In many cases, a number of compressed data segments with the same or a similar data structure must be decompressed sequentially. Once one compressed data segment has been decompressed through step S660, the next compressed data segment can be decompressed sequentially by repeating the same process. Note that the decoding state should be initialized at the start of each pass, for example by clearing all decompression parameters held by the decompression apparatus. At the end of a pass, the decompressor's parameters also reflect the valid content of the segment just processed, and they must not affect the next pass.
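Decompressing a batch of segments that share one template could look like the sketch below. It assumes, beyond the source, that the decompression parameters are the state of a zlib decompressor after consuming the sync-flushed compressed template. `decompress_all` is a hypothetical name. The template is decoded once, and every segment pass starts from a fresh copy of that state, which plays the role of the per-pass initialization. Reusing the decoder that finished the previous segment would not work, because it has already reached the end of its stream.

```python
import zlib


def decompress_all(compressed_template: bytes, compressed_segments: list) -> list:
    """Decode every segment from the same template state, one pass per segment."""
    primed = zlib.decompressobj()
    primed.decompress(compressed_template)    # template pass, done once
    results = []
    for blob in compressed_segments:
        decoder = primed.copy()               # initialized decoding state for this pass
        results.append(decoder.decompress(blob))
        if not decoder.eof:
            raise zlib.error("segment stream did not terminate")
    return results
```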

In the embodiment above, the data is first preprocessed into data segments that share the same data structure, and a template with that structure is obtained for compressing and decompressing the segments. By compressing the template, the method obtains the compression parameters needed to compress any data segment with the same data structure. It then compresses each data segment sequentially according to those parameters and discards the compressed template, yielding compressed data segments with a high compression ratio. Because each data segment is compressed separately, each one can be accessed randomly. Correspondingly, by decompressing the compressed template, the method obtains the decompression parameters needed for compressed data segments that have the template's data structure, and it decompresses each compressed data segment according to those parameters. Because each data segment was compressed separately, each compressed data segment can also be decompressed on its own.
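The summary can be tied together in an end-to-end sketch that also covers the compression side. It assumes, rather than takes from the source, that the compression parameters are the state of a zlib compressor after it has compressed the template with a sync flush. `compress_segments` and `decompress_one` are hypothetical names. Each segment is compressed from its own copy of that state and terminated as a separate stream, which is what makes random access to a single segment possible. Unlike the text, this sketch returns the compressed template instead of discarding it, because the decompressing side needs it (or must regenerate it by compressing the template again, as in claim 11).

```python
import zlib


def compress_segments(template: bytes, segments: list):
    """Compress each segment from the state reached after compressing the template.

    Returns (compressed_template, compressed_segments). Every compressed segment
    is its own terminated stream, so any one of them can be decoded independently.
    """
    encoder = zlib.compressobj(level=9)
    compressed_template = encoder.compress(template) + encoder.flush(zlib.Z_SYNC_FLUSH)
    compressed = []
    for seg in segments:
        enc = encoder.copy()                  # every segment starts from the template state
        compressed.append(enc.compress(seg) + enc.flush())
    return compressed_template, compressed


def decompress_one(compressed_template: bytes, compressed_segments: list, index: int) -> bytes:
    """Random access: decode only segment `index`; the others are never touched."""
    decoder = zlib.decompressobj()
    decoder.decompress(compressed_template)   # rebuild the template state
    return decoder.decompress(compressed_segments[index])
```

Fetching segment `i` costs one template decode plus one segment decode, independent of how many segments precede it.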

The following experimental results further illustrate the improvement in compression ratio over the prior art. The experiment takes as its baseline the compression ratio achieved on the data segments with the technique described in FIG. 2. When the same data segments are compressed sequentially using the present invention, the compression ratio improves by 38.4%. Both methods use Zlib as the underlying sequential compression/decompression scheme. The two results are listed in the following table.
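One way to set up a comparison of this kind is sketched below. The FIG. 2 technique is not specified in this section, so the baseline here is an assumption: each segment compressed independently with `zlib.compress`. The templated size assumes that the compression parameters are the state of a zlib compressor after it has compressed the template with a sync flush. `total_sizes` is a hypothetical name. The relative saving `(independent - templated) / independent` is only one possible definition of the improvement and is not claimed to reproduce the 38.4% figure.

```python
import zlib


def total_sizes(template: bytes, segments: list):
    """Return (independent, templated) total compressed sizes in bytes.

    independent: each segment compressed on its own with zlib.compress (assumed baseline).
    templated:   each segment compressed from the state reached after the template.
    """
    independent = sum(len(zlib.compress(seg, 9)) for seg in segments)
    encoder = zlib.compressobj(9)
    encoder.compress(template)          # template output is not counted
    encoder.flush(zlib.Z_SYNC_FLUSH)
    templated = 0
    for seg in segments:
        enc = encoder.copy()
        templated += len(enc.compress(seg) + enc.flush())
    return independent, templated
```

For short, similar records the templated total is typically smaller, because most tags become back-references into the template and each segment also omits the two-byte zlib header.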

Although the present invention has been described in terms of a preferred embodiment, it will be apparent to those skilled in the art that alternatives, improvements, and modifications may be applied to the processes described herein. Accordingly, all such alternatives, improvements, and modifications are considered to fall within the scope and concept of the invention as defined by the following claims.

FIG. 1 is a schematic structural block diagram of a prior-art apparatus for sequentially compressing/decompressing data.
FIG. 2 is a schematic structural block diagram of another prior-art apparatus for sequentially compressing/decompressing data.
FIG. 3 is a schematic structural block diagram of an apparatus for sequentially compressing data segments according to an embodiment of the present invention.
FIG. 4 is a flowchart of sequentially compressing data segments according to an embodiment of the present invention.
FIG. 5 is a schematic structural block diagram of an apparatus for sequentially decompressing compressed data segments according to an embodiment of the present invention.
FIG. 6 is a flowchart of sequentially decompressing compressed data segments according to an embodiment of the present invention.

Claims

1. A method for sequentially compressing a data segment having a data structure, the method comprising:
a. obtaining a plurality of compression parameters; and
b. sequentially compressing the data segment according to the obtained compression parameters, thereby obtaining a compressed data segment.

2. The method of claim 1, wherein step a comprises sequentially compressing a template having the data structure, thereby obtaining the compression parameters.

3. The method of claim 2, wherein step a further comprises obtaining a compressed template.

4. The method of claim 3, further comprising discarding the compressed template.

5. The method of claim 3, further comprising storing the compressed template and the compressed data segment, respectively.

6. The method of claim 1, wherein the compression parameters in step a are obtained from a storage device.

7. The method of claim 1, further comprising preprocessing data, thereby obtaining the data segment.

8. The method of claim 1, further comprising storing the compressed data segment.

9. A method for sequentially decompressing a compressed data segment having a data structure, the method comprising:
a. obtaining a plurality of decompression parameters; and
b. sequentially decompressing the compressed data segment according to the obtained decompression parameters, thereby obtaining a decompressed data segment.

10. The method of claim 9, wherein step a comprises sequentially decompressing a compressed template having the data structure, thereby obtaining the decompression parameters.

11. The method of claim 10, further comprising sequentially compressing a template having the data structure, thereby obtaining the compressed template.

12. The method of claim 11, wherein the compressed template is obtained from a storage device.

13. The method of claim 10, wherein step a further comprises obtaining a decompressed template.

14. The method of claim 13, further comprising discarding the decompressed template.

15. The method of claim 10, further comprising storing the decompressed data segment.

16. The method of claim 9, wherein the decompression parameters are obtained from a storage device.

17. An apparatus for sequentially compressing a data segment having a data structure, the apparatus comprising:
obtaining means for obtaining a plurality of compression parameters; and
compression means for sequentially compressing the data segment according to the obtained compression parameters, thereby obtaining a compressed data segment.

18. The apparatus of claim 17, further comprising preprocessing means for preprocessing data, thereby obtaining the data segment.

19. The apparatus of claim 17, further comprising storage means for storing the compression parameters.

20. The apparatus of claim 17, wherein the compression means is further operable to sequentially compress a template having the data structure, thereby obtaining the compression parameters and a compressed template.

21. The apparatus of claim 20, further comprising discarding means for discarding the compressed template.

22. An apparatus for sequentially decompressing a compressed data segment having a data structure, the apparatus comprising:
obtaining means for obtaining a plurality of decompression parameters; and
decompression means for sequentially decompressing the compressed data segment according to the obtained decompression parameters, thereby obtaining a decompressed data segment.

23. The apparatus of claim 22, further comprising storage means for storing the decompression parameters.

24. The apparatus of claim 22, wherein the decompression means is further operable to sequentially decompress a compressed template, thereby obtaining the decompression parameters and a decompressed template having the data structure.

25. The apparatus of claim 24, further comprising discarding means for discarding the decompressed template.