JP2827982B2

JP2827982B2 - Data compression control method

Info

Publication number: JP2827982B2
Application number: JP22347995A
Authority: JP
Inventors: Toshihiko Okamura
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1995-08-31
Filing date: 1995-08-31
Publication date: 1998-11-25
Anticipated expiration: 2015-08-31
Also published as: JPH0969784A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention: The present invention relates to a method for controlling data compression processing.

[0002]

Description of the Related Art: Data compression is an effective way to store and transmit data efficiently. It works by exploiting the statistical properties of the data, and the data structure that holds these properties is called a “model”. In an adaptive data compression method, the model is updated dynamically as compression proceeds. As processing continues, compression is performed with a model that captures the properties of the data ever more closely, so an adaptive method achieves a better compression ratio the further compression progresses. Universal compression methods, which are effective for many kinds of data, are almost always adaptive, because they must capture the properties of the data dynamically. Known lossless (distortion-free) universal compression methods, which can restore the original data exactly, include the Lempel-Ziv method and methods that combine a context tree with arithmetic coding. These are also adaptive data compression methods that update their models dynamically.

[0003] In the Lempel-Ziv method, the model takes the form of a dictionary or a buffer. The dictionary-based method is called the LZ78 type, and the buffer-based method is called the LZ77 type.

[0004] In the LZ78 type, the dictionary is built by cutting character strings out of the input data according to a fixed rule and assigning each one an index. When a string in the dictionary reappears, it is replaced by its dictionary index, and this replacement produces the compression.

[0005] In the LZ77 type, the buffer stores the most recent input that has already been compressed. When a string in the buffer reappears, it is replaced by an index made of two numbers: the position where the string starts (how many characters before the current encoding position) and its length. This replacement produces the compression. In principle, the LZ77 type needs nothing more than this buffer. To speed up compression, however, a data structure that accelerates string searches in the buffer is required, and this data structure is sometimes also called a dictionary.

[0006] The Lempel-Ziv method is described in detail in the book Text Compression (Prentice Hall, 1990, pp. 206-243), published in the United States. The dictionary and the buffer basically start out empty when compression begins. As compression proceeds, new strings are registered in the dictionary, which gradually comes to capture the properties of the data well.

[0007] In the context-tree method, the model consists of a context tree and the frequency of each character under each context. The actual encoding is done by arithmetic-coding each character according to these frequencies. A context tree is a set of contexts, and whenever a new context appears it is added to the tree as needed. Each time a context appears in the input data, the character frequencies under that context are updated according to the character that follows it.

[0008] The context-tree method is described in detail in the same book, Text Compression (pp. 140-166). The context tree basically starts out empty. As compression proceeds, new contexts are added to it and the character frequencies under each context are updated, so the tree gradually comes to capture the properties of the data well.

[0009] In the adaptive methods described above, a good model is built up as compression proceeds. Compression therefore becomes more effective after processing has advanced for a while than it is at the start.

[0010] Data compressed by an adaptive method can be restored correctly only if the decompressor uses the same model as the compressor. The decompressor must therefore set the model to the same initial state used for compression and update it exactly as the compressor did.
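The code below is a minimal sketch of this requirement. It uses move-to-front coding as a stand-in adaptive model; move-to-front is not mentioned in the source, and the function names are hypothetical. The encoder and the decoder start from the same symbol list and apply the same update after every symbol, so decoding succeeds only when both sides begin with the same initial model.

```python
def mtf_encode(text: str, alphabet: str) -> list[int]:
    """Encode each symbol as its position in an adaptive list (the 'model')."""
    model = list(alphabet)                  # initial model state
    codes = []
    for ch in text:
        idx = model.index(ch)
        codes.append(idx)
        model.insert(0, model.pop(idx))     # update: move the symbol to the front
    return codes


def mtf_decode(codes: list[int], alphabet: str) -> str:
    """Decode by replaying exactly the same model updates as the encoder."""
    model = list(alphabet)                  # must equal the encoder's initial state
    out = []
    for idx in codes:
        ch = model[idx]
        out.append(ch)
        model.insert(0, model.pop(idx))     # identical update keeps both sides in sync
    return "".join(out)


codes = mtf_encode("abca", "abcd")
print(codes)                        # [0, 1, 2, 2]
print(mtf_decode(codes, "abcd"))    # abca  (same initial model)
print(mtf_decode(codes, "dcba"))    # dcbd  (different initial model)
```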

[0011]

Problem to Be Solved by the Invention: Adaptive data compression achieves excellent compression ratios. However, if an error such as a bit inversion occurs in the compressed data, none of the data after that point can be restored. When compressed codewords have variable length, a bit inversion can shift codeword boundaries, and the data may then be damaged catastrophically. Even if only a single codeword is decoded incorrectly, the corresponding restored data differ from the original. Because an adaptive method then updates its model from that incorrectly restored data, the error may propagate without stopping. Adding error-correction coding suppresses errors to some extent, but correction becomes impossible when a large number of errors occur.

[0012] The conventional answer to this problem, shown in FIG. 2, is to divide the data into small blocks and compress each block independently. If one block cannot be restored, the remaining blocks can still be restored correctly. Each block is made independent by resetting the model to a predetermined initial state at the start of that block. The methods disclosed in JP 05-252047 A and JP 05-252048 A are essentially of this type. In a communication system, the receiving side then only needs to request retransmission of the blocks in which an error was detected, instead of restoring the entire data from the beginning.
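The conventional scheme of FIG. 2 can be sketched with Python's standard `zlib` module, assuming DEFLATE as the adaptive compressor. The source does not name a compressor, and the block size and function names are illustrative. Every call to `zlib.compress` starts from a fresh model. A damaged block therefore costs only that block, but no block benefits from what earlier blocks taught the model.

```python
import zlib
from typing import Optional


def compress_independent(data: bytes, block_size: int) -> list[bytes]:
    """Split data into blocks and compress each one from a fresh model."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def restore_independent(blocks: list[bytes]) -> list[Optional[bytes]]:
    """Decompress each block on its own; a damaged block becomes None."""
    restored = []
    for blk in blocks:
        try:
            restored.append(zlib.decompress(blk))
        except zlib.error:              # only this block is lost
            restored.append(None)
    return restored


data = b"the quick brown fox " * 200                  # 4000 bytes
blocks = compress_independent(data, block_size=1000)
damaged = bytearray(blocks[1])
damaged[-1] ^= 0xFF                                   # corrupt block 1's Adler-32 trailer
blocks[1] = bytes(damaged)
print([b is None for b in restore_independent(blocks)])
# [False, True, False, False]
```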

[0013] An adaptive method compresses while it builds its model, so it reaches a good compression ratio only on data large enough for the model to grow sufficiently. When small blocks are compressed independently, the model is reset to a predetermined initial state at the head of each block. A model suited to the data cannot be built, and a sufficient compression ratio cannot be obtained. Making the blocks larger avoids this, but it increases the amount of data lost when a block cannot be restored.

[0014]

Means for Solving the Problems: (1) The present invention is a data compression control method for an adaptive data compression method. Such a method uses a model, which is a data structure representing the statistical properties of the data, and compresses while updating the model dynamically. In the control method, the input data are divided into blocks and compressed block by block. A fixed number of adjacent blocks are grouped into a cluster, and different clusters are compressed independently. When the head block of a cluster is compressed, the initial state of the model is a predetermined setting. When any other block of the cluster is compressed, the initial state of the model is determined from the head block of the cluster to which that block belongs.

[0015] (2) The size of the blocks is constant.

[0016] (3) Alternatively, the block size is variable and is controlled so that every block has the same size after compression.

[0017] (4) The present invention also uses, as the model, a buffer that stores input data that have already been compressed. When a string in the buffer reappears in the input data, the pair of indices giving the string's start position and length is used as the codeword for that string. The buffer is updated so that it always holds the most recently compressed data. When the head block of a cluster is compressed, the initial state of the buffer is a predetermined setting. When any other block of the cluster is compressed, the initial state of the buffer is the state it had when compression of the head block of that block's cluster finished.

[0018] (5) The present invention also uses, as the model, a dictionary that associates indices with strings from input data that have already been compressed. When a string in the dictionary reappears, its dictionary index is used as the codeword for that string. The dictionary is updated by assigning an index to each new string that appears in the input data and registering it. When the head block of a cluster is compressed, the initial state of the dictionary is a predetermined setting. When any other block of the cluster is compressed, the initial state of the dictionary is the state it had when compression of the head block of that block's cluster finished.

[0019] (6) The present invention also uses, as the model, a structure that associates a set of contexts (strings that have appeared in the input data) with the frequency of each input character under each context. Each character of the input data is encoded according to its frequency under the current context. The set of contexts is updated by adding strings that newly appear in the input data, and the frequencies are updated each time a character appears under a context. When the head block of a cluster is compressed, the context set and the per-context character frequencies start from a predetermined setting. When any other block of the cluster is compressed, they start from the context set and per-context frequencies obtained when compression of the head block of that block's cluster finished.
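Claim (6) can be illustrated with a simplified context model. The sketch makes two assumptions: the context is just the preceding character rather than a growing context tree, and ideal code lengths (-log2 p bits per character) stand in for a real arithmetic coder. The class and variable names are hypothetical. It compares coding a non-head block from the predetermined initial state with coding it from a copy of the model left by the cluster's head block.

```python
import copy
import math
from collections import defaultdict


class Order1Model:
    """Hypothetical order-1 context model: counts[context][symbol].

    The context is only the previous character ("" at a block start), a
    simplification of the growing context set described in the source.
    """

    def __init__(self, alphabet: str):
        # Predetermined initial state: every symbol has count 1 under every context.
        self.counts = defaultdict(lambda: {s: 1 for s in alphabet})

    def code_and_update(self, text: str) -> float:
        """Return the ideal adaptive code length of `text` in bits.

        An arithmetic coder spends about -log2(p) bits per symbol; counts are
        updated after each symbol, exactly as a decoder could reproduce.
        """
        bits, context = 0.0, ""
        for ch in text:
            table = self.counts[context]
            bits -= math.log2(table[ch] / sum(table.values()))
            table[ch] += 1            # update the frequency under this context
            context = ch              # the next context is this character
        return bits


alphabet = "abcd"
head_block = "abcd" * 50
later_block = "abcd" * 10

# Non-head block coded from the predetermined initial state.
fresh_bits = Order1Model(alphabet).code_and_update(later_block)

# Non-head block coded from a copy of the model left by the head block.
model = Order1Model(alphabet)
model.code_and_update(head_block)
primed_bits = copy.deepcopy(model).code_and_update(later_block)

print(f"fresh: {fresh_bits:.1f} bits, primed: {primed_bits:.1f} bits")
```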

[0020]

Operation: With the present invention, each cluster (a set of a fixed number of blocks) is compressed independently. Even if one cluster cannot be restored, the remaining clusters can be restored correctly. Within a cluster, if the head block cannot be restored, none of the other blocks of that cluster can be restored either. If a block other than the head fails, however, the remaining blocks can still be restored correctly. Each non-head block is compressed only after the model has been built up to some extent from the head block. Its compression ratio is therefore better than its block size alone would allow.
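The recovery rule described above fits in a small helper, shown here for illustration only (the name is hypothetical). A block is restorable only if both it and its cluster's head block are undamaged. Blocks are indexed from 0 across the whole stream, so the head of block i's cluster is at i - (i mod K).

```python
def recoverable_blocks(num_blocks: int, k: int, damaged: set[int]) -> list[int]:
    """Indices of blocks that can still be restored when clusters hold k blocks.

    Every non-head block starts from the model produced by its cluster's
    head block, so it needs that head block to be intact as well.
    """
    ok = []
    for i in range(num_blocks):
        head = i - i % k                  # index of this cluster's head block
        if i not in damaged and head not in damaged:
            ok.append(i)
    return ok


print(recoverable_blocks(8, 4, {1}))   # [0, 2, 3, 4, 5, 6, 7]
print(recoverable_blocks(8, 4, {4}))   # [0, 1, 2, 3]
```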

[0021]

Detailed Description of the Embodiments: The present invention applies to adaptive data compression methods. These use a model, which is a data structure representing the statistical properties of the data, and compress while updating the model dynamically. The model starts from some initial state and is updated as compression proceeds. It can be realized, for example, as a dictionary or a character-frequency table.

[0022] A key feature of the present invention is that compression is basically performed block by block, but a fixed number of adjacent blocks are grouped into a cluster. Let K be the number of blocks in a cluster. The blocks in each cluster are numbered 0 through K-1 in order, and this number is called the block number.

[0023] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a flowchart of the processing of the present invention.

[0024] First, it is determined whether the block to be compressed is the head of a cluster (step S1). This is decided by checking whether its block number is 0. The block number at the start of compression is 0.

[0025] If the block is the head block of the cluster (block number 0), the model is set to a predetermined initial state (step S2). When a table of character frequencies is used as the model, for example, every character's frequency is set to 1.

[0026] If the block is not the head block of its cluster, the model is set to the state it had when compression of the cluster's head block finished (step S3). With a character-frequency table as the model, the table is updated while the head block is compressed. Compression of the block then starts from the table as it stood at the end of the head block.

This initialization can be done in two ways:

- Save and copy: when compression of the cluster's head block finishes, the model is saved to a separate memory area. Before each remaining block of the cluster is compressed, the saved model is copied into the working model area used for compression.
- Rerun: before each block is compressed, the cluster's head block is run through the compressor again to update the model. Compression of the block starts once the state at the end of the head block has been reached. During this rerun the model is only updated, and no codewords are output.

Compared with the conventional method of FIG. 2, the first way requires extra memory and the second doubles the compression time. Memory has become large and inexpensive, so the first way appears to be the more practical choice today.
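The two initialization methods can be compared using a hypothetical order-0 frequency-table model. The names are illustrative, and codeword output is omitted. Both methods produce the same starting model for every non-head block. The first spends memory on one stored copy; the second spends time re-processing the head block.

```python
import copy


class FrequencyModel:
    """Hypothetical order-0 model: one adaptive frequency table."""

    def __init__(self, alphabet: str):
        self.freq = {s: 1 for s in alphabet}     # predetermined initial state

    def update(self, text: str) -> None:
        for ch in text:
            self.freq[ch] += 1


def models_by_snapshot(cluster: list[str], alphabet: str) -> list[FrequencyModel]:
    """Method 1: save the model once after the head block, copy it per block."""
    model = FrequencyModel(alphabet)
    model.update(cluster[0])                # compressing the head block (codewords omitted)
    saved = copy.deepcopy(model)            # extra memory: one stored model
    return [copy.deepcopy(saved) for _ in cluster[1:]]


def models_by_replay(cluster: list[str], alphabet: str) -> list[FrequencyModel]:
    """Method 2: rerun the head block before each block, outputting nothing."""
    models = []
    for _ in cluster[1:]:
        model = FrequencyModel(alphabet)
        model.update(cluster[0])            # extra time: the head block is processed again
        models.append(model)
    return models


cluster = ["aab", "abc", "ccd"]
print([m.freq for m in models_by_snapshot(cluster, "abcd")])
# [{'a': 3, 'b': 2, 'c': 1, 'd': 1}, {'a': 3, 'b': 2, 'c': 1, 'd': 1}]
```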

[0027] Once the initial state of the model has been set, the block is compressed (step S4). The model is updated successively as compression proceeds.

[0028] Before moving to the next block, it is checked whether any data remain to be compressed (step S5). If so, the block number is incremented by 1 (wrapping to 0 when it reaches K), and processing moves on to the next block (step S6). Otherwise, compression ends.
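Steps S1 to S6 can be sketched with Python's `zlib`, assuming DEFLATE (an LZ77-family coder) as the adaptive compressor. `Compress.copy()` duplicates the compressor state saved after the head block. This corresponds to the first (save-and-copy) initialization method of paragraph 【００２６】. The function name and parameters are hypothetical.

```python
import zlib


def compress_clustered(data: bytes, block_size: int, k: int) -> list[bytes]:
    """Steps S1-S6 with zlib's DEFLATE as the adaptive model.

    Compress.copy() stands in for restoring the model saved after the
    cluster's head block.
    """
    out = []
    block_no = 0                      # block number is 0 when compression starts
    head_state = None
    for start in range(0, len(data), block_size):    # S5: loop while data remain
        block = data[start:start + block_size]
        if block_no == 0:             # S1: head block of a cluster?
            comp = zlib.compressobj()     # S2: predetermined initial state
        else:
            comp = head_state.copy()      # S3: state at the end of the head block
        # S4: compress; a sync flush byte-aligns the output so this block can be
        # decoded from the head block's state without any later data.
        out.append(comp.compress(block) + comp.flush(zlib.Z_SYNC_FLUSH))
        if block_no == 0:
            head_state = comp         # saved, and never modified after this point
        block_no = (block_no + 1) % k     # S6: increment, wrapping to 0 at K
    return out


data = b"abcdefgh" * 500
blocks = compress_clustered(data, block_size=1000, k=2)
print([len(b) for b in blocks])
```

A decoder mirrors this process. It decodes each head block with a fresh `zlib.decompressobj()`, and it decodes every other block of the cluster with a `copy()` of the decompressor state the head block left behind. The sync flush costs a few bytes per block, but it is what lets each block be decoded once only the head block is known.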

[0029] There are two ways to determine the block size. One makes every block the same size before compression; the other makes every block the same size after compression. Because a cluster consists of a fixed number of blocks, the first way necessarily makes all clusters the same size before compression, and the second makes all clusters the same size after compression. When the compressed data are actually written to a medium, error-correction coding or error-detection coding is applied, for example with a Reed-Solomon code. Error-correcting codes such as Reed-Solomon codes are described in detail in, for example, the book Coding Theory (IEICE, 1990, pp. 151-187). Error-correction coding with a Reed-Solomon code or a similar code also works block by block, but the size of its blocks need not be related to the size of the compression blocks.

[0030] A fixed block size before compression keeps compression simple and is well suited to compressing clusters in parallel. Only the last block of the input data may differ in size from the others.
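With fixed-size blocks, partitioning the input is straightforward. The hypothetical helper below groups K adjacent blocks into each cluster. Clusters share no model state, so each inner list could be handed to a separate worker, which is what makes cluster-level parallel compression possible.

```python
def split_into_clusters(data: bytes, block_size: int, k: int) -> list[list[bytes]]:
    """Cut data into fixed-size blocks and group K adjacent blocks per cluster.

    Only the final block (and so the final cluster) may come out shorter.
    """
    if block_size <= 0 or k <= 0:
        raise ValueError("block_size and k must be positive")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [blocks[i:i + k] for i in range(0, len(blocks), k)]


print(split_into_clusters(b"abcdefghij", block_size=3, k=2))
# [[b'abc', b'def'], [b'ghi', b'j']]
```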

[0031] When the block size is fixed after compression, a block's extent in the original data is determined at the moment its compressed data reach the fixed size, and compression then moves on to the next block. Only the last compressed block may differ in size from the others. This approach suits parallel decompression cluster by cluster. Because compressed blocks have a fixed length, it also has a major advantage: even if an error occurs in a block, decompression can always move on to the next block or cluster.

FIG. 3 shows an example of the original-data and compressed-data formats when the compressed block size is fixed. In FIG. 3, each cluster consists of four blocks. When the head block (block0), whose block number is 0, is compressed, the model starts from a predetermined initial state. When the compressed data reach the predetermined size, the block ends there, and the following data form the next block. When the blocks with block numbers 1, 2, and 3 (block1, block2, block3) are compressed, compression starts from the state the model had when compression of block number 0 (block0) of the same cluster finished.

[0032] Next, decompression of data compressed with the present invention is described. FIG. 4 shows the flow of this processing. The number of blocks in a cluster is K.

[0033] Before each compressed block is decompressed, it is checked for errors (step T1). This can be done with error-correction or error-detection coding methods, which are described in detail in the book Coding Theory cited above.

[0034] If the compressed block is found to contain an uncorrectable error, it is first determined whether the block is the head block of its cluster (step T2). As in compression, this is decided by checking whether its block number is 0. The block number at the start of decompression is 0.

[0035] If an error is found in the head block of a cluster, the error is reported (step T3). Decompression of that cluster is abandoned, processing moves on to the next cluster (step T5), and the block number is set to 0. If an error is found in a block other than the head of the cluster, the error is reported (step T4) and processing moves on to the next block (step T6). The block number is incremented by 1, and reset to 0 if it reaches K. When the compressed block size is fixed, decompression can always move on to the next block or cluster.

[0036] If the block is found to be free of errors, it is likewise first determined whether it is the head block of its cluster (step T7). If it is, the model is set to the predetermined initial state (step T8); with a character-frequency table as the model, for example, all frequencies are set to 1. If it is not, the model is set to the state it had when the cluster's head block was decompressed (step T9). With a character-frequency table as the model, the table is updated while the head block is decompressed, and decompression of the block starts from the table as it stood at the end of the head block. As in compression, this can be done in two ways: by saving the model state at the end of the head block in a separate area, or by decompressing the head block again without outputting the restored data. Once the initial state of the model has been set, the compressed block is decompressed (step T10) and processing moves on to the next block (step T11). The block number is incremented by 1, and reset to 0 if it reaches K.

[0037] When no compressed blocks remain, decompression ends (step T12).
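Steps T1 to T12 can be sketched as follows. The sketch again assumes DEFLATE via Python's `zlib` as the compressor, and it uses a CRC-32 per block as a stand-in for the error-detection coding the source mentions; a real system might use Reed-Solomon codes instead. All names are hypothetical. Abandoning a cluster and moving on to the next one (T5) is modeled by returning `None` for the rest of that cluster.

```python
import zlib
from typing import Optional


def compress_with_crc(data: bytes, block_size: int, k: int) -> list[tuple[bytes, int]]:
    """Cluster-wise DEFLATE (state copied from the head block) plus a CRC-32 per block."""
    out, head_state = [], None
    for n, start in enumerate(range(0, len(data), block_size)):
        comp = zlib.compressobj() if n % k == 0 else head_state.copy()
        payload = comp.compress(data[start:start + block_size])
        payload += comp.flush(zlib.Z_SYNC_FLUSH)
        if n % k == 0:
            head_state = comp         # model state at the end of the head block
        out.append((payload, zlib.crc32(payload)))
    return out


def decompress_with_crc(blocks: list[tuple[bytes, int]], k: int) -> list[Optional[bytes]]:
    """Steps T1-T12; a lost block is returned as None."""
    restored: list[Optional[bytes]] = []
    block_no, head_state = 0, None    # block number is 0 when decompression starts
    for payload, crc in blocks:
        if zlib.crc32(payload) != crc:           # T1: uncorrectable error found
            if block_no == 0:                    # T2: in the head block?
                head_state = None                # T3/T5: give up on this cluster
            restored.append(None)                # T3/T4: report the damaged block
        elif block_no == 0:                      # T7/T8: head block, initial model
            head_state = zlib.decompressobj()
            restored.append(head_state.decompress(payload))
        elif head_state is None:                 # cluster already abandoned (T5)
            restored.append(None)
        else:                                    # T9/T10: start from the head's state
            restored.append(head_state.copy().decompress(payload))
        block_no = (block_no + 1) % k            # T6/T11: next block, wrap at K
    return restored                              # T12: no blocks remain


data = bytes(range(256)) * 16                    # 4096 bytes
blocks = compress_with_crc(data, block_size=512, k=4)
damaged = list(blocks)
payload, crc = damaged[1]
damaged[1] = (bytes([payload[0] ^ 1]) + payload[1:], crc)   # flip one bit in block 1
result = decompress_with_crc(damaged, k=4)
print([r is None for r in result])
# [False, True, False, False, False, False, False, False]
```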

[0038] Next, examples of combining the data compression control method of the present invention with specific compression methods are described.

[0039] The first example uses the LZ77 type as the compression method. The LZ77 type uses a buffer as its model.

[0040] The LZ77 type itself is described first. Let the input data be x(0)x(1)…x(n). Compression with the LZ77 type proceeds as follows.

(1) Set the buffer to its initial state, and set j = 0.

(2) Repeat the following:

(i) Match x(j)x(j+1)… against the strings in the buffer to find the longest matching string.

[0041] (ii) If the length m of the longest match is smaller than a threshold, output the first character x(j) as is and set m = 1.

[0042] If the length m of the longest match is larger than the threshold, output an index giving the start position of the longest match (how many characters before the encoding position it starts) and its length m, as the codeword for x(j)x(j+1)…x(j+m-1).

[0043] (iii) Shift the contents of the buffer by m characters and insert the just-encoded string x(j)…x(j+m-1) into the buffer.

[0044] (iv) j ← j + m.

[0045] (v) Stop when j > n.

Step (iii) of (2) updates the buffer so that it holds the most recent input data that have been compressed. In step (ii) of (2), some device such as a flag is needed to indicate whether a codeword is a literal character or an index giving a position and length.
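The LZ77 steps above can be sketched in Python as follows. Several details are assumptions, because the source does not fix them. Matches exactly equal to the threshold are coded as indices. Matches are found by a naive search and may not run past the end of the buffer. Tokens carry an explicit flag ("L" or "M") to make the distinction that step (ii) calls for. The `initial_buffer` parameter (hypothetical, like the function names) is where the cluster scheme plugs in: a non-head block would pass the text of its cluster's head block.

```python
def lz77_compress(data: str, buffer_size: int, threshold: int = 3,
                  initial_buffer: str = "") -> list[tuple]:
    """Steps (1)-(2)(v). Tokens carry a flag, as step (ii) requires:

    ("L", ch)              a literal character
    ("M", offset, length)  a copy of `length` characters that start `offset`
                           characters before the current encoding position
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    buffer = initial_buffer[-buffer_size:]          # (1) initial buffer state
    tokens, j, n = [], 0, len(data)
    while j < n:                                    # (v) stop when input is exhausted
        best_len, best_off = 0, 0
        for start in range(len(buffer)):            # (i) naive longest-match search
            length = 0
            while (j + length < n and start + length < len(buffer)
                   and buffer[start + length] == data[j + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, len(buffer) - start
        if best_len < threshold:                    # (ii) too short: literal, m = 1
            tokens.append(("L", data[j]))
            m = 1
        else:                                       # (ii) position/length index
            tokens.append(("M", best_off, best_len))
            m = best_len
        buffer = (buffer + data[j:j + m])[-buffer_size:]   # (iii) shift in m chars
        j += m                                      # (iv)
    return tokens


def lz77_decompress(tokens: list[tuple], buffer_size: int,
                    initial_buffer: str = "") -> str:
    """Inverse of lz77_compress; it must start from the same initial buffer."""
    buffer = initial_buffer[-buffer_size:]
    out = []
    for token in tokens:
        if token[0] == "L":
            piece = token[1]
        else:
            _, offset, length = token
            start = len(buffer) - offset
            piece = buffer[start:start + length]
        out.append(piece)
        buffer = (buffer + piece)[-buffer_size:]
    return "".join(out)


text = "abcabcabcx"
print(len(lz77_compress(text, buffer_size=16)))                        # 6
print(len(lz77_compress(text, buffer_size=16, initial_buffer="abc")))  # 3
```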

[0046] Next, the application of the present invention to this compression method is described.

[0047] First, it is determined whether the block to be compressed is the head of a cluster (step S1). As in the description of FIG. 1, this is decided from the block number.

[0048] When the head block of a cluster is compressed, the buffer is set to a predetermined initial state (step S2), for example the empty buffer of FIG. 5(a). When a block other than the head is compressed, the buffer is set to the state it had when compression of the cluster's head block finished (step S3). This requires saving the buffer state at the end of the head block in a separate area. If the head block is x(s)x(s+1)…x(t) and the buffer is large enough, the buffer then holds the head block as is, as shown in FIG. 5(b). The structure used to search for strings in the buffer must likewise be set to its state at the end of the head block's compression.
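With Python's `zlib` (DEFLATE, an LZ77-family coder), the buffer state of FIG. 5(b) can be approximated by passing the head block as a preset dictionary (`zdict`). zlib also builds its own match-search structures for that dictionary, which plays the role of setting the search structure. This is a sketch under assumptions. DEFLATE's window is at most 32 KiB, so only the last 32 KiB of a larger head block are used; the "large enough buffer" condition therefore holds only for small head blocks. The function names are hypothetical.

```python
import zlib


def compress_lz77_cluster(blocks: list[bytes]) -> list[bytes]:
    """Compress one cluster; blocks[0] is the head block."""
    head = blocks[0]
    out = [zlib.compress(head)]                 # head block: empty window (Fig. 5(a))
    for block in blocks[1:]:
        comp = zlib.compressobj(zdict=head)     # window preloaded with the head block (Fig. 5(b))
        out.append(comp.compress(block) + comp.flush())
    return out


def decompress_lz77_cluster(payloads: list[bytes]) -> list[bytes]:
    """Restore one cluster; every non-head block needs the restored head block."""
    head = zlib.decompress(payloads[0])
    out = [head]
    for payload in payloads[1:]:
        decomp = zlib.decompressobj(zdict=head)
        out.append(decomp.decompress(payload) + decomp.flush())
    return out


cluster = [b"alpha beta gamma; " * 20, b"beta gamma alpha; " * 20, b"gamma alpha beta; " * 20]
payloads = compress_lz77_cluster(cluster)
print(decompress_lz77_cluster(payloads) == cluster)   # True
```

Each non-head payload depends only on the head block, not on its neighbors. This matches the rule that a damaged non-head block does not prevent restoration of the rest of the cluster.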

[0049] Once the buffer has been set, compression starts (step S4).

[0050] When compression of the block is finished, it is determined whether any data remain (step S5). If so, the block number is incremented by 1 (step S6) and processing moves on to the next block.

[0051] The case where the LZ78 type is used as the compression method in the present invention is described next. The LZ78 type uses a dictionary as its model.

[0052] First, the LZ78 type is explained. Let the input data be x(0)x(1)…x(n). Compression by the LZ78 type then proceeds as follows.
(1) Initialize the dictionary, and set j = 0.
(2) Repeat the following:
(i) Match x(j)x(j+1)… against the character strings in the dictionary to find the longest matching string.

[0053] (ii) Output the index in the dictionary of the longest matching string (denoted x(j)x(j+1)…x(m)) as the codeword.

[0054] (iii) Assign a new index to x(j)x(j+1)…x(m)x(m+1) and register it in the dictionary.

[0055] (iv) j ← m + 1.

[0056] (v) Terminate when j > n.
Step (2)(iii) means that the dictionary is updated by newly registering a string obtained by extending an already registered string by one character.

[0057] The case where the present invention is applied to this compression method is described next. Let the character set be {a, b, c, d}.

[0058] First, it is determined whether the block to be compressed is the head of a cluster (step S1). As in the description of FIG. 1, this can be determined from the block number.

[0059] When the head block of a cluster is compressed, the dictionary is initialized to a predetermined state, for example a state in which every character is registered, as shown in FIG. 6(a) (step S2). When a block that is not the head of a cluster is compressed, the dictionary is set to its state at the point where compression of the cluster's head block finished (step S3). For this purpose, the state of the dictionary at the end of head-block compression must be saved in a separate area. If the head block is the string abbcbbcbac, the state shown in FIG. 6(b) is the dictionary after this block has been compressed, and every block other than the head block is compressed after the dictionary has been set to this state.
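The save-and-restore step for the dictionary can be sketched as below; the compressor is repeated so the block runs on its own. Assumptions: the index numbering a=0, b=1, c=2, d=3 (FIG. 6 may number entries differently), a deep copy as the "separate area", and hypothetical helper names `initial_dictionary` and `compress_non_head`.

```python
import copy


def lz78_compress(data, dictionary):
    """Same LZ78-type procedure as steps (1)-(2)(v); updates `dictionary`."""
    codes, j, n = [], 0, len(data) - 1
    while j <= n:
        m = j
        while m + 1 <= n and data[j:m + 2] in dictionary:
            m += 1
        codes.append(dictionary[data[j:m + 1]])
        if m + 1 <= n:
            dictionary[data[j:m + 2]] = len(dictionary)
        j = m + 1
    return codes


def initial_dictionary(alphabet="abcd"):
    # Predetermined state in the style of FIG. 6(a): every character registered.
    return {ch: i for i, ch in enumerate(alphabet)}


head_dict = initial_dictionary()
head_codes = lz78_compress("abbcbbcbac", head_dict)   # steps S2 and S4, head block
saved = copy.deepcopy(head_dict)                       # stored in a separate area


def compress_non_head(block):
    # Step S3: each non-head block starts from its own copy of the saved state,
    # so compressing one non-head block never alters the state seen by another.
    return lz78_compress(block, copy.deepcopy(saved))
```

For instance, `compress_non_head("bbcba")` yields three codewords `[8, 1, 0]` (it reuses `bbc` learned from the head block), while compressing the same block from `initial_dictionary()` takes five.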

[0060] Once the dictionary has been set, compression starts (step S4).

[0061] When compression of the block is finished, it is determined whether any data remain (step S5). If data remain, the block number is incremented by 1 (step S6) and processing moves on to compression of the next block.

[0062] The case where the present invention is applied to an adaptive data compression method whose model is a data structure associating contexts with the appearance frequencies of characters under those contexts is described next. The set of contexts can be updated dynamically, but here a model whose context set is fixed to the set of single characters (a first-order Markov model) is described. In this model, each character is associated with an appearance frequency table, which is a set of counters, each counting how many times a given character has appeared under (that is, immediately after) that character.

[0063] First, an adaptive data compression method using a first-order Markov model is described. Assume that the character set of the input data consists of the four characters {a, b, c, d}. For each of a, b, c, and d, an appearance frequency table is prepared, consisting of counters that count how many times each character has appeared immediately after it. Let c(x, y) denote the number of times character y has appeared immediately after character x. For example, a is associated with an appearance frequency table of four counters storing the values c(a, a), c(a, b), c(a, c), and c(a, d).

[0064] The input data x(0)x(1)…x(n) are encoded as follows.
(1) Set the counters to their initial state, and set j = 0.
(2) Repeat the following:
(i) Arithmetically encode x(j) using c(x(j−1), a), c(x(j−1), b), c(x(j−1), c), and c(x(j−1), d), taking the appearance probability of x(j) to be $P(x(j)) = c(x(j-1), x(j)) \,/\, c(x(j-1))$, where $c(x(j-1)) = c(x(j-1), a) + c(x(j-1), b) + c(x(j-1), c) + c(x(j-1), d)$.
(ii) $c(x(j-1), x(j)) \leftarrow c(x(j-1), x(j)) + 1$.
(iii) $j \leftarrow j + 1$.

[0065] (iv) Terminate when j > n.
Here, x(−1) is a predetermined character (for example, a). The appearance frequency table is updated by step (2)(ii). The arithmetic coding of step (2)(i) is described in detail in the aforementioned book "Text Compression" (pp. 102-139).
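The model side of steps (1) to (2)(iv) can be sketched in Python. This does not reproduce the arithmetic coder from "Text Compression"; it only computes the probability P(x(j)) that would be handed to the coder at each step, plus the ideal code length an arithmetic coder approaches. Assumptions: the alphabet is {a, b, c, d}, `counts[x][y]` stands for c(x, y), x(−1) = a, and `markov_probabilities` and `all_ones` are hypothetical names.

```python
import math
from fractions import Fraction

ALPHABET = "abcd"


def all_ones():
    # Initial state with every counter c(x, y) set to 1.
    return {x: {y: 1 for y in ALPHABET} for x in ALPHABET}


def markov_probabilities(data, counts, prev="a"):
    """Return P(x(j)) for each j, updating `counts` in place; `prev` is x(-1)."""
    probs = []
    for ch in data:                                  # j = 0 .. n
        row = counts[prev]                           # frequency table of x(j-1)
        total = sum(row.values())                    # c(x(j-1))
        probs.append(Fraction(row[ch], total))       # (i) probability for the coder
        row[ch] += 1                                 # (ii) c(x(j-1), x(j)) += 1
        prev = ch                                    # (iii) move to the next j
    return probs


probs = markov_probabilities("ab", all_ones())
ideal_bits = -sum(math.log2(p) for p in probs)      # about 2 + log2(5) bits
```

With all counters at 1, `a` after `a` gets probability 1/4; after that update c(a, a) = 2, so `b` after `a` gets 1/5. Exact fractions are used only to make the probabilities easy to inspect; a practical coder works with integer counts directly.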

[0066] The case where the method of the present invention is applied to this compression method is described next.

[0067] First, it is determined whether the block to be compressed is the head of a cluster (step S1). As in the description of FIG. 1, this can be determined from the block number.

[0068] When the head block of a cluster is compressed, the appearance frequency of each character is initialized to a predetermined setting, for example with every counter set to 1 as shown in FIG. 7(a) (step S2). If the context tree itself is also updated dynamically, the context tree is likewise initialized, for example to an empty state. When a block other than the head of a cluster is compressed, the counters are initialized to their state at the end of head-block compression, as shown in FIG. 7(b) (step S3). In FIG. 7(b), c0(x, y) is the value of counter c(x, y) at the end of compression of the head block. If the context tree itself is updated dynamically, the context tree as it stood at the end of head-block compression is likewise used as the initial state.
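Steps S2 and S3 for the counters can be sketched as follows, using the ideal arithmetic-coding length as a stand-in for the real coder. Assumptions: a fixed single-character context set (no dynamic context tree), every block starting from x(−1) = a, and hypothetical names `ideal_bits`, `c0`, and `non_head_bits`; the head block `"abababab"` is made-up sample data.

```python
import copy
import math

ALPHABET = "abcd"


def all_ones():
    # Predetermined state in the style of FIG. 7(a): every counter c(x, y) = 1.
    return {x: {y: 1 for y in ALPHABET} for x in ALPHABET}


def ideal_bits(block, counts, prev="a"):
    """Ideal arithmetic-coding length of `block`; updates `counts` in place."""
    bits = 0.0
    for ch in block:
        row = counts[prev]
        bits -= math.log2(row[ch] / sum(row.values()))
        row[ch] += 1
        prev = ch
    return bits


head_counts = all_ones()
ideal_bits("abababab", head_counts)   # step S2: head block from the all-ones state
c0 = copy.deepcopy(head_counts)       # c0(x, y): counters at the end of the head block


def non_head_bits(block):
    # Step S3: start from a private copy of c0, as in FIG. 7(b).
    return ideal_bits(block, copy.deepcopy(c0))
```

After the head block, `c0["a"]["b"]` is 5, so a non-head block such as `"abab"` costs noticeably fewer ideal bits from `c0` than from the all-ones state, and `c0` itself is left unchanged for the remaining blocks of the cluster.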

[0069] Once the counters have been set, compression starts (step S4). The counter values are updated as compression proceeds.

[0070] When compression of the block is finished, it is determined whether any data remain (step S5). If data remain, the block number is incremented by 1 (step S6) and processing moves on to compression of the next block.

[0071]

[Effects of the Invention] The following assumptions are made.

[0072] • When an uncorrectable bit (error) appears in a compressed block, the entire block cannot be restored (that is, the compressed block cannot be decompressed back to its original data).

[0073] • Uncorrectable bits occur randomly, and their probability is sufficiently small.

[0074] • The size of a compressed block is fixed.

[0075] Under these assumptions, the average number of unrecoverable bits when the present invention is used is almost equal to that of the conventional data compression control method, which compresses each block independently, when its compressed blocks are twice the size of those of the present invention. In other words, the error-propagation control capability of the compression control of the present invention is comparable to that of conventional block-independent compression control with the compressed block size doubled.

[0076] An experiment was performed using the LZ77 type as the compression method (1 character = 1 byte, buffer size = 8192, maximum match length = 32, fixed-length coding of pointer values). In the present invention, the compressed block size was 1024 bytes and each cluster contained 16 blocks. In the conventional data compression control method, which incorporates the same LZ77 type and compresses each block independently, the compressed block size was doubled to 2048 bytes. Comparing the compression ratios of the two, the present invention was often better by about 2 to 5 points. That is, with comparable error-propagation control capability, the present invention achieves a compression ratio about 2 to 5 points better than the conventional method.

[0077] An experiment was also performed using the LZ78 type (1 character = 1 byte) as the compression method, with a sufficiently large dictionary. As in the LZ77 case, the compressed block size was 1024 bytes in the present invention and twice that, 2048 bytes, in the conventional data compression control method. The number of blocks per cluster in the present invention was also 16, as in the LZ77 case. Comparing the compression ratios of the two, the present invention was often better by about 1 to 4 points. That is, with comparable error-propagation control capability, the present invention achieves a compression ratio about 1 to 4 points better than the conventional method.

[Brief description of the drawings]

FIG. 1 is a diagram showing the flow of compression processing control according to the present invention.

FIG. 2 is a diagram showing the flow of conventional compression processing control.

FIG. 3 is a diagram showing the formats of the original data and the compressed data when compressed blocks are made equal in length in the present invention.

FIG. 4 is a diagram showing the flow of restoring data compressed under the control of the present invention.

FIG. 5 is a diagram showing the initial state of the buffer when the LZ77 type is used as the compression method in the present invention.

FIG. 6 is a diagram showing the initial state of the dictionary when the LZ78 type is used as the compression method in the present invention.

FIG. 7 is a diagram showing the initial state of the counters when a context model (first-order Markov model) is used as the compression method in the present invention.

[Explanation of symbols]

None

Claims

(57) [Claims]

1. A data compression control method in an adaptive data compression system that performs compression while dynamically updating a model, the model being a data structure representing statistical properties of data, characterized in that: input data are divided into blocks and compressed block by block; a fixed number of adjacent blocks are grouped into a cluster, and different clusters are compressed independently of one another; when the block at the head of a cluster is compressed, the initial state of the model is set to a predetermined setting; and when a block other than the head of the cluster is compressed, the initial state of the model is determined using the block at the head of the cluster to which that block belongs.

2. The data compression control method according to claim 1, wherein the size of said block is fixed.

3. The data compression control method according to claim 1, wherein the size of said block is variable and is controlled so that every block has the same size after compression.

4. The data compression control method according to claim 1, wherein: a buffer that stores input data that has already been compressed is used as said model; when a character string in the buffer appears again in the input data, a set of indices indicating the start position and length of that string in the buffer is used as the codeword for the string; the buffer is updated so that it always stores the most recently compressed data; when the block at the head of a cluster is compressed, the initial state of the buffer is set to a predetermined setting; and when a block other than the head of the cluster is compressed, the initial state of the buffer is the state of the buffer at the end of compression of the block at the head of the cluster to which that block belongs.

5. The data compression control method according to claim 1, wherein: a dictionary that associates indices with character strings in input data that has already been compressed is used as said model; when a character string in the dictionary appears again, the index of that string in the dictionary is used as the codeword for the string; the dictionary is updated by assigning an index to, and registering, each character string newly appearing in the input data; when the block at the head of a cluster is compressed, the initial state of the dictionary is set to a predetermined setting; and when a block other than the head of the cluster is compressed, the initial state of the dictionary is set to the state of the dictionary at the end of compression of the block at the head of the cluster to which that block belongs.

6. The data compression control method according to claim 1, wherein: a structure associating a set of contexts, which are character strings appearing in the input data, with the appearance frequency of each character in the input data under each context is used as said model; each character in the input data is encoded based on the context and the appearance frequency; the set of contexts is updated by adding character strings newly appearing in the input data; the appearance frequency is updated each time a character appears under a context in the input data; when the block at the head of a cluster is compressed, the initial states of the set of contexts and of the appearance frequency of each character under each context are set to a predetermined setting; and when a block other than the head of the cluster is compressed, the initial states of the set of contexts and of the appearance frequency of each character under each context are set to those obtained at the end of compression of the block at the head of the cluster to which that block belongs.