JPH05241775A

JPH05241775A - Data compression system

Info

Publication number: JPH05241775A
Application number: JP4257692A
Authority: JP
Inventors: Shigeru Yoshida; 茂吉田; Yoshiyuki Okada; 佳之岡田; Yasuhiko Nakano; 泰彦中野; Hirotaka Chiba; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-02-28
Filing date: 1992-02-28
Publication date: 1993-09-21

Abstract

PURPOSE: To improve the compression ratio for plural types of data to be processed by creating in advance, as the initial value of a dictionary, character strings that appear with high frequency in common across those types of data, and carrying out the actual coding/decoding after the initial value has been registered in the dictionary. CONSTITUTION: When a dictionary 10 is initialized, an initial value creation means 12 encodes sample data A and B representing plural types of data. Among the partial strings registered in the dictionary by that encoding, those that appear with high frequency in common to both A and B are regarded as already-encoded partial strings and are registered in the dictionary 10 as the initial value. Alternatively, the means 12 can create the initial value using plural thresholds, each with its own value, one per type of sample data. With this method, the dictionary 10 does not contain initially registered character strings that are hardly ever used in coding the data to be processed. The compression ratio is thus improved for plural types of data.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a data compression system using a dynamic dictionary algorithm such as the LZW code. In recent years, computers have come to handle various types of data such as character codes, vector information, and images, and the amount of data handled is increasing rapidly.

[0002] When a large amount of data is handled, removing the redundant parts of the data to compress it reduces the required storage capacity and allows faster transmission. Universal coding has been proposed as a way to compress various kinds of data with a single method. The field of the present invention is not limited to the compression of character codes and applies to various kinds of data. In the following, however, the terminology of information theory is adopted: one word of data is called a character, and a sequence of any number of such words is called a character string.

[0003] A representative universal code is the Ziv-Lempel code (for details see, for example, Munakata, "Ziv-Lempel Data Compression Method", Information Processing, Vol. 26, No. 1, 1985). Two algorithms have been proposed for the Ziv-Lempel code: the sliding dictionary type (also called the universal type) and the dynamic dictionary type (also called the incremental parsing type). Practical improvements of these methods have been published, and they are now used for file compression on auxiliary storage devices and for data transmission in personal computer communications.

[0004]

[Prior Art] The conventional dynamic dictionary algorithm is described first.

[Dynamic dictionary (incremental parsing) algorithm] This algorithm is known to give a lower compression ratio than the universal type, but to be simple and easy to compute.

[0005] In the incremental parsing Ziv-Lempel code, let the input symbol sequence be X = aabababaa.... The incremental parsing into the component sequence X = X₀X₁X₂... is carried out as follows. Each component is taken to be the longest string that, with its rightmost symbol removed, equals an already existing component, which gives X = a · ab · aba · b · aa · .... The sequence can therefore be parsed as

X₀ = λ (the empty string)
X₁ = X₀a
X₂ = X₁b
X₃ = X₂a
X₄ = X₀b
X₅ = X₁a
...

Each incrementally parsed component is encoded as the following pair, using the already existing components.

[0006]

[Table 1]
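As an illustration of the incremental parsing in paragraph [0005], the following Python sketch parses a symbol sequence into components and emits one (index of existing component, appended symbol) pair per component. It is not taken from the patent: the function name `incremental_parse` and the pair representation are assumptions, following the usual LZ78-style formulation of incremental parsing.

```python
def incremental_parse(x):
    """Parse x into components X0, X1, ...; each new component is an
    existing component plus one symbol (illustrative sketch only)."""
    components = [""]          # X0 is the empty string (lambda)
    index = {"": 0}            # component string -> component number
    pairs = []                 # (number of existing component, appended symbol)
    cur = ""
    for ch in x:
        if cur + ch in index:
            cur += ch          # still matches an existing component; extend
        else:
            pairs.append((index[cur], ch))
            index[cur + ch] = len(components)
            components.append(cur + ch)
            cur = ""
    # A trailing partial match (non-empty cur) is left unencoded in this sketch.
    return components, pairs


components, pairs = incremental_parse("aabababaa")
print(components)  # ['', 'a', 'ab', 'aba', 'b', 'aa']
print(pairs)       # [(0, 'a'), (1, 'b'), (2, 'a'), (0, 'b'), (1, 'a')]
```

The printed pairs correspond to X₁ = X₀a, X₂ = X₁b, X₃ = X₂a, X₄ = X₀b, X₅ = X₁a.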

[0007] That is, for the pattern to be encoded, the incremental parsing algorithm finds the longest match among the previously parsed partial strings and encodes the pattern as a copy of that previously parsed partial string. Improvements of the dynamic dictionary algorithm include the LZW (Lempel-Ziv-Welch) code (see T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984) and the LZJ code (see M. Jakobsson, "Compression of Character Strings by an Adaptive Dictionary", BIT, Vol. 25, 1985, pp. 593-603). The LZW code is described next.

[LZW code] FIG. 7 shows the flow of the LZW encoding process. LZW encoding holds a rewritable dictionary, divides the input character-code data into mutually different character strings, numbers these strings in order of appearance, and registers them in the dictionary. The string currently being input is encoded by expressing it with only the number of the longest matching string registered in the dictionary. The techniques of the dynamic dictionary code and the LZW code are disclosed in JP-A-59-231683 and U.S. Pat. No. 4,558,302.

[0008] The LZW encoding process of FIG. 7 is as follows.
S1: Before encoding starts, a one-character string is registered as an initial value for every character. The number of dictionary entries n is set to the number of character types A. The cursor is placed at the head of the data.
S2: The longest registered string S that matches the character string starting at the cursor position is found.

[0009] S3: The dictionary number of the string S is output in [log₂ n] bits, where [log₂ n] denotes the smallest integer greater than or equal to log₂ n. The number of dictionary entries n is incremented by one.
S4: The string SC, obtained by appending to S the character C that immediately follows it in the input, is registered in the dictionary. The cursor is moved to the character after the string S. The process returns to S2.
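Steps S1 to S4 can be sketched in Python as follows. This is a minimal illustration and not the patent's implementation: the function name `lzw_encode`, the `alphabet` parameter, and returning (dictionary number, bit width) pairs instead of a packed bit stream are all assumptions made here. The bit width is computed as the smallest integer not less than log₂ n, as in S3.

```python
def lzw_encode(data, alphabet):
    """Return a list of (dictionary number, bit width) pairs.
    Assumes every character of data occurs in alphabet."""
    # S1: register every single character; n starts at the alphabet size.
    table = {ch: i for i, ch in enumerate(alphabet)}
    out = []
    pos = 0                                  # the cursor
    while pos < len(data):
        # S2: longest registered string S starting at the cursor.
        s = data[pos]
        while pos + len(s) < len(data) and s + data[pos + len(s)] in table:
            s += data[pos + len(s)]
        # S3: output the number of S in ceil(log2 n) bits.
        n = len(table)
        out.append((table[s], (n - 1).bit_length()))
        # S4: register S plus the next character C, then move the cursor.
        nxt = pos + len(s)
        if nxt < len(data):
            table[s + data[nxt]] = n
        pos = nxt
    return out


print(lzw_encode("abababab", "ab"))
# [(0, 1), (1, 2), (2, 2), (4, 3), (1, 3)]
```

In this run the code widths grow from 1 to 3 bits as the dictionary fills, which reflects the weakness discussed later in the text: early in the input the dictionary has learned little.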

[0010] FIG. 8 is a flowchart of LZW decoding, which is the reverse of the encoding process. The dynamic dictionary algorithm is fast because the sequences in the dictionary are chosen only from sequences that were encoded (sampled) in the past. However, because the dictionary contains only some of the sequences that appeared in the past, it has the drawback that a high compression ratio cannot be obtained.
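The flowchart of FIG. 8 is not reproduced in this text, so the following Python sketch shows only the commonly used form of LZW decoding, under the assumption that codes arrive as plain integers and that the decoder starts from the same single-character dictionary as the encoder. The name `lzw_decode` is hypothetical. The one subtle case is a code that refers to the entry the encoder registered in the immediately preceding step, which the decoder has not yet built.

```python
def lzw_decode(codes, alphabet):
    """Rebuild the text from LZW dictionary numbers (illustrative sketch)."""
    table = list(alphabet)        # same initial dictionary as the encoder
    out = []
    prev = None
    for code in codes:
        if code < len(table):
            cur = table[code]
        elif prev is not None and code == len(table):
            # Entry registered by the encoder one step ago: it must be
            # the previous string plus that string's own first character.
            cur = prev + prev[0]
        else:
            raise ValueError("invalid code")
        if prev is not None:
            # Mirror the encoder: previous string + first char of current.
            table.append(prev + cur[0])
        out.append(cur)
        prev = cur
    return "".join(out)


print(lzw_decode([0, 1, 2, 4, 1], "ab"))  # abababab
```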

[0011] An improved version of the dynamic dictionary algorithm is the LZJ code, in which the amount learned into the dictionary is increased so that encoding can be done with the index alone.

[LZJ code] The encoding algorithm of the LZJ code is shown in the flowchart of FIG. 9, and the decoding algorithm in the flowchart of FIG. 10.

[0012] The notation for the dictionary and character strings is defined as follows. Let A be the set of character types, and let S denote a character string formed by combining characters of the set A. The i-th character of the string S is written S(i). The partial string consisting of S(i), S(i+1), ..., S(j) is written S(i, j). The dictionary is denoted D_h(S); it registers all partial strings of fixed length h in the string S as paths from the root to a leaf of the dictionary tree.

[0013] The LZJ encoding process of FIG. 9 is as follows.
S1: Before encoding starts, one character of every character type is registered in the dictionary as an initial value. The number of dictionary entries n is set to the number of character types A. The cursor is set to k = 0.
S2 to S5: Suppose encoding has finished up to the k-th input character, so that all partial strings of the string S(1, k) are already registered in the dictionary D_h(S(1, k)). Encoding continues from the string S(k+1), ....

[0014] In detail, the steps are as follows.
S2: The partial string S(k+1, k+z) that is the longest match, starting from S(k+1), ..., with a string registered in the dictionary D_h(S(1, k)) is found.
S3: The dictionary number a_x of the partial string S(k+1, k+z) is output in [log₂ n] bits, where n is the current number of dictionary entries and [log₂ n] is the smallest integer greater than or equal to log₂ n. Here the code word a_x represents the partial string S(i_x, j_x). Each a_x is a dictionary number in the dictionary D_h(S(1, j_{x-1})), with i_x ≤ j_x ≤ i_x + h and i_x = j_{x-1} + 1.

[0015] S4: The partial strings S(k-h+2, k+1), ..., S(k+z-h+1, k+z) are given dictionary numbers while n is incremented and are added to the dictionary, forming the dictionary D_h(S(1, k+z)).
S5: The cursor is set to k = k + z.
S6: S1 to S5 are repeated until all characters have been processed.
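The LZJ encoding steps can be approximated by the sketch below. Several points are assumptions rather than statements of the patent: positions are 0-based, the tree D_h is modelled as a flat mapping that holds every prefix of each length-h window (a prefix corresponds to a node on the root-to-leaf path), windows that would start before the beginning of the data are clipped, and the name `lzj_encode` is hypothetical. Codes are returned as (dictionary number, bit width) pairs.

```python
def lzj_encode(s, alphabet, h):
    """LZJ-style encoding sketch; assumes every character of s is in alphabet."""
    # S1: register every single character; cursor k = 0.
    index = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    k = 0
    while k < len(s):
        # S2: longest registered match starting at k, at most h characters.
        z = 0
        for length in range(min(h, len(s) - k), 0, -1):
            if s[k:k + length] in index:
                z = length
                break
        if z == 0:
            raise ValueError("character not in alphabet")
        # S3: output the dictionary number in ceil(log2 n) bits.
        n = len(index)
        codes.append((index[s[k:k + z]], (n - 1).bit_length()))
        # S4: add the length-h windows ending at each newly coded position,
        # together with their prefixes (the nodes on the tree path).
        for end in range(k + 1, k + z + 1):
            start = max(0, end - h)          # clipping is an assumption
            for stop in range(start + 1, end + 1):
                index.setdefault(s[start:stop], len(index))
        # S5: advance the cursor.
        k += z
    return codes


print(lzj_encode("abab", "ab", 2))  # [(0, 1), (1, 1), (2, 2)]
```

Because the dictionary used at cursor k depends only on the already coded text S(1, k), a decoder that applies the same registration rule after each decoded string can rebuild the same dictionary.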

[0016] The dictionary registration of strings in step S4 is illustrated in FIG. 11. Next, the LZJ decoding process of FIG. 10 is as follows.
S1: As in S1 of FIG. 16, one character of every character type is registered in the dictionary as an initial value. The number of dictionary entries n is set to the number of character types A. The cursor is set to k = 0.

[0017] S2 to S4: Suppose the dictionary number a_w has been decoded, the string up to S(1, j_w) is available, and the dictionary D_h(S(1, j_w)) has been reconstructed. The code word a_{w+1} is decoded next. In detail, the steps are as follows.
S2: From the dictionary number obtained by decoding the code word a_{w+1}, the partial string S(i_{w+1}, j_{w+1}) in the dictionary D_h(S(1, j_w)) is restored. The partial string S(i_{w+1}, j_{w+1}) is the character string represented in the dictionary by the path from the root to the node at address a_{w+1}.

[0018] S3: After the string S(1, j_{w+1}) has been decoded, the dictionary D_h(S(1, j_{w+1})) is constructed in the same way as in S4 of FIG. 16.
S4: The cursor is set to k = j_{w+1}.
S5: S1 to S4 are repeated until all codes have been processed.

[0019]

[Problems to be Solved by the Invention] Conventional LZW encoding presupposes complete universality, and encoding starts with the dictionary in a blank state. Conventional LZW encoding therefore has the drawback that the compression ratio is low near the beginning of the input data, where little has been learned, that is, where the dictionary holds few entries.

[0020] Universality matters for the LZW code, but when one particular type of data appears especially often in the input, encoding does not necessarily have to start from a blank dictionary. From this viewpoint, the present inventors have proposed a data compression system for dynamic dictionary algorithms in which, as shown in FIG. 12, when sample data is LZW-encoded and registered in a dictionary, the number of times the reference number of each partial string is used for encoding is counted as its appearance frequency. Only the strings that appear with high frequency are registered in advance in the dictionary 10 as initial values, and a high compression ratio is obtained by using this dictionary 10.

[0021] However, in the system of FIG. 12, which encodes after registering frequently used strings as initial values, a problem arises when, for example, two types of data appear. Sample data representing each of the two types is encoded, and the high-frequency strings obtained for each sample from the resulting dictionary are registered in the dictionary 10 as initial values. The initial values therefore include strings that appear with high frequency in one type of data but hardly ever in the other.

[0022] For example, suppose that when two sample data A and B of different types are encoded, the appearance frequency of each string against the dictionary reference number is as shown in FIG. 13. For two sample data A and B whose frequency distributions over the reference numbers differ in this way, the strings whose appearance frequency is at or above a predetermined threshold T are the p strings with reference numbers i to i+p for sample data A and the q strings with reference numbers j to j+q for sample data B, and each set is registered in the dictionary as initial values.

[0023] However, when data represented by sample data A is encoded, the initial values obtained from sample data B are hardly used, and conversely, when data represented by sample data B is encoded, the initial values obtained from sample data A are hardly used. As a result, strings that go unused are registered in the dictionary as initial values, so that when a character string containing plural types of data is encoded, the compression ratio cannot be improved sufficiently.

[0024] The present invention has been made in view of these problems, and its object is to provide a data compression system based on a dynamic dictionary algorithm that allows initial values to be registered with little waste and to be used in common for encoding plural types of data.

[0025]

[Means for Solving the Problems] FIG. 1 illustrates the principle of the present invention. The present invention is directed to a data compression system based on a dynamic dictionary algorithm such as the LZW code or the LZJ code, in which input data is divided into mutually different partial strings that are given reference numbers and registered in a dictionary 10, an input partial string is encoded as the reference number of the longest matching partial string already registered in the dictionary, and the encoded code data is input and decoded.

[0026] For such a data compression system, the present invention is characterized by an initial value creation means 12 which, when the dictionary 10 is initialized, takes the partial strings registered in the dictionary by encoding sample data A and B representing plural types of data, regards those partial strings that appear with high frequency in common to the plural types of sample data A and B as already-encoded partial strings, and registers them in the dictionary 10 as initial values.

[0027] Here, when the plural types of sample data A and B are encoded and registered in the dictionary, the initial value creation means 12 counts, separately for each of the sample data A and B, the number of times the reference number of each partial string is used. It registers in the dictionary 10, as initial values, the strings whose use counts in encoding sample data A and in encoding sample data B are both at or above a predetermined threshold T.

[0028] The initial value creation means 12 may also create the initial values using plural thresholds T_A and T_B, each with its own value, as many as there are types of sample data. Furthermore, the initial value creation means 12 may provide plural thresholds T_A and T_B, each with its own value, as many as there are types of sample data, and set each threshold T_A, T_B to a value obtained by weighting a reference threshold T₀ according to the ratio of the appearance proportions of the respective sample data.
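The selection rule with per-type thresholds can be sketched in Python as below. The text does not give a formula for the weighting, so `weighted_thresholds` is only one possible reading, chosen here so that equal appearance proportions give every type the reference threshold T₀. The function names, the dictionary-of-counts layout, and the sample counts are all hypothetical.

```python
def weighted_thresholds(base_threshold, shares):
    """One possible weighting (an assumption): scale T0 by each type's
    share of appearances so that equal shares yield T0 for every type."""
    k = len(shares)
    total = sum(shares.values())
    return {t: base_threshold * k * share / total for t, share in shares.items()}


def select_initial_values(counts_by_type, thresholds):
    """Keep strings whose use count reaches the threshold for every type."""
    candidates = set().union(*counts_by_type.values())
    return sorted(
        s for s in candidates
        if all(counts_by_type[t].get(s, 0) >= thresholds[t] for t in counts_by_type)
    )


# Hypothetical use counts per sample type.
counts = {
    "A": {"ab": 9, "cd": 1, "ef": 5},
    "B": {"ab": 6, "cd": 12, "ef": 4},
}
print(select_initial_values(counts, {"A": 4, "B": 4}))   # ['ab', 'ef']
t = weighted_thresholds(4, {"A": 3, "B": 1})              # {'A': 6.0, 'B': 2.0}
print(select_initial_values(counts, t))                   # ['ab']
```

With a single threshold T for all types, the first call reduces to the rule of paragraph [0027].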

[0029]

[Operation] According to the data compression system of the present invention with this configuration, representative data of the plural types that are likely to appear often is encoded in advance, type by type, as sample data. During this encoding a counter is provided at each node (reference number) of the dictionary to count the number of times each reference number is used, and the strings that appear with high frequency in common to all the sample data are determined and registered in the dictionary as initial values.

[0030] In encoding with a dictionary in which such initial values have been registered, even when the data being encoded is of only one specific type among the sample data used to create the initial values, the encoding makes frequent use of the initially registered strings. Since there are no initially registered strings that are hardly ever used for encoding, the initial registration is free of waste, and the compression ratio for plural types of data can be raised from the very beginning.

[0031]

[Embodiments] FIG. 2 is a block diagram showing one embodiment of the present invention. In FIG. 2, reference numeral 16 denotes a CPU, to which a program memory 18 and a data memory 30 are connected. The program memory 18 holds control software 20, encoding software 22, dictionary creation software 14 that functions as the initial value creation means, an appearance frequency count table 26, and a frequency threshold storage table 28.

[0032] The encoding software 22 performs encoding according to a dynamic dictionary algorithm: it searches the dictionary for the string that is the longest match with the input character string and outputs the dictionary reference number as code data. The decoding software 24 performs decoding according to the dynamic dictionary algorithm: it looks up the reference numbers in the dictionary using the input code string produced by the encoding software 22 and restores the corresponding character strings.

[0033] The dictionary creation software 14 performs two processes: the initial value creation process carried out before encoding or decoding, and the process of registering new strings in the dictionary during encoding and decoding. The initial value creation function of the dictionary creation software 14 takes sample data representing plural types of data stored in the data memory 30, for example two types of sample data A and B, and encodes it according to the encoding software 22. Each time a string is retrieved from the dictionary and output as code data during this encoding, the use count of the reference number of the string retrieved as the code word is incremented in the appearance frequency count table 26, so that the number of times each string is used is counted.

[0034] When encoding of the sample data is complete, the appearance frequencies recorded for each of the sample data A and B in the appearance frequency count table 26 are consulted, the strings whose frequency is at or above the predetermined threshold T for both sample data A and B are detected, and these strings are registered in the dictionary as initial values. The frequency threshold used for this initial value registration is stored in advance in the frequency threshold storage table 28.

[0035] Memory areas for the dictionary 10 and a data buffer 32 are allocated in the data memory 30. During initial value creation, the data buffer 32 holds the plural types of sample data from which the initial values are created, and the dictionary 10 holds, together with their reference numbers, the strings created by the dictionary creation software 14 during the encoding performed for initial value creation.

[0036] Once initial value creation is complete, the strings created by the dictionary creation software 14 whose appearance frequency is at or above the threshold T for all of the plural types of data are initially registered in the dictionary 10. The data buffer 32 then holds the character string to be newly encoded or the code string to be decoded, and the character string is encoded by the encoding software 22 or the code string is restored by the decoding software 24.

[0037] FIG. 3 is a flowchart of the initial value creation process for plural types of data in the embodiment of FIG. 2. The following steps S1 to S5 are performed.
S1: The number of types of sample data to be input for initial value creation is set to n. The counter groups that count the appearance frequency of strings for the n types are cleared to 0, and the type number i indicating the data type is set to i = 0.

[0038] S2: It is checked whether sample data of all types has been input. If there is further sample data, the process proceeds to step S3; if all sample data has been input, the process proceeds to step S5.
S3: The sample data of type number i is input and encoded using the dictionary built up to the previous type number i-1. During this encoding, the number of times each string registered in the dictionary is used is counted with the counter for that string in counter group i.

[0039] S4: When encoding of the sample data of type number i is complete, the type number i is incremented by one and the process returns to step S2.
S5: When input (encoding) of the sample data of all types is complete, the strings for which the count in every counter group shows a high frequency at or above the predetermined threshold T are extracted from the dictionary obtained at that point and taken as the initial values, and the process ends.
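Steps S1 to S5 can be put together in a short Python sketch. It assumes LZW as the encoder of step S3, keeps one counter table per sample type, and carries the dictionary over from one sample to the next as S3 describes. The name `build_initial_values`, the use of plain strings as samples, and counting one use per emitted code are assumptions for illustration.

```python
def build_initial_values(samples, alphabet, threshold):
    """Return the dictionary strings used at least `threshold` times
    in the encoding of every sample (illustrative sketch)."""
    # Dictionary shared across samples; starts with the single characters.
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    # S1: one cleared counter group per sample type.
    counters = [dict() for _ in samples]
    # S2 to S4: encode each sample type in turn.
    for i, data in enumerate(samples):
        pos = 0
        while pos < len(data):
            # Longest registered string at the cursor (LZW matching).
            s = data[pos]
            while pos + len(s) < len(data) and s + data[pos + len(s)] in dictionary:
                s += data[pos + len(s)]
            # S3: count the use of this string in counter group i only.
            counters[i][s] = counters[i].get(s, 0) + 1
            nxt = pos + len(s)
            if nxt < len(data):
                dictionary.setdefault(s + data[nxt], len(dictionary))
            pos = nxt
    # S5: keep strings that reach the threshold in every counter group.
    return [s for s in dictionary
            if all(c.get(s, 0) >= threshold for c in counters)]


print(build_initial_values(["abababab", "ababcab"], "abc", 1))  # ['ab']
```

In this small example only "ab" is used in encoding both samples, so it is the only string kept. Strings such as "aba", used only for the first sample, are dropped.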

[0040] FIG. 4 is an explanatory diagram showing the dynamic dictionary encoding of sample data of one type in step S3 of FIG. 3 and the counting of the number of times the dictionary strings are used. This example takes a character string composed of the four characters a, b, c, and d. Each of the characters a, b, c, and d is first registered in the dictionary 10 under its own reference number, as shown in the figure, and encoding is then performed. A counter that counts the number of uses is provided at each node of the tree structure of registered strings identified by the reference numbers.

[0041] The tree structure of the dictionary 10 shows the registered contents at the point where encoding has finished for the range of reference numbers shown in the figure. Take, for example, the already encoded string "abc". Because the string "ab" is already registered under a reference number, that reference number with the next character "c" appended is output as the code word. After encoding, this code word (the reference number of "ab" followed by "c") is given a new reference number and registered as shown in the figure. In addition, because the registered string "ab" was used in encoding this code word, the counters provided at the two corresponding reference numbers are each incremented by one.

[0042] For the next input "abd" as well, because the string "ab" is already registered under its reference number, the code word consisting of that reference number followed by "d" is output. Because the string "ab" was used for encoding, the counters at the two corresponding reference numbers are each incremented by one. After encoding, the new string (that reference number followed by "d") is registered in the dictionary under a new reference number.

[0043] FIG. 5 is an explanatory diagram showing the appearance frequency against the dictionary reference number when the initial value creation of the present invention is applied to two types of sample data A and B. In FIG. 5, sample data A is assumed to have a distribution that peaks at smaller reference numbers, and sample data B a distribution that peaks at larger reference numbers. The two types of sample data A and B thus have different distributions of the number of times the strings identified by the reference numbers are used.

[0044] In the present invention, for such sample data A and B, the character strings corresponding to the shaded portion, where the appearance frequency of each is equal to or greater than a predetermined threshold T, are registered as initial values. That is, the character strings in the range of reference numbers i to i + h, where the usage frequencies obtained from sample data A and B are both equal to or greater than the threshold T, are registered as initial values. Here, in the encoding of the plural types of sample data in step S3 of FIG. 3, the usage counts obtained in encoding the current sample data are not cumulatively added to the usage counts obtained up to the encoding of the previous sample data; instead, the usage frequency of each character string is counted separately for each sample data. Then, when all the sample data have been encoded, the character strings whose counts are equal to or greater than the predetermined threshold T are registered as initial values.
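The selection rule of this paragraph can be sketched in Python as follows. This is a minimal illustration, assuming that the per-sample usage counts are already available as one mapping per sample data; the function name `select_initial_strings` and the example counts are hypothetical.

```python
def select_initial_strings(counts_per_sample, threshold):
    """Return the strings whose usage count is >= threshold in EVERY sample.

    `counts_per_sample` is a list with one {string: usage count} mapping per
    sample data. Counts are kept separate per sample and are never summed.
    """
    if not counts_per_sample:
        return []
    # Only strings that were counted in all samples can qualify.
    common = set(counts_per_sample[0])
    for counts in counts_per_sample[1:]:
        common &= set(counts)
    return sorted(
        s for s in common
        if all(counts[s] >= threshold for counts in counts_per_sample)
    )


# Hypothetical usage counts for sample data A and B.
counts_a = {"ab": 5, "cd": 9, "ef": 1}
counts_b = {"ab": 4, "cd": 0, "gh": 7}
initial = select_initial_strings([counts_a, counts_b], threshold=4)
```

With these example counts only "ab" reaches the threshold in both samples; "cd" is frequent in A but unused in B, so it is left out of the initial values.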

[0045] Applied to FIG. 5, the conventional approach yields the characteristic (A + B) obtained by adding the usage frequencies of sample data A and B. In that case 2T was used as the threshold, and the character strings of reference numbers j to j + k, corresponding to the range at or above 2T, were taken as initial values. In the present invention, the character strings of reference numbers i to i + h, for which sample data A and B are both at or above the threshold T, are taken as initial values. Note that the distribution of appearance frequency over the dictionary reference numbers in FIG. 5 is drawn for convenience of explanation; it is extremely rare for the usage counts to form a distribution with a single peak as illustrated, and in practice the result is a bar graph whose usage frequency varies randomly from one reference number to the next.
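The difference between the conventional criterion (summed count at or above 2T) and the criterion of the present invention (each count at or above T) can be made concrete with a small Python comparison. The function names and the usage counts below are hypothetical and chosen only to show a case where the two criteria disagree.

```python
def summed_selection(count_a, count_b, t):
    """Conventional criterion: usage in A plus usage in B is at least 2T."""
    keys = set(count_a) | set(count_b)
    return sorted(
        k for k in keys
        if count_a.get(k, 0) + count_b.get(k, 0) >= 2 * t
    )


def common_selection(count_a, count_b, t):
    """Criterion of this paragraph: usage is at least T in A and in B."""
    keys = set(count_a) & set(count_b)
    return sorted(k for k in keys if count_a[k] >= t and count_b[k] >= t)


# Hypothetical usage counts keyed by dictionary reference number.
count_a = {3: 12, 7: 5, 9: 1}
count_b = {3: 0, 7: 4, 9: 1}
T = 4

conventional = summed_selection(count_a, count_b, T)
proposed = common_selection(count_a, count_b, T)
```

Here reference number 3 passes the summed criterion on the strength of sample data A alone, although it is never used for sample data B; the per-sample criterion keeps only reference number 7.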

[0046] For the encoding performed in S3 of FIG. 3, either the LZW encoding algorithm shown in FIG. 7 or the LZJ encoding algorithm shown in FIG. 9 may be used. When actual encoding is performed, the initial values created by this initialization process are first stored in the dictionary 10, and encoding of the input data then begins. Likewise, since actual decoding builds the same dictionary as encoding, the initial values used in encoding are stored in the dictionary 10 before the input codes are decoded.
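The requirement that encoder and decoder start from the same initial values can be sketched with a textbook LZW pair in Python. This is an illustrative sketch, not the flowcharts of FIGS. 7 and 8: the helper `seed_entries`, the choice to also register every prefix of an initial string (so that the greedy longest match can reach it), and the example data are all assumptions.

```python
def seed_entries(alphabet, initial_strings):
    """Build the shared starting dictionary: single characters, then seeds."""
    entries = list(dict.fromkeys(alphabet))
    seen = set(entries)
    for s in initial_strings:
        # Assumption: prefixes are registered too, so matching can reach `s`.
        for end in range(2, len(s) + 1):
            prefix = s[:end]
            if prefix not in seen:
                seen.add(prefix)
                entries.append(prefix)
    return entries


def lzw_encode(data, alphabet, initial_strings=()):
    table = {s: i for i, s in enumerate(seed_entries(alphabet, initial_strings))}
    codes, w = [], ""
    for ch in data:
        if w + ch in table:
            w += ch
        else:
            codes.append(table[w])
            table[w + ch] = len(table)
            w = ch
    if w:
        codes.append(table[w])
    return codes


def lzw_decode(codes, alphabet, initial_strings=()):
    table = seed_entries(alphabet, initial_strings)  # same seeds as the encoder
    if not codes:
        return ""
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # A code not yet in the table can only be prev + first char of prev.
        entry = table[code] if code < len(table) else prev + prev[0]
        out.append(entry)
        table.append(prev + entry[0])
        prev = entry
    return "".join(out)


data = "abcabcab"
seeded = lzw_encode(data, "abc", ["abc"])
plain = lzw_encode(data, "abc")
restored = lzw_decode(seeded, "abc", ["abc"])
```

In this example the seeded dictionary yields fewer code words than the unseeded one for the same input, and decoding succeeds only because the decoder is given the same initial strings.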

[0047] FIG. 6 is an explanatory diagram showing another embodiment of how the threshold T is determined in the initialization process of the present invention. In the above embodiment, the usage count in each data type of a character string appearing in plural types of data is compared with a single threshold T, but this threshold T may differ for each type of data. For example, if the proportions in which the plural types of data occur are known in advance, optimum initial values can be obtained by weighting a reference threshold by those proportions.

[0048] For example, as shown in FIG. 6, when sample data A is known to be hiragana with an appearance ratio of 70% and sample data B kanji with an appearance ratio of 30%, the threshold T₁ obtained by weighting the reference threshold T₀ by the appearance ratio 0.7 is used as the threshold for obtaining initial values from sample data A, and the threshold T₂ obtained by multiplying the reference threshold T₀ by the appearance ratio 0.3 is used as the threshold for obtaining initial values from sample data B.
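The weighted thresholds T₁ = 0.7·T₀ and T₂ = 0.3·T₀ can be sketched in Python as follows. The function names, the reference threshold value of 10, and the usage counts are hypothetical; appearance ratios are passed as integer percentages only to keep the arithmetic exact.

```python
def weighted_thresholds(base_threshold, percent_by_sample):
    """Scale the reference threshold T0 by each sample's appearance ratio."""
    return {
        name: base_threshold * percent / 100
        for name, percent in percent_by_sample.items()
    }


def select_with_thresholds(counts, thresholds):
    """Keep strings whose count meets the sample-specific threshold everywhere.

    `counts` maps a sample name to its {string: usage count} mapping.
    """
    names = list(thresholds)
    common = set.intersection(*(set(counts[n]) for n in names))
    return sorted(
        s for s in common
        if all(counts[n][s] >= thresholds[n] for n in names)
    )


# Sample data A: 70% appearance ratio, sample data B: 30%.
thresholds = weighted_thresholds(10, {"A": 70, "B": 30})

# Hypothetical usage counts for three candidate strings.
counts = {
    "A": {"s1": 9, "s2": 6, "s3": 8},
    "B": {"s1": 3, "s2": 5, "s3": 2},
}
initial = select_with_thresholds(counts, thresholds)
```

With a single threshold of 10 none of these strings would qualify; with the weighted thresholds 7 and 3, "s1" is selected because it meets the threshold of each sample data.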

【００４９】[0049]

[Effects of the Invention] As described above, according to the present invention, character strings that appear frequently and in common across plural types of data are created in advance as initial values of the dictionary, and actual encoding and decoding are performed after these initial values are registered in the dictionary, so that the compression ratio can be improved for the plural types of data to be processed. In addition, character strings that appear frequently in a specific type of data but hardly ever in other types are excluded from the initial values, so that the dictionary memory can be used efficiently.

[Brief description of drawings]

FIG. 1 is an explanatory diagram of the principle of the present invention.

FIG. 2 is a block diagram of an embodiment of the present invention.

FIG. 3 is a flowchart showing the initial value creation process of the present invention.

FIG. 4 is an explanatory diagram showing the encoding of sample data and the counting of the number of times each character string in the dictionary is used according to the present invention.

FIG. 5 is an explanatory diagram showing the setting of the threshold for the appearance frequency against the dictionary reference numbers of two sample data A and B.

FIG. 6 is an explanatory diagram showing another example of setting the thresholds used when creating initial values in the present invention.

FIG. 7 is a flowchart showing the conventional LZW encoding algorithm.

FIG. 8 is a flowchart showing the conventional LZW decoding algorithm.

FIG. 9 is a flowchart showing the conventional LZJ encoding algorithm.

FIG. 10 is a flowchart showing the conventional LZJ decoding algorithm.

FIG. 11 is an explanatory diagram showing the registration of character strings in LZJ encoding.

FIG. 12 is an explanatory diagram of the initial registration of the dictionary in data compression using the LZW code, as previously proposed by the present inventors.

FIG. 13 is an explanatory diagram showing the appearance frequency against the dictionary reference numbers and the setting of the threshold when two types of sample data are encoded.

[Explanation of symbols]

10: Dictionary
12: Initial value creation unit
14: Initial value creation means (dictionary creation software)
16: CPU
18: Program memory
20: Control software
22: Encoding software
24: Decoding software
26: Appearance frequency count table
28: Frequency threshold storage table
30: Data memory
32: Data buffer

Continuation of the front page: (72) Inventor Hirotaka Chiba, c/o Fujitsu Limited, 1015 Kamiodanaka, Nakahara-ku, Kawasaki City, Kanagawa Prefecture

Claims

[Claims]

1. A data compression method in which input data is divided into mutually different partial strings that are assigned reference numbers and registered in a dictionary (10), an input partial string is encoded as the reference number of the longest matching partial string registered in the dictionary, and the encoded code data is input and decoded, characterized in that initial value creating means (12) is provided which, when the dictionary (10) is initialized, encodes sample data (A, B) representing plural types of data, regards those partial strings, among the partial strings registered in the dictionary by the encoding, that appear frequently and in common in the plural types of sample data (A, B) as already encoded partial strings, and registers them in the dictionary (10) as initial values.

2. The data compression method according to claim 1, wherein the initial value creating means (12) counts, separately for each sample data (A, B), the number of times the reference number of each partial string registered in the dictionary is used when the plural types of sample data (A, B) are encoded, and registers in the dictionary, as initial values, the character strings whose usage counts in encoding each sample data (A, B) are both equal to or greater than a predetermined threshold T.

3. The data compression method according to claim 2, wherein the initial value creating means (12) is provided with a plurality of thresholds (T_A, T_B), each having its own value, equal in number to the types of sample data.

4. The data compression method according to claim 3, wherein the initial value creating means (12) is provided with a plurality of thresholds (T_A, T_B), each having its own value, equal in number to the types of sample data, and each threshold (T_A, T_B) is set by weighting a reference threshold (T₀) according to the appearance ratio of the corresponding sample data.