JP3241787B2

JP3241787B2 - Data compression method

Info

Publication number: JP3241787B2
Application number: JP4257792A
Authority: JP
Inventors: 茂吉田; 佳之岡田; 泰彦中野; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-02-28
Filing date: 1992-02-28
Publication date: 2001-12-25
Anticipated expiration: 2016-12-25
Also published as: JPH05241776A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a data compression method that compresses data using Ziv-Lempel codes. In recent years, computers have come to handle many kinds of data, such as character codes, vector information, and images, and the volume of data handled is growing rapidly. When large amounts of data are handled, removing the redundant parts of the data to compress it reduces the required storage capacity and shortens transmission time.

[0002] Universal coding has been proposed as a way to compress many kinds of data with a single method. The present invention is not limited to the compression of character codes and can be applied to many kinds of data. In the following, however, the terminology of information theory is followed: one word of data is called a character, and any number of consecutive words is called a character string.

[0003] Ziv-Lempel coding and arithmetic coding are representative universal coding methods. Two algorithms have been proposed for Ziv-Lempel codes: the sliding dictionary type (also called the universal type) and the dynamic dictionary type (also called the incremental parsing type).

[0004] Improvements of the sliding dictionary algorithm include the LZSS code (see T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986) and LHarc, which is used on personal computers. An improvement of the dynamic dictionary algorithm is the LZW (Lempel-Ziv-Welch) code (see T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984).

[0005] These improved methods are now used for file compression on auxiliary storage devices and for compressing data transmitted by modems.

[0006]

[Prior Art] The sliding dictionary algorithm and the dynamic dictionary algorithm of conventional Ziv-Lempel coding are described below.

1. Sliding dictionary algorithm

The sliding dictionary algorithm requires a large amount of computation but achieves a high compression ratio. The data to be encoded is divided into substrings, each being the longest sequence that matches a sequence starting at some position in the past data, and each substring is encoded as a copy of that past character string.

[0007] FIG. 10 shows the principle of an encoder for the universal-type Ziv-Lempel code. The P buffer 12 holds input data that has already been encoded, and the Q buffer 10 holds the data that is about to be encoded. The character string in the Q buffer 10 is compared with the sequence in the P buffer 12 to find the longest matching substring in the P buffer 12, and the following set of information, which identifies this longest string in the P buffer 12, is encoded.

[0008]

[Table 1]

[0009] Next, the encoded character string in the Q buffer 10 is moved to the P buffer 12 and new data is read into the Q buffer 10. The same operation is repeated, so that the data is broken into substrings and encoded. In this way, the Ziv-Lempel code encodes the current character sequence as a copy of an already-encoded past sequence. With the Ziv-Lempel code, document data consisting of character codes can be compressed to about half its original size.

[QIC-122 code] This is the code adopted as the standard compression method for 1/4-inch cartridge magnetic tape by QIC (Quarter Inch Cartridge Standard Inc.), an association of manufacturers led by 3M.

[0010] FIG. 11 is a flowchart of the QIC-122 coding algorithm, which performs the following steps.

S1: A 2048-byte history is provided as the P buffer, and its contents are cleared. The Q buffer is filled with input data.

S2: The P buffer is searched for the longest character string S that matches the character string in the Q buffer.

[0011] S3: If the longest string S found has two or more characters, processing proceeds to the copy mode of S5; if it has only one character, processing proceeds to the raw data mode of S4.

S4: In raw data mode, the pair [flag bit 0] [1 byte of raw data] is encoded.

S5: In copy mode, [flag bit 1] is emitted as a fixed-length code, followed by the encoded pair [position of string S] [match length].

[0012] S6: The encoded character string in the Q buffer is moved to the P buffer, and the same number of new characters is read into the Q buffer. At the same time, as many of the oldest characters are discarded from the P buffer as were moved into it from the Q buffer.

S7: Steps S1 to S6 are repeated until the input data is exhausted.

FIG. 12 shows the codeword format of the QIC-122 code in BNF metalanguage. The meta-symbols used in the BNF metalanguage have the meanings shown in FIG. 13.
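The loop of steps S1 to S7 can be illustrated with a minimal Python sketch. It is not the QIC-122 reference implementation: the function name `qic122_tokens`, the token tuples, and the brute-force search are assumptions made for illustration, and the P buffer is modelled as the most recent `history` symbols of the already-encoded input rather than as a separate buffer. How offsets map to bit fields is not addressed here.

```python
def qic122_tokens(data, history=2048):
    """Split data into ("raw", ch) and ("copy", offset, length) tokens."""
    tokens = []
    pos = 0  # everything before pos plays the role of the P buffer (S1: empty)
    while pos < len(data):  # S7: loop until the input is exhausted
        best_len, best_off = 0, 0
        # S2: brute-force search of the history for the longest match.
        # A match may run past pos (overlapping copy), as in the "AAAAA" case.
        for start in range(max(0, pos - history), pos):
            length = 0
            while (pos + length < len(data)
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - start
        if best_len >= 2:  # S3 -> S5: copy mode
            tokens.append(("copy", best_off, best_len))
            pos += best_len
        else:  # S3 -> S4: raw data mode
            tokens.append(("raw", data[pos]))
            pos += 1
        # S6 is implicit: advancing pos slides the history window forward.
    return tokens


tokens = qic122_tokens("ABAAAAAACABA")
```

The offset in each copy token counts backwards from the current position, which is how the offset is defined for the P buffer in this description.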

[0013] The codeword format of the QIC-122 code in FIG. 12 is explained in detail below.

(1) A compressed stream consists of compressed strings and an end marker.

(2) A compressed string is either raw data, expressed as identification bit 0 followed by a raw ASCII byte, or compressed data, expressed as identification bit 1 followed by compressed bytes.

[0014] (3) A raw ASCII byte is expressed as one byte of 8 bits.

(4) Compressed bytes consist of an offset (start position) and a length (match length).

(5) The offset (start position) is expressed in 7 bits when its identification bit is 1 and in 11 bits when its identification bit is 0.

[0015] (6) The end marker is 110000000; its offset is 0.

(7) A bit b is 0 or 1.

(8) The length (match length) is expressed as a variable-length code as shown in FIG. 12.

FIG. 14 shows a concrete example of QIC-122 encoding, for the case where the character string "ABAAAAAACABA" is input.

[0016] For the first three characters "ABA", the number of matching characters in the P buffer is one or fewer, so each is output as the bit sequence of a raw ASCII byte. The five "A" characters from the fourth to the eighth match the immediately preceding "A" in the P buffer, so they are output as the bit sequence "1 1 0000001 1100", which consists of:

compressed-byte identification bit: 1
7-bit-offset identification bit: 1
offset = 1: 0000001
length = 5: 1100

[0017] The offset, which indicates the start position of the longest matching substring, counts how many positions back that position lies from the most recently registered position (address) in the P buffer. The ninth character "C" is not in the P buffer, so a raw ASCII byte is output. The tenth to twelfth characters "ABA" are already registered as the first three characters of the P buffer, so they are output as the bit sequence "1 1 0001001 01", which consists of:

compressed-byte identification bit: 1
7-bit-offset identification bit: 1
offset = 9: 0001001
length = 3: 01
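The bit sequences of this worked example can be reproduced with a short Python sketch. The helper names are hypothetical. The length codes for 3 and 5 are the ones that appear in the example; the entries for 2, 4, 6, and 7 are an assumption taken from the published QIC-122 format, because FIG. 12 is not reproduced in this text, and longer lengths are left out.

```python
# Length codes: 3 -> "01" and 5 -> "1100" are attested by the example.
# The other entries are an assumption (FIG. 12 is not available here).
LENGTH_CODES = {2: "00", 3: "01", 4: "10", 5: "1100", 6: "1101", 7: "1110"}


def raw_bits(ch):
    """Identification bit 0 followed by the 8-bit raw byte (items 2 and 3)."""
    return "0" + format(ord(ch), "08b")


def copy_bits(offset, length):
    """Identification bit 1, then offset and length (items 2, 4, 5 and 8)."""
    if offset < 128:
        offset_bits = "1" + format(offset, "07b")   # 7-bit offset form
    else:
        offset_bits = "0" + format(offset, "011b")  # 11-bit offset form
    return "1" + offset_bits + LENGTH_CODES[length]


# End marker from item (6): a 7-bit offset field whose value is 0.
END_MARKER = "1" + "1" + format(0, "07b")

# Tokens for "ABAAAAAACABA" as described in the example.
example = [("raw", "A"), ("raw", "B"), ("raw", "A"), ("copy", 1, 5),
           ("raw", "C"), ("copy", 9, 3)]
stream = [raw_bits(t[1]) if t[0] == "raw" else copy_bits(t[1], t[2])
          for t in example]
stream.append(END_MARKER)
```

Each element of `stream` is kept as a separate string of "0" and "1" characters so that the individual codewords stay visible; a real encoder would pack them into bytes.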

[0018] All input characters have now been encoded, so "1 1 0000000" is output as the end marker and processing ends.

2. Dynamic dictionary (incremental parsing) algorithm

This algorithm is known to give a lower compression ratio than the universal type, but it is simple and easy to compute.

[0019] In the incremental parsing Ziv-Lempel code, let the sequence of input symbols be X = aabababaa... The incremental parsing into components X = X₀X₁X₂... is performed as follows. Each new component is the longest string that becomes an already-parsed component when its rightmost symbol is removed, which gives X = a·ab·aba·b·aa... The sequence is therefore parsed as

X₀ = λ (empty string), X₁ = X₀a, X₂ = X₁b, X₃ = X₂a, X₄ = X₀b, X₅ = X₁a, ...

Each component obtained by incremental parsing is encoded as the following pair, using an already-parsed component.

[0020]

[Table 2]
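The incremental parsing of paragraph [0019] can be sketched in Python as follows. The function name and the (component index, symbol) pair format are assumptions chosen to mirror the relations X₁ = X₀a, X₂ = X₁b, and so on; the contents of Table 2 are not reproduced in this text.

```python
def incremental_parse(symbols):
    """Parse symbols into components, each an earlier component plus one symbol."""
    components = [""]      # X0 is the empty string
    index = {"": 0}        # component -> its number
    pairs = []             # (number of earlier component, appended symbol)
    current = ""
    for sym in symbols:
        candidate = current + sym
        if candidate in index:
            current = candidate            # keep extending the match
        else:
            pairs.append((index[current], sym))
            index[candidate] = len(components)
            components.append(candidate)
            current = ""
    # A trailing partial match, if any, is dropped in this sketch.
    return components, pairs


components, pairs = incremental_parse("aabababaa")
```

For the input of paragraph [0019], `components[1:]` is the parse a, ab, aba, b, aa, and `pairs` lists which earlier component each one extends.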

[0021] In other words, the incremental parsing algorithm finds, for the pattern to be encoded, the longest match among the previously parsed substrings and encodes the pattern as a copy of that substring. Improvements of the dynamic dictionary algorithm include the LZW (Lempel-Ziv-Welch) code (see T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984) and the LZJ code (see M. Jakobsson, "Compression of Character Strings by an Adaptive Dictionary", BIT, 25, 1985, pp. 593-603). The LZW code is described next.

[LZW code] FIG. 15 shows the flow of LZW encoding. LZW coding keeps a rewritable dictionary, splits the input character-code data into mutually distinct character strings, and registers these strings in the dictionary with numbers assigned in order of appearance. The string currently being input is encoded using only the number of the longest matching string registered in the dictionary. The dynamic dictionary code and LZW code techniques are disclosed in Japanese Patent Laid-Open No. 59-231683 and U.S. Pat. No. 4,558,302. The encoding process of FIG. 15 is as follows.

[0022] S1: Before encoding begins, a one-character string for every character is registered in the dictionary as the initial value. The number of dictionary entries n is set to the number of character types A. The cursor is placed at the start of the data.

S2: The longest dictionary string S that matches the character string starting at the cursor position is found.

[0023] S3: The dictionary number of string S is output in ⌈log₂ n⌉ bits, where ⌈log₂ n⌉ is the smallest integer greater than or equal to log₂ n. The number of dictionary entries n is incremented by one.

S4: The string SC, formed by appending to S the next input character C, is registered in the dictionary. The cursor is moved to the character following S. Processing returns to S2.
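Steps S1 to S4 can be sketched in Python as below. This is a minimal illustration, not the implementation of the cited LZW patents: the function name is hypothetical, the dictionary is a plain `dict` rather than a tree, and each output is returned as a (dictionary number, bit width) pair instead of being packed into a bit stream. The width is ⌈log₂ n⌉ computed from the entry count n at the moment of output, as stated in S3.

```python
def lzw_encode(text, alphabet):
    """Return a list of (dictionary number, bit width) pairs for text."""
    table = {ch: i for i, ch in enumerate(alphabet)}  # S1: initial entries
    out = []
    pos = 0                                           # S1: cursor at the start
    while pos < len(text):
        end = pos + 1
        # S2: extend the match while the longer string is still registered
        while end < len(text) and text[pos:end + 1] in table:
            end += 1
        n = len(table)
        width = (n - 1).bit_length()                  # ceil(log2 n) for n >= 1
        out.append((table[text[pos:end]], width))     # S3: output number of S
        if end < len(text):
            table[text[pos:end + 1]] = n              # S4: register S + C
        pos = end                                     # S4: move the cursor
    return out


encoded = lzw_encode("ABABABA", "AB")
```

With the two-symbol alphabet "AB", the input "ABABABA" is parsed as A, B, AB, ABA, and the bit width grows from 1 to 3 as entries are added.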

[0024] FIG. 16 is a flowchart of LZW decoding, which is the reverse of the encoding process. The dynamic dictionary algorithm is fast because the sequences in the dictionary are chosen only from sequences that were encoded (sampled) in the past. However, because the dictionary contains only some of the sequences that appeared in past data, a high compression ratio cannot be obtained.

[0025] An improved version of the dynamic dictionary algorithm is the LZJ code, which increases the amount the dictionary learns so that encoding can be done with indexes alone.

[LZJ code] FIG. 17 shows the processing flow of LZJ encoding, and FIG. 18 shows the processing flow of LZJ decoding.

[0026] The notation for dictionaries and character strings is defined as follows. Let A be the set of character types, and let S denote a character string formed by combining characters of A. The i-th character of S is written S(i), and the substring S(i), S(i+1), ..., S(j) is written S(i, j). The dictionary is written D_h(S); it registers every substring of fixed length h in S as a path from the root of the dictionary tree to a leaf.

[0027] The LZJ encoding process of FIG. 17 is as follows.

S1: Before encoding begins, a single character of every character type is registered in the dictionary as the initial value. The number of dictionary entries n is set to the number of character types A. The cursor is set to k = 0.

S2 to S5: Suppose encoding has finished up to the k-th input character, so that all substrings of S(1, k) are already registered in the dictionary D_h(S(1, k)). Encoding continues from the character string S(k+1), ...

[0028] In detail, the steps are as follows.

S2: The substring S(k+1, k+z) that is the longest match, starting from S(k+1), with a string registered in the dictionary D_h(S(1, k)) is found.

S3: The dictionary number a_x of the substring S(k+1, k+z) is output in ⌈log₂ n⌉ bits, where n is the current number of dictionary entries and ⌈log₂ n⌉ is the smallest integer greater than or equal to log₂ n. Here the codeword a_x represents the substring S(i_x, j_x). Each a_x is a dictionary number in the dictionary D_h(S(1, j_{x-1})), with i_x ≤ j_x ≤ i_x + h and i_x = j_{x-1} + 1.

[0029] S4: The substrings S(k−h+2, k+1), ..., S(k+z−h+1, k+z) are added to the dictionary, each being given a dictionary number while n is incremented, to form the dictionary D_h(S(1, k+z)).

S5: The cursor is set to k = k + z.

S6: Steps S1 to S5 are repeated until all characters have been processed.
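The LZJ encoding steps above can be sketched in Python under some simplifying assumptions. The function name is hypothetical; the dictionary is a flat `dict` from string to number instead of a tree; registering a length-h window also registers all of its prefixes, which is what a root-to-leaf path in the tree would do; and windows that would start before the first character are clipped to the start of the data. Output is a list of dictionary numbers, without the ⌈log₂ n⌉-bit packing.

```python
def lzj_encode(data, alphabet, h):
    """Return the list of dictionary numbers for data (tree depth h)."""
    index = {ch: i for i, ch in enumerate(alphabet)}  # S1: initial entries
    codes = []
    k = 0                                             # S1: cursor
    while k < len(data):                              # S6: until all is processed
        # S2: longest registered string starting at k, at most h characters.
        # A single character is always registered, so z = 1 is the fallback.
        z = 1
        for length in range(min(h, len(data) - k), 1, -1):
            if data[k:k + length] in index:
                z = length
                break
        codes.append(index[data[k:k + z]])            # S3: output the number
        # S4: register the windows ending at each newly encoded position
        for end in range(k + 1, k + z + 1):
            start = max(0, end - h)                   # clipped at the data start
            for stop in range(start + 1, end + 1):
                index.setdefault(data[start:stop], len(index))
        k += z                                        # S5: advance the cursor
    return codes


codes = lzj_encode("abab", "ab", 3)
```

In this example the strings "a" and "b" are initial entries 0 and 1, "ab" becomes entry 2 after the first two characters, and the final "ab" is then encoded with the single number 2.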

[0030] FIG. 19 illustrates the dictionary registration of character strings in step S4. The LZJ decoding process of FIG. 18 is as follows.

S1: As in S1 of FIG. 17, a single character of every character type is registered in the dictionary as the initial value. The number of dictionary entries n is set to the number of character types A. The cursor is set to k = 0.

[0031] S2 to S4: Suppose the dictionary number a_w has been decoded, so that the string up to S(1, j_w) is available and the dictionary D_h(S(1, j_w)) has been reconstructed. The next codeword a_{w+1} is then decoded. In detail, the steps are as follows.

S2: From the dictionary number obtained by decoding the codeword a_{w+1}, the substring S(i_{w+1}, j_{w+1}) is restored from the dictionary D_h(S(1, j_w)). The substring S(i_{w+1}, j_{w+1}) is the character string represented in the dictionary by the path from the root to the node at address a_{w+1}.

[0032] S3: After the string S(1, j_{w+1}) has been decoded, the dictionary D_h(S(1, j_{w+1})) is built in the same way as in S4 of FIG. 17.

S4: The cursor is set to k = j_{w+1}.

S5: Steps S1 to S4 are repeated until all characters have been processed.
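The decoding steps can be sketched in Python in the same simplified style. The function name is hypothetical, the dictionary is a flat list indexed by dictionary number, and the registration rule (every prefix of each window of at most h characters ending at a newly decoded position, clipped at the start of the data) is an assumption that must match whatever rule the encoder used. Because the dictionary is rebuilt only from text that has already been decoded, every number the decoder receives refers to an entry it already holds.

```python
def lzj_decode(codes, alphabet, h):
    """Rebuild the text from LZJ dictionary numbers (tree depth h)."""
    entries = list(alphabet)        # S1: initial single-character entries
    known = set(entries)
    out = ""
    for code in codes:
        piece = entries[code]       # S2: restore the substring for this number
        k = len(out)
        out += piece
        # S3: rebuild the dictionary exactly as the encoder's step S4 does
        for end in range(k + 1, len(out) + 1):
            start = max(0, end - h)
            for stop in range(start + 1, end + 1):
                s = out[start:stop]
                if s not in known:
                    known.add(s)
                    entries.append(s)
        # S4: the cursor is simply len(out) after the append above
    return out


text = lzj_decode([0, 1, 2], "ab", 3)
```

Here the numbers 0 and 1 restore "a" and "b", the rebuilt dictionary then holds "ab" as entry 2, and the third number restores "ab".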

[0033]

[Problems to Be Solved by the Invention] Conventional dynamic dictionary Ziv-Lempel coding keeps the dictionary as a tree and compresses by matching registered strings against the input string, so string searches are fast and encoding can be performed at high speed. On the other hand, every variation of character string must be registered in the dictionary, so the dictionary becomes large.

[0034] In contrast, sliding dictionary Ziv-Lempel coding stores a fixed amount of already-encoded raw data in the dictionary, so the dictionary size is constant, and registering and deleting dictionary entries keeps the most recent character strings available as the dictionary. However, unlike the dynamic dictionary type, it has no learning function by which strings that appear more frequently come to be treated as longer, regular strings.

[0035] Consequently, when the same character string appears frequently, the dictionary is filled with repetitions of that string and its efficiency drops. The present invention was made in view of these conventional problems. Its object is to provide a data compression method based on sliding dictionary coding that improves the efficiency of the dictionary when the same character string appears repeatedly and thereby achieves a high compression ratio.

[0036]

[Means for Solving the Problems] FIG. 1 illustrates the principle of the present invention. The basic idea is that, as preprocessing for encoding according to the sliding dictionary algorithm, encoding according to the dynamic dictionary algorithm is performed, and the code string obtained by this preprocessing is then treated as the input character string for sliding dictionary encoding.

[0037] Specifically, as shown in FIG. 1(a), the invention is characterized by comprising: first encoding means (dynamic dictionary encoding means) 10, which parses input data into substrings and encodes each substring as the reference number of the longest matching substring already registered in a dictionary; and second encoding means (sliding dictionary encoding means) 20, which takes the dictionary numbers encoded by the first encoding means 10 as an input character string and encodes it as the position and match length of the longest matching already-encoded string held in a dictionary.

[0038] The first encoding means 10 includes appearance frequency counting means that counts the number of times each substring registered in the dictionary has been used in encoding. When the dictionary is searched for the encoded substring that is the longest match with the input character string, the reference number is output as the code only if the appearance frequency of the substring found is at or above a predetermined threshold; if it is below the threshold, the input substring is output as is.

[0039] The first encoding means 10 may perform LZW coding: it has a dictionary in which encoded character strings are registered with reference numbers, searches the dictionary for the encoded substring that is the longest match with a substring of the input characters, encodes that substring by its reference number, and after encoding registers in the dictionary, under a new reference number, the substring formed by appending the next input character to the substring of that reference number. Alternatively, the first encoding means 10 may perform LZJ coding: it has a dictionary in which encoded character strings are registered with reference numbers, searches the dictionary for the encoded substring that is the longest match with a substring of the input character string, encodes that substring by its reference number, and after encoding takes each substring of the encoded character string in turn as a prefix, creates fixed-length substrings by appending substrings in the dictionary to the prefix in turn, and registers all of them in the dictionary.

[0040] Furthermore, as shown in FIG. 1(b), the data compression method of the present invention is characterized by comprising: first decoding means 30, which has a dictionary that holds the decoded data sequence in order, receives the code data encoded by the second encoding means 20, and decodes dictionary numbers or character strings from the dictionary position and match length specified by the code data; and second decoding means 40, which has a dictionary in which decoded character strings are registered with reference numbers and decodes character strings by looking up the dictionary with the data decoded by the first decoding means 30.
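The two-stage arrangement of FIG. 1 can be sketched in Python as a pipeline. This is an illustration under stated assumptions, not the claimed apparatus: plain LZW stands in for the first stage (without the frequency threshold), the second stage is a brute-force sliding-window matcher over the list of dictionary numbers, tokens are tuples rather than bits, and all function names are hypothetical.

```python
def lzw_encode(text, alphabet):
    """First encoding means: text -> list of dictionary numbers."""
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes, pos = [], 0
    while pos < len(text):
        end = pos + 1
        while end < len(text) and text[pos:end + 1] in table:
            end += 1
        codes.append(table[text[pos:end]])
        if end < len(text):
            table[text[pos:end + 1]] = len(table)
        pos = end
    return codes


def lzw_decode(codes, alphabet):
    """Second decoding means: dictionary numbers -> text."""
    table, out, prev = list(alphabet), [], None
    for code in codes:
        # A number equal to len(table) refers to the entry being defined now.
        entry = table[code] if code < len(table) else prev + prev[0]
        if prev is not None:
            table.append(prev + entry[0])
        out.append(entry)
        prev = entry
    return "".join(out)


def slide_encode(symbols, window=2048):
    """Second encoding means: numbers -> raw / (offset, length) tokens."""
    tokens, pos = [], 0
    while pos < len(symbols):
        best_len, best_off = 0, 0
        for start in range(max(0, pos - window), pos):
            length = 0
            while (pos + length < len(symbols)
                   and symbols[start + length] == symbols[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - start
        if best_len >= 2:
            tokens.append(("copy", best_off, best_len))
            pos += best_len
        else:
            tokens.append(("raw", symbols[pos]))
            pos += 1
    return tokens


def slide_decode(tokens):
    """First decoding means: tokens -> list of dictionary numbers."""
    out = []
    for tok in tokens:
        if tok[0] == "raw":
            out.append(tok[1])
        else:
            for _ in range(tok[2]):
                out.append(out[-tok[1]])  # copy one symbol from offset back
    return out


def compress(text, alphabet):
    return slide_encode(lzw_encode(text, alphabet))


def decompress(tokens, alphabet):
    return lzw_decode(slide_decode(tokens), alphabet)


sample = "abcabcabcabcabcabc"
restored = decompress(compress(sample, "abc"), "abc")
```

The decoding order is the reverse of the encoding order: the sliding dictionary stage is undone first to recover the dictionary numbers, and the dynamic dictionary stage then recovers the characters.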

[0041]

[Operation] With the data compression method of the present invention configured in this way, combining encoding by the dynamic dictionary algorithm with encoding by the sliding dictionary algorithm yields efficient compression. For example, when the same character string appears repeatedly, encoding according to the dynamic dictionary algorithm turns each repetition of the string into a single character designated by one reference number, and this is then compressed further with the sliding dictionary. Efficient compression is therefore possible with a fixed-size sliding dictionary even when the same string repeats.

[0042] In the encoding by the dynamic dictionary algorithm, longest-match encoding is applied to character strings that appear with high frequency. Parts that appear often are therefore represented compactly (by small reference numbers) at the dynamic dictionary stage, the match lengths in the subsequent sliding dictionary encoding become shorter, and variable-length coding of these match lengths yields a high compression ratio.

[0043]

[Embodiments] FIG. 2 is a block diagram of one embodiment of an encoding device used in the data compression method of the present invention. In FIG. 2, the encoding device consists of a dynamic dictionary encoding unit 10 serving as the first encoding means, a sliding dictionary encoding unit 20 serving as the second encoding means, and a control circuit 50.

[0044] The dynamic dictionary encoding unit 10 includes an input buffer 12 that stores the input character string from an input terminal 11, a matching circuit 13, a dictionary memory 14 that functions as the dynamic dictionary, and a register 15 that stores the encoded dictionary number. The sliding dictionary encoding unit 20 includes a Q buffer 21 that stores the code string of dictionary numbers encoded by the dynamic dictionary encoding unit 10, a P buffer 22 that functions as the sliding dictionary, a matching circuit 23, and a variable-length coding circuit 24.

[0045] FIG. 3 is a flowchart outlining the encoding process of the encoding device of FIG. 2. First, in step S1, the input data (input character string) is encoded by converting it into dictionary numbers using the dynamic dictionary. The LZW code or the LZJ code, for example, can be used for this dynamic dictionary encoding.

[0046] Next, in step S2, the encoded character string consisting of the dictionary numbers obtained by the dynamic dictionary encoding of step S1 is processed: the sliding dictionary is searched for the longest matching already-encoded string, and the result is encoded either as the pair [position] [match length] or as [one character of raw data]. FIG. 4 is a flowchart of the processing when LZJ coding is used as the dynamic dictionary encoding of step S1 in FIG. 3.

[0047] The LZJ encoding algorithm of FIG. 4 is basically the same as the conventional LZJ encoding algorithm of FIG. 17, but differs in steps S2 to S4, which are enclosed in double frames. In step S2 of FIG. 4, the longest match with the input character string is searched for only among those strings registered in the dynamic dictionary whose appearance frequency is at or above a predetermined threshold h_f. Strings whose appearance frequency is below the threshold h_f are therefore excluded from the search.

[0048] Next, in step S3, the dictionary number of the longest matching string found is output as a fixed number of bits sufficient to represent the maximum number m that can be registered in the dictionary. In step S4, all strings are registered as shown in FIG. 19 and, at the same time, the appearance count of each already-registered string contained in the encoded string is incremented by one to track its appearance frequency.
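The modified steps S2 to S4 can be sketched in Python as a variant of LZJ encoding with frequency counters. Several details are assumptions made for illustration: the function name, the flat `dict` in place of the tree, counting a string once per starting position at which its window is registered (so a first registration already counts as one appearance), and returning dictionary numbers as integers without the fixed-width packing of step S3. Single characters are initial entries and are always eligible, in line with the statement that their frequencies are not counted.

```python
def lzj_threshold_encode(data, alphabet, h, threshold):
    """LZJ-style encoding that only matches strings seen >= threshold times."""
    index = {ch: i for i, ch in enumerate(alphabet)}  # initial entries
    freq = {}                                         # appearance counts
    codes = []
    k = 0
    while k < len(data):
        # S2: longest registered string whose count reaches the threshold
        z = 1
        for length in range(min(h, len(data) - k), 1, -1):
            cand = data[k:k + length]
            if cand in index and freq.get(cand, 0) >= threshold:
                z = length
                break
        codes.append(index[data[k:k + z]])            # S3 (as a plain integer)
        # S4: register windows and count appearances at the same time
        for end in range(k + 1, k + z + 1):
            start = max(0, end - h)
            # While the window is still clipped at the data start, only its
            # newest prefix is handled, so each prefix is counted once.
            stops = [end] if start == 0 else range(start + 1, end + 1)
            for stop in stops:
                s = data[start:stop]
                if s not in index:
                    index[s] = len(index)
                if len(s) > 1:
                    freq[s] = freq.get(s, 0) + 1
        k += z
    return codes


strict = lzj_threshold_encode("ababab", "ab", 2, 2)
loose = lzj_threshold_encode("ababab", "ab", 2, 1)
```

With a threshold of 2 the string "ab" is used as a match only after it has appeared twice, so more of the input is emitted as single characters than with a threshold of 1.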

[0049] FIG. 5 illustrates, taking the characters a, b, c, and d as an example, the tree structure of the character strings registered in a dictionary built by dynamic dictionary encoding, together with the appearance frequency of each character. FIG. 5(a) shows the tree of registered strings obtained by dynamic dictionary encoding over the four characters a, b, c, and d. The reference number assigned when the string was registered in the dictionary is shown at the upper left of each character, and the appearance frequency is shown in the box below it. Because the four characters a, b, c, and d are registered initially before encoding starts, their appearance frequencies are not counted.

[0050] This example shows the registration state when the dictionary depth h is h = 3. Each node corresponds to a string, and the number of appearances of that string is counted at the node. Taking the character at a node as the parent, the parent's appearance count equals the sum of the appearance counts of its children. FIG. 5(b) shows the tree of strings that are search targets in step S2 of FIG. 4 when the threshold is h_f = 3. The dictionary reference numbers of the high-frequency strings, those with an appearance frequency of at least h_f = 3, take non-contiguous values, but because slide dictionary encoding follows, coding efficiency is not affected.
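The two properties described for FIG. 5, the parent count equalling the sum of its children's counts and the pruning of the tree by the threshold h_f, can be checked on a small table. The sketch below is illustrative only: the counts are invented for the example (the values in FIG. 5 are not reproduced here), and the tree is represented simply as a mapping from string to count, which is an assumption.

```python
def children(counts, parent):
    """Registered strings that extend `parent` by exactly one character."""
    return [s for s in counts
            if len(s) == len(parent) + 1 and s.startswith(parent)]


def parent_equals_child_sum(counts):
    """True if every node that has children carries the sum of their counts."""
    for s, c in counts.items():
        kids = children(counts, s)
        if kids and c != sum(counts[k] for k in kids):
            return False
    return True


def search_targets(counts, h_f):
    """Strings that remain search targets for a threshold h_f."""
    return {s for s, c in counts.items() if c >= h_f}


# Hypothetical counts for strings of length 2 and 3 (depth h = 3).
example = {"ab": 5, "abc": 3, "abd": 2, "bc": 4, "bca": 4, "cd": 1}
```

For this table the sum relation holds, and with h_f = 3 the strings "abd" and "cd" drop out of the search, which is why the surviving reference numbers are no longer contiguous.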

[0051] In the dynamic dictionary encoding, only the high-frequency strings shown in FIG. 5(b) are encoded, so low-frequency strings are not encoded. For example, in the case of FIG. 5(b) the character d is represented as a single character, and its character code coincides with its dictionary number. Furthermore, when the modified LZJ encoding shown in the flowchart of FIG. 4 is used as the dynamic dictionary encoding that precedes slide dictionary encoding, the strings at or above the predetermined threshold can be chosen as strings of any length up to a fixed length whose reference numbers do not exceed a predetermined value.

[0052] The dynamic dictionary encoding performed as preprocessing is not limited to the modified LZJ encoding of FIG. 4. A simpler method may be used, for example replacing the most frequent two-character combinations with numbers. In that case the replaced strings have a fixed length. Alternatively, the LZW encoding shown in FIG. 15 may be used as the preprocessing dynamic dictionary encoding, counting appearance frequencies as in the LZJ case and encoding only strings whose appearance frequency is at or above a predetermined threshold.
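The simpler two-character substitution mentioned above might look like the following Python sketch. The details are assumptions, not taken from the patent: pairs are counted over all adjacent positions, the `top_k` most frequent pairs receive numbers starting at 256 (so they cannot collide with byte values), and the pair table is passed to the decoder directly.

```python
from collections import Counter


def digram_encode(data, top_k):
    """Replace the `top_k` most frequent byte pairs with numbers >= 256."""
    pairs = Counter(data[i:i + 2] for i in range(len(data) - 1))
    table = {pair: 256 + n
             for n, (pair, _) in enumerate(pairs.most_common(top_k))}
    out, i = [], 0
    while i < len(data):
        pair = data[i:i + 2]
        if len(pair) == 2 and pair in table:
            out.append(table[pair])     # frequent pair becomes one number
            i += 2
        else:
            out.append(data[i])         # otherwise emit the byte unchanged
            i += 1
    return out, table


def digram_decode(codes, table):
    """Inverse of digram_encode, given the same pair table."""
    inverse = {num: pair for pair, num in table.items()}
    return b"".join(inverse[c] if c in inverse else bytes([c]) for c in codes)
```

For the input `b"abababcd"` with `top_k=1`, the pair `ab` is the most frequent and is replaced three times, leaving `c` and `d` as raw bytes.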

[0053] FIG. 6 is a flowchart showing the encoding algorithm of the QIC122 code as a concrete example of the slide dictionary encoding performed in step S2 of FIG. 3. The QIC122 encoding of FIG. 6 differs only in that its input is the code string of dictionary reference numbers produced by the dynamic dictionary encoding; the processing itself is the same as the conventional QIC122 encoding algorithm shown in FIG. 11. The consequence of taking this code string as input is that one word of the data handled by the slide dictionary, conventionally one byte, becomes in the present invention as many bits as a dictionary number.

[0054] The encoding operation is now described with reference to the embodiment of FIG. 2. The input string applied to the input terminal 11 of the dynamic dictionary encoding unit 10 is loaded into the input buffer 12 a fixed length at a time. The matching circuit 13 searches, among the strings at or above the predetermined threshold in the dictionary memory 14, which functions as the dynamic dictionary, for the longest-matching string, and outputs the dictionary number of the string found, via the register 15, to the Q buffer 21 of the slide dictionary encoding unit 20.

[0055] At this time the control circuit 50 registers in the dictionary memory 14, with reference numbers attached, all strings of the fixed length h in the string that has just been encoded. In the slide dictionary encoding unit 20, meanwhile, the matching circuit 23 searches the P buffer 22 for the string that longest matches the input string of dictionary reference numbers held in the Q buffer 21. If the string found is two or more characters long, the pair [appearance position][match length] is output. If the match length is one character, one character of raw data (a dictionary reference number) is output.

[0056] At this time the control circuit 50 moves the encoded string from the Q buffer 21 to the P buffer 22, discards the oldest characters in the P buffer, a fixed length h's worth, and loads the same number of characters from the register 15 of the dynamic dictionary encoding unit 10. The code data output from the matching circuit 23 is passed to the variable-length encoding circuit 24, converted into a variable-length code, and output from the output terminal 25 as the compressed signal.
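The behaviour of the slide dictionary stage on a sequence of dictionary numbers can be sketched as follows. This is a simplified illustration rather than the QIC122 bit format: tokens are plain tuples, the appearance position is expressed as a distance back from the current position, matches are confined to already-encoded data, and `window` and `max_len` are hypothetical parameters standing in for the P buffer size and the maximum match length.

```python
def slide_encode(numbers, window=16, max_len=8):
    """Encode a list of dictionary numbers as ("copy", distance, length)
    or ("raw", number) tokens, following the 2-or-more rule of the text."""
    out, i = [], 0
    while i < len(numbers):
        best_len, best_pos = 0, 0
        for p in range(max(0, i - window), i):      # candidate starts in the window
            length = 0
            while (length < max_len
                   and i + length < len(numbers)
                   and p + length < i               # stay inside encoded data
                   and numbers[p + length] == numbers[i + length]):
                length += 1
            if length > best_len:
                best_len, best_pos = length, p
        if best_len >= 2:
            out.append(("copy", i - best_pos, best_len))
            i += best_len
        else:
            out.append(("raw", numbers[i]))         # match of 0 or 1: raw number
            i += 1
    return out
```

Because each word in the window is a dictionary number that may stand for several source characters, a window of a given number of words covers more of the original data than a byte-oriented window of the same length.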

[0057] FIG. 7 is a block diagram of an embodiment of the decompression apparatus used in the data compression system of the present invention. In FIG. 7, the decompression apparatus comprises a slide dictionary decoding unit 30 as the first decoding means, a dynamic dictionary decoding unit 40 as the second decoding means, and a control circuit 60. The slide dictionary decoding unit 30 contains a variable-length decoding circuit 32, a cut-out and copy circuit 33, a Q buffer 34, and a P buffer 35 serving as the slide dictionary. The dynamic dictionary decoding unit 40 contains a register 41, a dictionary memory 42 functioning as the dynamic dictionary, and an output buffer 43.

[0058] FIG. 8 is a flowchart outlining the decoding process of the decoding apparatus of FIG. 7. Decoding in the present invention follows the reverse of the encoding procedure shown in FIG. 3. First, in step S1, the compressed code produced by the encoding apparatus of FIG. 2 is decoded using the slide dictionary to obtain the code string of dictionary numbers. Then, in step S2, the dynamic dictionary is looked up with the decoded dictionary numbers to restore the original strings, recovering the data as it was before encoding.

[0059] FIG. 9 is a flowchart showing the processing when LZJ decoding is performed in the dynamic dictionary decoding unit 40 of FIG. 7. It is basically the same as the conventional LZJ decoding shown in FIG. 18. The difference is that, in step S2, the dynamic dictionary is looked up with the fixed-bit-width dictionary numbers decoded by the preceding slide dictionary decoding in order to restore the original strings.

[0060] The operation of the decompression apparatus of FIG. 7 is as follows. The compressed code applied to the input terminal is decoded by the variable-length decoding circuit 32. When the decoded code is a copy-mode code consisting of a pair [appearance position][match length], the cut-out and copy circuit 33 cuts out from the P buffer 35 the dictionary reference numbers in the portion designated by the appearance position and the match length.

[0061] The control circuit 60 then stores the restored dictionary reference numbers in the Q buffer 34. As they are stored, older dictionary reference numbers are shifted into the P buffer 35, and the same number of the oldest dictionary reference numbers in the P buffer 35 are discarded. If, on the other hand, the code decoded by the variable-length decoding circuit 32 is a raw-data code, there is no need to refer to the P buffer 35, so it is output as is, entered into the Q buffer 34 at the same time, and the older data is shifted into the P buffer 35.
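The copy-mode and raw-data handling described above can be sketched in a few lines of Python. The token layout is an assumption made for illustration, not the QIC122 bit format: a copy token is `("copy", distance, length)` with the distance counted back from the end of the output so far, and a raw token is `("raw", number)`. The growing output list plays the role of the P and Q buffers; discarding of the oldest entries is omitted.

```python
def slide_decode(tokens):
    """Rebuild the sequence of dictionary numbers from copy/raw tokens."""
    out = []
    for token in tokens:
        if token[0] == "copy":
            _, distance, length = token
            start = len(out) - distance     # position of the earlier occurrence
            for k in range(length):
                out.append(out[start + k])  # copy one number at a time
        else:
            out.append(token[1])            # raw dictionary number, passed through
    return out
```

The result is the code string of dictionary numbers, which is then handed to the dynamic dictionary decoding stage.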

[0062] The string of dictionary numbers restored by the slide dictionary decoding unit 30 is set in the register 41 of the dynamic dictionary decoding unit 40. The dictionary memory 42 is accessed with the dictionary number in the register 41, the string stored under the designated reference number is restored, and the string is output through the output terminal 44. At this time the control circuit 60 registers in the dictionary memory 42 the new strings formed from the string of dictionary numbers entered in the register 41.
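The lookup-and-register cycle of the dynamic dictionary decoding stage might be sketched as below. Several points are assumptions made for illustration and are not fixed by the text: the initial dictionary holds one entry per character of `alphabet`, new entries are all not-yet-registered substrings of length 2 to h that end inside the newly decoded string, reference numbers are handed out in the order shown, and the dictionary is unbounded. A matching encoder would have to register strings in exactly the same order for the numbers to stay in step.

```python
def dynamic_decode(numbers, alphabet, h):
    """Turn dictionary numbers back into text, registering new strings as we go."""
    by_number = {i: ch for i, ch in enumerate(alphabet)}
    known = set(alphabet)
    out = ""
    for n in numbers:
        s = by_number[n]                    # look up the referenced string
        old_len = len(out)
        out += s
        # Register substrings of length 2..h ending inside the new string.
        for end in range(old_len + 1, len(out) + 1):
            for length in range(2, h + 1):
                start = end - length
                if start < 0:
                    break
                sub = out[start:end]
                if sub not in known:
                    known.add(sub)
                    by_number[len(by_number)] = sub
    return out, by_number
```

With the alphabet `"abcd"` and h = 3, the numbers 0, 1, 4 decode to "abab": number 4 refers to "ab", which was registered when the second character was decoded.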

[0063] The above embodiment takes as an example the case in which encoding and decoding are carried out in hardware as shown in FIGS. 2 and 7, but the present invention is not limited to this. The functions of the dynamic dictionary encoding unit 10, the slide dictionary encoding unit 20, the slide dictionary decoding unit 30 and the dynamic dictionary decoding unit 40 may of course be implemented by a control program, in a software configuration that combines a computer with a dictionary memory.

[0064]

[Effects of the Invention] As described above, according to the present invention, strings that appear frequently are compressed into a single character by dynamic dictionary encoding before slide dictionary encoding is applied. The slide dictionary can therefore hold more data, the longest match can be sought among more candidates, and a high compression ratio is obtained.

[0065] In addition, because the dynamic dictionary encoding encodes the strings that appear frequently, the parts that tend to recur are represented compactly in the subsequent slide dictionary encoding. The match length of the longest-matching string in slide dictionary encoding can thus be made shorter, and the final variable-length encoding yields a high compression ratio.

[0066] Furthermore, when the same string appears repeatedly, the dynamic dictionary encoding performed as preprocessing compresses it to a single word. This removes the problem that slide dictionary encoding alone cannot compress repetitions of the same string efficiently, and a high compression ratio is obtained.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is a block diagram of an embodiment of an encoding apparatus according to the present invention.

FIG. 3 is a flowchart showing the encoding process of the present invention.

FIG. 4 is a flowchart showing the LZJ encoding algorithm performed as the dynamic dictionary encoding of the present invention.

FIG. 5 is an explanatory diagram showing the counting of appearance frequencies and the encoding of strings with a frequency at or above the threshold in the LZJ encoding of FIG. 4.

FIG. 6 is a flowchart showing the encoding algorithm of the QIC122 code of the present invention.

FIG. 7 is a block diagram of an embodiment of a decoding apparatus according to the present invention.

FIG. 8 is a flowchart showing the decoding process of the present invention.

FIG. 9 is a flowchart showing the LZJ decoding algorithm performed as the dynamic dictionary decoding of the present invention.

FIG. 10 is a diagram showing the principle of slide dictionary encoding.

FIG. 11 is a flowchart showing the encoding algorithm of the conventional QIC122 code.

FIG. 12 is an explanatory diagram of the format of the QIC122 code.

FIG. 13 is an explanatory diagram of the BNF metalanguage used in FIG. 13.

FIG. 14 is an explanatory diagram showing a concrete example of encoding with the QIC122 code.

FIG. 15 is a flowchart showing the conventional LZW encoding algorithm.

FIG. 16 is a flowchart showing the conventional LZW decoding algorithm.

FIG. 17 is a flowchart showing the conventional LZJ encoding algorithm.

FIG. 18 is a flowchart showing the conventional LZJ decoding algorithm.

FIG. 19 is an explanatory diagram showing the registration of strings in LZJ encoding.

[Explanation of symbols]

10: first encoding means (dynamic dictionary encoding unit)
11, 31: input terminal
12: input buffer
13, 23: matching circuit
14, 42: dictionary memory (dynamic dictionary)
15, 41: register
20: second encoding means (slide dictionary encoding unit)
21, 34: Q buffer
22: P buffer (slide dictionary)
24: variable-length encoding circuit
25, 44: output terminal
30: first decoding means (slide dictionary decoding unit)
32: variable-length decoding circuit
33: cut-out and copy circuit
40: second decoding means (dynamic dictionary decoding unit)
43: output buffer

Continuation of the front page

(72) Inventor: Hirotaka Chiba, c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa

(56) References: JP-A-63-294134 (JP, A); JP-A-62-137633 (JP, A); JP-A-3-247167 (JP, A); JP-A-3-209923 (JP, A); JP-A-3-78322 (JP, A); JP-A-4-116738 (JP, A); JP-A-3-208469 (JP, A)

(58) Fields searched (Int. Cl.7, DB name): G06F 5/00; H03M 7/30 - 7/40

Claims

(57) [Claims]

1. A data compression system comprising: first encoding means (10) for decomposing input data into substrings and encoding each substring by expressing it with the reference number of the longest-matching substring registered in a dictionary; and second encoding means (20) for receiving the dictionary numbers encoded by the first encoding means (10) as an input character string and encoding it with the start position and match length of the already-encoded character string, held in a dictionary, that longest matches the input character string.

2. The data compression system according to claim 1, wherein the first encoding means (10) includes appearance frequency counting means for counting the number of times each substring registered in the dictionary has been used in encoding, and, when searching for the encoded substring in the dictionary that longest matches the input character string, encodes and outputs the reference number only if the appearance frequency of the substring found is at or above a predetermined threshold, and outputs the input substring as is if the frequency is below the threshold.

3. The data compression system according to claim 1, wherein the first encoding means (10) has a dictionary in which encoded character strings are registered with reference numbers attached, searches for the encoded substring in the dictionary that longest matches a substring of the input character string and encodes it by designating its reference number, and, after the encoding, registers in the dictionary, under a new reference number, the substring obtained by appending the next input character to the substring of that reference number.

4. The data compression system according to claim 1, wherein the first encoding means (10) has a dictionary in which encoded character strings are registered with reference numbers attached, searches for the encoded substring in the dictionary that longest matches a substring of the input character string and encodes it by designating its reference number, and, after the encoding, takes each substring of the encoded character string in turn as a prefix substring, creates fixed-length substrings by adding the prefix substring to substrings in the dictionary, and registers all of these substrings in the dictionary.

5. The data compression system according to claim 1, further comprising: first decoding means (30), having a dictionary in which decoded data strings are stored in sequence, for receiving the code data encoded by the second encoding means (20) and decoding a dictionary number or a character string from the appearance position and match length in the dictionary designated by the code data; and second decoding means (40), having a dictionary in which decoded character strings are registered with reference numbers attached, for decoding a character string by referring to the dictionary with the decoded data produced by the first decoding means (30).