JP3088740B2

JP3088740B2 - Data compression and decompression method

Info

Publication number: JP3088740B2
Application number: JP02269930A
Authority: JP
Inventors: 広隆千葉; 佳之岡田; 茂吉田; 泰彦中野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-10-08
Filing date: 1990-10-08
Publication date: 2000-09-18
Anticipated expiration: 2015-09-18
Also published as: JPH04145726A

Description

DETAILED DESCRIPTION OF THE INVENTION [Summary] The invention relates to a data compression and decompression method using the LZW code, a kind of universal coding in which already-encoded data is divided into mutually distinct substrings and registered in a dictionary, input data is encoded by specifying the registration number of the longest matching substring in the dictionary, and code words are decoded using the dictionary. The object is to raise the compression ratio through initial registration of the dictionary. To this end, registration numbers are assigned in advance not only to single characters but also to a set of frequently occurring character strings of one or more characters, and these are initially registered in the dictionary.

[Field of Industrial Application] The present invention relates to a data compression and decompression method using the LZW code, which is known as an improvement of the incremental parsing type of universal code.

In recent years, computers have come to handle many kinds of data, such as character codes, vector information, and images, and the amount of data handled is increasing rapidly. When handling large amounts of data, removing the redundant parts of the data to compress it reduces the required storage capacity and allows faster transmission.

Universal coding has been proposed as a way to compress many kinds of data with a single method.

The field of the present invention is not limited to the compression of character codes and can be applied to many kinds of data. In what follows, however, the terminology of information theory is adopted: one word of data is called a character, and any number of connected words is called a character string.

A representative universal coding method is the Ziv-Lempel code (for details see, for example, Munakata, "Ziv-Lempel Data Compression Method", Information Processing, Vol. 26, No. 1, 1985).

Two algorithms have been proposed for the Ziv-Lempel code: a universal type and an incremental parsing type.

In addition, the LZSS code is an improvement of the universal-type algorithm (see T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986).

The LZW (Lempel-Ziv-Welch) code is an improvement of the incremental parsing algorithm (see T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984).

Of these codes, the LZW code has come to be used for purposes such as file compression in storage devices, because it can be processed at high speed and its algorithm is simple.

[Prior Art] FIG. 8 shows the encoding algorithm of the conventional LZW code, and FIG. 9 shows the decoding algorithm.

LZW encoding uses a rewritable dictionary. The input character code data is divided into mutually distinct character strings, which are numbered in order of appearance and registered in the dictionary, and the character string currently being input is encoded as the number of the longest matching character string registered in the dictionary.

In the LZW encoding process of FIG. 8, step S1 (the word "step" is omitted hereinafter) first registers in the dictionary, as initial values, the single-character strings for all characters, and then encoding begins. Also in S1, the dictionary is searched with the first input character K to obtain its reference number ω, which becomes the prefix string.

Next, S2 reads the next character K of the input data and S3 checks whether character input has ended. The process then proceeds to S4, which searches the dictionary for the character string ωK formed by appending the character K read in S2 to the prefix string ω obtained in S1.

If the character string ωK is not in the dictionary in S4, the process proceeds to S6. There the reference number ω of the character K obtained in S1 is output as the code word code(ω), the character string ωK is given a new reference number and registered in the dictionary, the input character K of S2 replaces the reference number ω, and the dictionary address n is incremented. The process then returns to S2 to read the next character K.

If, on the other hand, the character string ωK is in the dictionary in S4, S5 replaces the reference number ω with that of the character string ωK and the process returns to S2, continuing the search for the longest match until the character string ωK can no longer be found in the dictionary.
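The flow of FIG. 8 as described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `lzw_encode`, the three-character alphabet, and the use of a Python dict keyed by whole strings are all choices made here for readability.

```python
def lzw_encode(data, alphabet="abc"):
    """Conventional LZW encoding (sketch of the S1-S6 flow described above)."""
    # S1: register every single character with reference numbers 1, 2, 3, ...
    table = {ch: code for code, ch in enumerate(alphabet, start=1)}
    next_code = len(table) + 1
    if not data:
        return []
    codes = []
    prefix = data[0]                  # S1: the first character is the prefix
    for ch in data[1:]:               # S2/S3: read characters until input ends
        if prefix + ch in table:      # S4: is the extended string registered?
            prefix += ch              # S5: yes, keep extending the match
        else:
            codes.append(table[prefix])       # S6: emit the code of the prefix
            table[prefix + ch] = next_code    # register the extended string
            next_code += 1
            prefix = ch               # the new character starts the next prefix
    codes.append(table[prefix])       # flush the final prefix at end of input
    return codes


print(lzw_encode("ababc"))  # [1, 2, 4, 3]
```

The final flush after the loop is an assumption about how the end of input is handled; the description above only states that S3 checks for the end of input.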

The LZW encoding is described concretely below with reference to FIGS. 10 and 11.

First, the input data INPUT of FIG. 10 is read from left to right. When the first character a is input, no character string other than the character a itself matches in the dictionary, so OUTPUT CODE 1 (reference number ω) is output as the code word. The character a then becomes the new prefix string ω.

Next, the second character b is input. Because the character string ωK = ab, formed by appending the input character b to the prefix string ω, is not in the dictionary, OUTPUT CODE 2 for the character b is output as the code word. The extended character string ab is then given reference number 4 and registered in the dictionary. As shown on the right side of FIG. 11, it is actually registered in the form 1b. The second input character b becomes the new prefix string ω.

When the third character a is then input, the extended character string ωK = ba = 2a, formed from the prefix string ω and the input character a, is not in the dictionary. OUTPUT CODE 2 for the character b is therefore output as the code word, and the extended character string ωK = ba, expressed as 2a, is given reference number 5 and registered in the dictionary. The third input character a becomes the new prefix string ω.

For the fourth input character b, the extended character string ωK = ab is already registered in the dictionary as code word 4, so ωK becomes the new prefix string ω and the fifth character c is input to form the extended character string ωK = 4c = abc. Because this extended character string abc is not registered in the dictionary, OUTPUT CODE 4 for the character string ab = 1b is output as the code word, and abc is registered in the dictionary in the form 4c with reference number 6. The same processing continues thereafter.
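The compact form in which entries are registered (1b, 2a, 4c) can be illustrated with a short Python sketch that keys the dictionary on (prefix reference number, character) pairs rather than on whole strings. The function name `lzw_registrations` is hypothetical, and the sample input `ababc` is simply the first five characters walked through above.

```python
def lzw_registrations(data, alphabet="abc"):
    """Return (codes, entries). entries maps each new reference number to
    its compact form, e.g. 4 -> "1b", meaning "string 1 followed by b"."""
    single = {ch: code for code, ch in enumerate(alphabet, start=1)}
    pairs = {}                        # (prefix code, char) -> reference number
    next_code = len(single) + 1
    codes, entries = [], {}
    prefix = None                     # reference number of the current prefix
    for ch in data:
        if prefix is None:
            prefix = single[ch]       # first character of the input
        elif (prefix, ch) in pairs:
            prefix = pairs[(prefix, ch)]      # longest match keeps growing
        else:
            codes.append(prefix)              # emit the prefix's number
            pairs[(prefix, ch)] = next_code   # register prefix + character
            entries[next_code] = f"{prefix}{ch}"
            next_code += 1
            prefix = single[ch]
    if prefix is not None:
        codes.append(prefix)
    return codes, entries


codes, entries = lzw_registrations("ababc")
print(codes)    # [1, 2, 4, 3]
print(entries)  # {4: '1b', 5: '2a', 6: '4c'}
```

Storing a pair instead of the whole string keeps every dictionary entry the same size, which is presumably why the dictionary of FIG. 11 is described in this form.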

The decoding process of FIG. 9 performs the reverse of the encoding of FIG. 8.

In the decoding of FIG. 9, as in encoding, the single-character strings for all characters are first registered in the dictionary as initial values, and then decoding begins.

First, S1 reads the first code (reference number) and sets the current CODE as OLDcode. Because the first code always corresponds to one of the single-character reference numbers already registered in the dictionary, the character code(K) matching the input code CODE is found and the character K is output. The output character K is also set in FINchar for the exception processing described later.

Next, the process proceeds to S2, which reads the next code and sets it in CODE as INcode. S3 checks whether there is a new code, that is, whether code input has ended, and the process proceeds to S4, which checks whether the input code CODE is defined (registered) in the dictionary. Normally the input code word has been registered in the dictionary by the processing up to the previous step, so the process proceeds to S5, which reads the character string code(ωK) corresponding to the code CODE from the dictionary. S6 temporarily pushes the character K onto a stack and sets the reference number code(ω) as the new CODE, and the process returns to S5. Steps S5 and S6 are repeated recursively until the reference number ω refers to a single character. Finally the process proceeds to S7, which pops the characters stacked in S6 in LIFO (last in, first out) order and outputs them. Also in S7, the previously used code ω and the first character K of the character string just restored are combined into the pair (ω, K), which is given a new reference number and registered in the dictionary.
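The S5 to S7 loop, which follows an entry's prefix reference number back to a single character while stacking characters, might look like this in Python. The dictionary layout (each reference number mapped to a (prefix number, character) pair, with `None` as the prefix of a single character) and the name `expand` are assumptions made for illustration.

```python
def expand(code, table):
    """Expand one reference number into the character string it stands for."""
    stack = []
    while True:
        prefix, ch = table[code]      # S5: read the entry (prefix, K)
        stack.append(ch)              # S6: push K onto the stack
        if prefix is None:            # a single-character entry ends the walk
            break
        code = prefix                 # continue with the prefix's entry
    # S7: pop in last-in, first-out order to restore the original order
    return "".join(reversed(stack))


# Hypothetical dictionary state: a, b, c plus the entries 1b, 2a, 4c.
table = {1: (None, "a"), 2: (None, "b"), 3: (None, "c"),
         4: (1, "b"), 5: (2, "a"), 6: (4, "c")}
print(expand(6, table))  # abc
```

The stack is needed because the chain is walked from the last character of the string back to the first.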

The decoding process is described concretely below with reference to FIG. 12.

In FIG. 12, the first input code is 1. Because the single characters a, b, c are already registered in the dictionary with reference numbers 1, 2, 3, as shown in Table 2, the dictionary lookup replaces code 1 with the matching character string a, which is output. The next code 2 is likewise replaced with the character b and output. At this point, 1b (= ab), which combines the previously processed code with the first character b of the string just decoded, is given the new reference number 4 and registered in the dictionary.

For the third code 4, the dictionary lookup yields 1b, which is replaced with ab, and the character string ab is output. At the same time, the character string 2a (= ba), which combines the previously processed code 2 with the first character a of the string just decoded, is given the new reference number 5 and registered in the dictionary.

The same processing is repeated thereafter.

The decoding of FIG. 12 involves the following exception processing.

This exception arises when the sixth input code 8 is decoded. Code 8 is not yet defined in the dictionary at the time of decoding and so cannot be decoded directly. In this case, the character string 5b is formed by appending the first character b of the previously decoded character string ba to the previously processed code 5; 5b is then expanded to 2ab and then to bab, which is output. After the character string is output, the character string 5b, formed by appending the character b of the string just decoded to the previous code word 5, is given reference number 8 and registered in the dictionary.

This exception processing is carried out through steps S4 and S8 of the decoding flow of FIG. 9. Finally, S7 outputs the character string, gives the new character string a reference number, and registers it in the dictionary.
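A conventional decoder that includes this exception case can be sketched as below. The name `lzw_decode` and the string-valued dictionary are assumptions, and the sample code sequence 1, 2, 4, 3, 5, 8 is chosen here only to be consistent with the codes mentioned in the walkthrough above; the full content of FIG. 12 is not reproduced in the text.

```python
def lzw_decode(codes, alphabet="abc"):
    """Conventional LZW decoding, including the undefined-code exception."""
    table = {code: ch for code, ch in enumerate(alphabet, start=1)}
    next_code = len(table) + 1
    out = []
    prev = None                       # string decoded from the previous code
    for code in codes:
        if code in table:
            entry = table[code]
        elif prev is not None and code == next_code:
            # Exception: the code is about to be registered, so it must be
            # the previous string plus that string's own first character.
            entry = prev + prev[0]
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        if prev is not None:
            # Register previous string + first character of the current one.
            table[next_code] = prev + entry[0]
            next_code += 1
        prev = entry
    return "".join(out)


print(lzw_decode([1, 2, 4, 3, 5, 8]))  # ababcbabab
```

In this run the sixth code, 8, is not in the dictionary when it arrives, so it is resolved as ba + b = bab and registered as entry 8 in the same step.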

The encoding and decoding processes of FIGS. 8 and 9 each build the same dictionary as they proceed.

[Problem to Be Solved by the Invention] The conventional LZW code registers previously occurring data in a dictionary and compresses using the registered information. However, registering all previously occurring data in the dictionary cannot be said to be an effective registration method. With image data, for example, all-white and all-black patterns are known to occur frequently; if such data is registered only incrementally, a high compression ratio cannot be obtained until dictionary registration has progressed to some extent.

The present invention was made in view of this conventional problem, and its object is to provide a data compression and decompression method using the LZW code that raises the compression ratio through initial registration of the dictionary.

[Means for Solving the Problem] FIG. 1 is a diagram explaining the principle of the present invention.

The present invention first concerns a data compression method using the LZW code that includes encoding means 100, which divides already-encoded data into mutually distinct substrings, registers these substrings in a dictionary 10, and encodes input data by specifying the registration number of the longest matching substring in the dictionary 10.

For such a data compression method, the present invention is characterized in that a registration number is assigned in advance to each single input character and initially registered in the dictionary 10, a registration number is also assigned in advance to each frequently occurring character string of two or more characters and initially registered in the dictionary 10, and the encoding means 100 compresses and encodes the input data using this dictionary 10. In addition, registration numbers are assigned in advance to character strings in which a single character is repeated consecutively in the data to be compressed, for example aa, aaa, aaaa, and aaaaa, and these are initially registered in the dictionary 10.

The present invention further concerns a data decompression method that includes decoding means 200, which uses the dictionary 10 to restore to the original data the code words produced by dividing already-encoded data into mutually distinct substrings, registering these substrings in the dictionary 10, and encoding input data by specifying the registration number of the longest matching substring in the dictionary 10.

For such a data decompression method as well, the present invention is characterized in that a registration number is assigned in advance to each single input character and initially registered in the dictionary 10, a registration number is also assigned in advance to each frequently occurring character string of two or more characters and initially registered in the dictionary 10, and the decoding means 200 restores the input data using this dictionary 10. In addition, registration numbers are assigned in advance to character strings in which a single character is repeated consecutively in the data to be compressed, for example aa, aaa, aaaa, and aaaaa, and these are initially registered in the dictionary 10.

[Operation] With the data compression and decompression method of the present invention configured in this way, frequently occurring character strings, for example runs of the same character, are assigned registration numbers according to their length when the dictionary is initialized. When a registered character string λ appears, compression is performed using its registration number, which achieves high compression.

[Embodiment] FIG. 2 is a configuration diagram showing one embodiment of the present invention.

In FIG. 2, reference numeral 12 denotes a CPU serving as control means, and a program memory 14 and a data memory 26 are connected to the CPU 12.

The program memory 14 holds control software 16; maximum-match-length search software 18, which performs the longest-match search used by the LZW code; encoding software 20, which converts an input character string into LZW codes; decoding software 22, which restores the codes produced by the encoding software 20 to the original character string; and initial registration software 24, which assigns registration numbers to, and initially registers, each single character to be processed and also frequently used character strings λ. In this embodiment the character strings λ are the strings in which the same character a is repeated 2 to 15 times.

The data memory 26 holds a data buffer 28, which stores the character string to be encoded or the code sequence to be decoded, and the dictionary 10, which is built up incrementally and used during LZW encoding and decoding.

In the embodiment of FIG. 2, the initial registration of the dictionary according to the present invention is performed as follows.

First, the CPU 12 starts the initial registration software 24 under the control of the control software 16 and performs the initial registration of the dictionary 10. That is, based on the initial registration software 24, the CPU 12 assigns a registration number (reference number) to each single character of the character set to be processed and initially registers it in the dictionary 10.

To simplify the explanation, consider the three characters a, b, c as the characters to be processed. As shown in the dictionary configuration diagram of FIG. 5, the characters a, b, c are initially registered with reference numbers 1, 2, 3.

In addition, in the present invention, the character strings of 2 to 10 consecutive characters a are treated as frequently occurring character strings λ. As shown in the dictionary configuration of FIG. 5, aa, aaa, ..., aaaaaaaaaa are given registration numbers 4 to 12 and initially registered in the dictionary. In FIG. 5 these are shown as a×2, a×3, ..., a×10.
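The initial registration just described (a, b, c as numbers 1 to 3, then a×2 through a×10 as numbers 4 to 12) can be written out as a small Python sketch. The function name `build_initial_dictionary` and its parameters are hypothetical and are used only to make the numbering concrete.

```python
def build_initial_dictionary(alphabet="abc", run_char="a", max_run=10):
    """Map registration numbers to the initially registered strings."""
    table = {}
    # Single characters receive registration numbers 1, 2, 3, ...
    for code, ch in enumerate(alphabet, start=1):
        table[code] = ch
    # Runs of the frequent character follow: a*2 -> 4, ..., a*10 -> 12.
    next_code = len(alphabet) + 1
    for n in range(2, max_run + 1):
        table[next_code] = run_char * n
        next_code += 1
    return table


table = build_initial_dictionary()
print(table[1], table[4], table[12])  # a aa aaaaaaaaaa
print(len(table))                     # 12
```

With this numbering a run of n characters a (2 ≤ n ≤ 10) has registration number n + 2, so a run of seven has number 9.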

Once this initial registration of the dictionary 10 is complete, the desired input data or code data is stored in the data buffer 28 of the data memory 26, and encoding by the encoding software 20 or decoding by the decoding software 22 is carried out.

FIG. 3 shows the LZW encoding algorithm according to the present invention, and FIG. 4 shows the decoding algorithm.

The LZW encoding of FIG. 3 is described here for the input characters of FIG. 5. First, in S1, in addition to the initial registration in which each character code i is registered one character at a time at dictionary address i, a frequently occurring character is designated λ in advance, and not just the single character λ but several character strings, one for each run length of λ, are registered in the dictionary. In the example dictionary configuration of FIG. 6 for the characters a, b, each string of 1 to 10 consecutive characters a is registered.

Once the initial registration of S1 is complete, steps S2 to S7 handle runs of the initially registered character λ. That is, S3 tests whether K = λ, S4 increments by one the λ counter, which holds the number of consecutive characters λ, and this is repeated until the input character K is no longer the character λ. The first input character a in FIG. 5 is the character λ, so the process goes from S3 to S4 and increments the λ counter by one. The next input character b is not the character λ, so the process proceeds to S5. Because the λ counter equals 1 at this point, the process proceeds to S6, which outputs the value of the λ counter as the code CODE(λ) = CODE1, and S7 sets the second input character b as the new prefix string ω.

The subsequent steps S8, S9, S11, S12, S13, and S22 are the same as the conventional LZW encoding process shown in FIG. 8, and the corresponding step numbers of FIG. 8 are shown in parentheses.

During this LZW encoding, which is otherwise the same as the conventional process, the present invention adds processing for the frequently occurring character λ, which branches from S10 to steps S14 to S21.

Specifically, after the second input character b has been set as the prefix string ω in S7, S8 reads the third character a, and via S9 the test K = λ in S10 succeeds, so the process proceeds to S14.

In S14, the current prefix string ω = b is output as the code CODE(ω) = CODE2. S15 then increments the λ counter by one, S16 reads the next character b, and via S18 the test K = λ is made in S19. Here the character b is not λ, so S20 outputs code(λ) = code1 as the code, the λ counter is reset to 0, and the process returns to S7.

In contrast, starting from the 13th input character of FIG. 5 the character a, which is λ, occurs seven times in a row. In this case, repeating steps S15 to S19 yields a λ counter value of 7, and S20 outputs code9 as the code.

For a character string in which ten or more characters a are consecutive, for example fifteen consecutive a's, the run is expressed using the two codes with reference numbers 12 and 7. Also, in the data encoding method of the present invention, when a character corresponding to λ appears during encoding, S14 outputs the character string ω accumulated so far as a code, and the character string ωK = ωλ is not registered in the dictionary.
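The encoding behaviour described above (flush the current prefix when λ appears, count the run, emit the pre-registered run code, and never register ωλ) can be approximated by the following Python sketch. It is an interpretation under stated assumptions rather than a transcription of FIG. 3: the name `encode_with_runs`, the string-keyed dictionary, the greedy splitting of long runs into a maximal run plus a remainder, and the flush at end of input are all choices made here.

```python
def encode_with_runs(data, alphabet="abc", run_char="a", max_run=10):
    """LZW encoding with runs of run_char pre-registered (illustrative)."""
    # Initial registration: a, b, c -> 1..3, then a*2 .. a*10 -> 4..12.
    table = {ch: code for code, ch in enumerate(alphabet, start=1)}
    for n in range(2, max_run + 1):
        table[run_char * n] = len(table) + 1
    next_code = len(table) + 1
    codes = []
    prefix = ""
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == run_char:
            if prefix:
                codes.append(table[prefix])   # emit prefix; prefix+λ is
                prefix = ""                   # deliberately not registered
            run = 0
            while i < len(data) and data[i] == run_char:
                run += 1                      # the λ counter
                i += 1
            while run > 0:                    # long runs use several codes
                n = min(run, max_run)
                codes.append(table[run_char * n])
                run -= n
            continue
        if prefix + ch in table:
            prefix += ch                      # ordinary LZW longest match
        else:
            codes.append(table[prefix])
            table[prefix + ch] = next_code
            next_code += 1
            prefix = ch
        i += 1
    if prefix:
        codes.append(table[prefix])
    return codes


print(encode_with_runs("abab"))        # [1, 2, 1, 2]
print(encode_with_runs("a" * 15))      # [12, 7]
```

The second call reproduces the example in the text: fifteen consecutive a's become the run of ten (number 12) followed by the run of five (number 7).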

Next, the decoding process according to the present invention shown in FIG. 4 is described.

In the decoding process of FIG. 4, steps S1 to S4, S8, and S9 are provided to decode the codes that indicate the number of consecutive frequently occurring characters λ. The remaining processing is the same as the conventional decoding process shown in FIG. 9, and the correspondence is indicated by the step numbers in parentheses.

Taking the restoration of the input codes shown in FIG. 7 as an example, decoding up to the 12th code is the same as conventional decoding, except that for the code code1, S4 or S9 outputs a single λ = a.

For the 13th code, code9, in FIG. 7, S8 determines that it is a λ code and the process proceeds to S9, where a dictionary lookup with the code value 9 outputs seven characters a.
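A matching decoder can be sketched in the same spirit. It treats the codes of the initially registered runs as λ codes that are expanded directly and cause no new registration, and otherwise behaves like a conventional LZW decoder. The name `decode_with_runs` and the dictionary layout are assumptions, and the rule that a λ code resets the previous string is inferred from the statement that ωλ is never registered during encoding.

```python
def decode_with_runs(codes, alphabet="abc", run_char="a", max_run=10):
    """Decode codes produced with runs of run_char pre-registered."""
    table = {code: ch for code, ch in enumerate(alphabet, start=1)}
    next_code = len(alphabet) + 1
    for n in range(2, max_run + 1):
        table[next_code] = run_char * n
        next_code += 1
    # λ codes: every initial entry that consists only of run_char.
    run_codes = {c for c, s in table.items() if set(s) == {run_char}}
    out = []
    prev = None
    for code in codes:
        if code in run_codes:
            out.append(table[code])   # output the whole run at once
            prev = None               # nothing is registered around a run
            continue
        if code in table:
            entry = table[code]
        elif prev is not None and code == next_code:
            entry = prev + prev[0]    # conventional exception case
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        if prev is not None:
            table[next_code] = prev + entry[0]
            next_code += 1
        prev = entry
    return "".join(out)


print(decode_with_runs([2, 3, 13, 9]))  # bcbcaaaaaaa
print(len(decode_with_runs([12, 7])))   # 15
```

Code 9 in the first call expands to seven characters a, as in the example above, and the pair 12, 7 expands to fifteen.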

In the embodiment above, strings in which the same character is repeated were used as the example of initially registered character strings. The present invention is not limited to this, however, and any suitable character strings may be initially registered as long as they occur frequently.

[Effects of the Invention] As described above, according to the present invention, character strings representing several run lengths are registered in the dictionary in advance, taking the frequency of use of character strings into account, and compression and decompression are performed using this dictionary. This makes the compression ratio of LZW data compression higher still.

[Brief description of the drawings]

FIG. 1 is a diagram explaining the principle of the present invention; FIG. 2 is a configuration diagram of an embodiment of the present invention; FIG. 3 is a processing flowchart of LZW encoding according to the present invention; FIG. 4 is a processing flowchart of LZW decoding according to the present invention; FIG. 5 is an explanatory diagram of LZW encoding according to the present invention; FIG. 6 is an explanatory diagram of the dictionary configuration according to the present invention; FIG. 7 is an explanatory diagram of LZW decompression according to the present invention; FIG. 8 is a flowchart of conventional LZW encoding; FIG. 9 is a flowchart of conventional LZW decoding; FIG. 10 is an explanatory diagram of conventional LZW encoding; FIG. 11 is an explanatory diagram of a conventional dictionary configuration example; FIG. 12 is an explanatory diagram of conventional LZW decoding. In the figures: 10: dictionary; 12: CPU; 14: program memory; 16: control software; 18: maximum-match-length search software; 20: encoding software; 22: decoding software; 24: initial registration software; 26: data memory; 28: data buffer; 100: encoding means; 200: decoding means.

Continuation of front page. (72) Inventor: Yasuhiko Nakano, c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa. (56) References: JP-A-60-116228 (JP, A); JP-A-61-232724 (JP, A); JP-A-63-209228 (JP, A); JP-A-63-209229 (JP, A). (58) Field searched (Int. Cl.⁷, DB name): H03M 7/40

Claims

(57) [Claims]

1. A data compression method including encoding means (100) that divides already-encoded data into mutually distinct substrings, registers the substrings in a dictionary (10), and encodes input data by specifying the registration number of the longest matching substring in the dictionary (10), wherein a registration number is assigned in advance to each single input character, which is initially registered in the dictionary (10); a registration number is assigned in advance to each character string of two or more characters having a high frequency of occurrence, which is initially registered in the dictionary (10); and the input data is compression-coded by the encoding means (100) using the dictionary (10).

2. The data compression method according to claim 1, wherein a registration number is assigned in advance to a character string in which a single character is repeated consecutively in the data to be compressed, and the character string is initially registered in the dictionary (10).

3. A data decompression method including decoding means (200) that uses a dictionary (10) to restore to the original data code words produced by dividing already-encoded data into mutually distinct substrings, registering the substrings in the dictionary (10), and encoding input data by specifying the registration number of the longest matching substring in the dictionary (10), wherein a registration number is assigned in advance to each single input character, which is initially registered in the dictionary (10); a registration number is assigned in advance to each character string of two or more characters having a high frequency of occurrence, which is initially registered in the dictionary (10); and the input data is restored by the decoding means (200) using the dictionary (10).

4. The data decompression method according to claim 3, wherein a registration number is assigned in advance to a character string in which a single character is repeated consecutively in the data to be compressed, and the character string is initially registered in the dictionary (10).