JPH04149766A

JPH04149766A - Data compressing and restoring system

Info

Publication number: JPH04149766A
Application number: JP2275835A
Authority: JP
Inventors: Shigeru Yoshida; 茂吉田; Yoshiyuki Okada; 佳之岡田; Yasuhiko Nakano; 泰彦中野; Hirotaka Chiba; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-10-15
Filing date: 1990-10-15
Publication date: 1992-05-22
Anticipated expiration: 2013-11-18
Also published as: JP2825960B2

Abstract

PURPOSE: To simplify the code and obtain a high compression rate by making it possible to register initial values in a plurality of dictionaries corresponding to history states, so that the history of previously appearing character strings is also used when coding a character string. CONSTITUTION: A dictionary 10 is organized as a dictionary group consisting of a prescribed number of dictionaries 10-1 to 10-N, fewer than the total number of character types to be processed, and all character types are initially registered in each of the dictionaries 10-1 to 10-N, with a reference number allocated to each character. When an input character string is coded, a specific dictionary 10-i in the dictionary group is selected according to index information indicating the dependency on the previously coded character string, that is, the history, and coding is performed with that dictionary. When the input character string is not registered, a character string obtained by appending the next character to the reference number of the coded character string is registered under a new reference number. Merging the dependency states used during coding and decoding thus simplifies the initial registration of the plural dictionaries and improves coding efficiency.

Description

[Detailed Description of the Invention] [Summary] The invention relates to a data compression and restoration system in which one of a plurality of dictionaries is selected by an index based on the dependency (history) on the last character of the immediately preceding, already encoded character string; an input character string is encoded into an LZW code by specifying the reference number of the longest matching substring among the already encoded substrings registered in the selected dictionary; and character strings are restored from the LZW code. The object is to simplify the initial registration of the plural dictionaries and improve encoding efficiency. To this end, the dictionary is organized as a dictionary group consisting of a predetermined number of dictionaries, fewer than the number of all character types to be processed, and all character types are initially registered in each dictionary, one character at a time, each with a reference number.

[Field of Industrial Application] The present invention relates to a data compression and restoration system using the LZW code, which is known as an improvement of the incremental parsing type of universal code.

In recent years, computers have come to handle various kinds of data, such as character codes, vector information, and images, and the amount of data handled is increasing rapidly. When handling large amounts of data, omitting the redundant parts of the data to compress its volume makes it possible to reduce the required storage capacity and to transmit the data faster.

Universal coding has been proposed as a method that can compress such varied data with a single scheme.

The field of the present invention is not limited to the compression of character codes and applies to various kinds of data. In the following, however, the terminology used in information theory is followed: one word of data is called a character, and a sequence of any number of words is called a character string.

A representative universal coding method is the Ziv-Lempel code (for details, see, for example, Munakata, "Ziv-Lempel data compression method," Joho Shori (Information Processing), Vol. 26, No. 1, 1985).

Two algorithms have been proposed for the Ziv-Lempel code: (1) the universal type and (2) the incremental parsing type.

Further, the LZSS code is an improvement of the universal-type algorithm (see T. C. Bell, "Better OPM/L Text Compression," IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986).

The LZW (Lempel-Ziv-Welch) code is an improvement of the incremental parsing algorithm (see T. A. Welch, "A Technique for High-Performance Data Compression," Computer, June 1984). Of these codes, the LZW code has come to be used for file compression in storage devices and similar applications because it allows high-speed processing and its algorithm is simple.

[Prior Art] FIG. 5 shows a flowchart of the conventional LZW encoding algorithm, and FIG. 6 shows a flowchart of the decoding algorithm.

LZW encoding uses a rewritable dictionary. The input character string is divided into mutually different character strings, which are numbered in order of appearance and registered in the dictionary, and the character string currently being input is encoded by representing it with only the reference number of the longest matching character string registered in the dictionary.

FIG. 7 shows a specific example of LZW encoding, FIG. 9 shows a specific example of LZW decoding, and FIG. 8 shows the contents of the dictionary used in encoding and decoding. To simplify the explanation, FIGS. 7, 8, and 9 take as an example the compression and restoration of a character string composed of combinations of the three characters a, b, and c.

First, the LZW encoding process of FIG. 5 is described as follows.

Step S1 (hereinafter, "step" is omitted): Character strings consisting of one character each are registered in advance as initial values for all characters, and then encoding starts. In S1, the dictionary is searched with the first input character K to obtain its reference number ω, which becomes the prefix string.

S2: The next character K of the input data is read.

S3: Whether character input has ended is checked.

S4: The dictionary is searched for the string (ωK), formed by appending the character K read in S2 to the prefix string ω obtained in S1.

S5: If the string (ωK) is found in the dictionary in S4, the string (ωK) is replaced by its reference number ω in S5, and processing returns to S2. The search for the longest match continues in this way until the string (ωK) can no longer be found in the dictionary.

S6: If the string (ωK) is not found in the dictionary in S4, processing proceeds to S6. The reference number ω (initially that of the character K obtained in S1) is output as the code word code(ω), the string (ωK) is registered in the dictionary with a new reference number, the character K input in S2 is replaced by its reference number ω, the dictionary address n is incremented, and processing returns to S2 to read the next character K.
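The encoding steps S1 to S6 can be sketched in Python as follows. This is a minimal illustration rather than the flowchart of FIG. 5 itself: the function name `lzw_encode` and the `alphabet` parameter are assumptions introduced here, the prefix ω is held as a string instead of a reference number for readability, and the final flush of the last prefix at the end of input is assumed, since the steps above do not spell out what happens when S3 detects the end.

```python
def lzw_encode(text, alphabet):
    """Encode `text` with a single LZW dictionary (sketch of steps S1 to S6)."""
    # S1 (initial registration): one entry per single character, numbered from 1.
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
    n = len(dictionary) + 1              # next free reference number
    codes = []
    if not text:
        return codes
    omega = text[0]                      # S1: prefix string
    for k in text[1:]:                   # S2/S3: read K until the input ends
        if omega + k in dictionary:      # S4/S5: the match can be extended
            omega = omega + k
        else:                            # S6: emit code(omega), register omega+K
            codes.append(dictionary[omega])
            dictionary[omega + k] = n
            n += 1
            omega = k                    # K becomes the new prefix
    codes.append(dictionary[omega])      # assumed: flush the last prefix at end of input
    return codes


print(lzw_encode("ababc", "abc"))  # prints [1, 2, 4, 3]
```

With the three characters a, b, and c registered as 1, 2, and 3, the strings ab, ba, and abc receive the reference numbers 4, 5, and 6 in that order.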

This is explained concretely below with reference to FIGS. 7 and 8.

The input data (INPUT) in FIG. 7 is read from left to right. When the first character a is input, the dictionary contains no matching string other than the character a itself, so OUTPUT CODE 1 (reference number ω) is output as the code word, and the character a becomes the prefix string ω.

Next, when the second character b is input, the extended string ωK = ab, formed by appending this input character to the prefix string ω, is not in the dictionary, so OUTPUT CODE 2 for the character b is output as the code word. The extended string ωK = ab is then registered in the dictionary with reference number 4. In the actual dictionary it is registered as the string 1b, as shown on the right side of FIG. 8. The character b then becomes the prefix string ω.

Next, when the third character a is input, the extended string ωK = ba = 2a, formed by appending the character a to the prefix string ω, is not in the dictionary. After OUTPUT CODE 2 for the character b has been output as the code word, the extended string ωK = ba is therefore represented as 2a and registered in the dictionary with reference number 5. The character a then becomes the new prefix string ω.

For the fourth input character b, the extended string ωK = ab is already registered in the dictionary as code word 4 (the string 1b), so the string ωK becomes the new prefix string ω, and the fifth character c is input to form the extended string ωK = 4c = abc. Since this extended string ωK = abc is not registered in the dictionary, OUTPUT CODE 4 for the string ab = 1b is output as the code word, and the extended string ωK = abc is registered in the dictionary in the form 4c as code word 6. Processing continues in the same manner.

Next, the decoding process of FIG. 6 is described. As in encoding, character strings consisting of one character each are registered in the dictionary in advance as initial values for all characters before decoding starts.

S1: The first code CODE is read and its reference number ω is decoded. The current reference number ω is saved as OLDω. Because the first code always corresponds to one of the single-character reference numbers already registered in the dictionary, the character D(K) matching the input reference number ω is looked up and the character K is output. The output character is also set in FINchar for the exception processing described later.

S2: The next code CODE is read.

S3: Whether there is a new code, that is, whether code input has ended, is checked.

S4: The reference number ω is decoded from the code CODE that was read and is set as INω.

S5: Whether the code CODE input in S4 is registered in the dictionary or not (ω ≥ n) is checked.

S6: Normally the input code word has already been registered in the dictionary by the preceding processing, so processing proceeds to S6 and the string D(ω'K) corresponding to the reference number ω is read from the dictionary.

S7: The character K is temporarily pushed onto a stack, the reference number ω' is taken as the new ω, and processing returns to S6. S6 is repeated recursively in this way until the reference number ω reaches a single character.

S8: The characters stacked in S7 are popped in LIFO (Last In First Out) order and output. At the same time, the string (OLDω, K), which pairs the previously used reference number OLDω with the first character K of the string just restored, is registered in the dictionary with the new reference number n.

This LZW decoding process is explained concretely below with reference to FIG. 9.

The first input code is 1. The single characters a, b, and c are already registered in the dictionary under reference numbers 1, 2, and 3, as shown in Table 1, so by consulting the dictionary the code is replaced with the character string a whose reference number matches code 1, and a is output. The next code 2 is likewise replaced with the character b and output. At this time, the string 1b (= ab), which combines the previously processed code 1 with the first character b of the string just decoded, is registered in the dictionary with the new reference number 4.

The third code 4 is replaced, by searching the dictionary, with 1b and then with ab, and the string ab is output. At the same time, the string 2a (= ba), which combines the previously processed code 2 with the first character a of the string just decoded, is registered in the dictionary with the new reference number 5.

This processing is repeated in the same manner.

Decoding in FIG. 9 involves the following exception processing, which arises when the sixth input code, 8, is decoded. At that point code 8 is not yet defined in the dictionary and cannot be decoded directly. In this case, the string 5b is formed by appending the first character b of the previously decoded string ba to the previously processed code 5; it is then expanded to 2ab and to bab, and output. After the string is output, the string 5b, which is the previous code word 5 plus the first character b of the string just decoded, is registered in the dictionary with reference number 8.

This exception processing is carried out through steps S5 and S9 of the decoding flow in FIG. 6. Finally, in S8, the string is output and the new string is registered in the dictionary with a reference number.
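The decoding procedure, including the exception for a code that is not yet registered, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the flowchart of FIG. 6: the name `lzw_decode` is hypothetical, and each dictionary entry stores the full restored string, so the recursive (ω', K) expansion with a stack in S6 to S8 is replaced by a direct lookup. The code sequence in the example is an assumption chosen to be consistent with the codes mentioned in the text (1, 2, 4, then 5 followed by the undefined code 8).

```python
def lzw_decode(codes, alphabet):
    """Restore a string from LZW reference numbers (sketch of the decoding steps)."""
    # Initial registration mirrors the encoder: single characters numbered from 1.
    table = {i: ch for i, ch in enumerate(alphabet, start=1)}
    n = len(table) + 1                   # next free reference number
    if not codes:
        return ""
    old = table[codes[0]]                # S1: the first code is a single character
    out = [old]
    for code in codes[1:]:               # S2/S3: read codes until the input ends
        if code in table:                # S5/S6: normal case, already registered
            entry = table[code]
        elif code == n:                  # exception: code not registered yet
            entry = old + old[0]         # previous string plus its own first character
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        table[n] = old + entry[0]        # S8: register (OLD omega, K) as number n
        n += 1
        old = entry
    return "".join(out)


print(lzw_decode([1, 2, 4, 3, 5, 8], "abc"))  # prints ababcbabab
```

In this run, code 8 arrives when only the numbers up to 7 are registered. The decoder therefore forms bab from the previous string ba plus its first character b and registers it as number 8, matching the 5b case described above.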

Note that the encoding and decoding processes of FIGS. 4 and 5 are performed while building up identical dictionaries.

[Problem to Be Solved by the Invention] As described above, when the conventional LZW code divides the input character string into mutually different strings for encoding, it encodes each string on the assumption that it appears independently of the strings that preceded it.

This approach causes no problem when encoding a memoryless information source. However, much real data, such as actual text, is regarded as a source with memory. The conventional LZW code does not make sufficient use of the history of string occurrences, so redundancy due to the dependency between successive strings remains even after compression.

To address this drawback, the present inventors have proposed a data compression and restoration system that reduces the redundancy between character strings and raises the compression rate by incorporating into the dictionary the dependency of the string being encoded on the last character of the immediately preceding string, that is, the history.

Specifically, as shown in FIG. 10, the dictionary 10 is divided into a plurality of indexed dictionaries 10-1, 10-2, 10-3, and 10-4, and a specific dictionary, for example 10-2, is selected by using the last character b of the immediately preceding string ab as the index. Each of the dictionaries 10-1 to 10-4 registers only the strings that follow its index character.

With this method, whereas a string in the dictionary was conventionally specified by a reference number counted over the entire dictionary, it can be specified by a reference number counted only among the strings that follow the index. Smaller reference numbers can therefore be used, the LZW code can be expressed more compactly, and encoding efficiency improves.

With this method, however, the way the initial values are set becomes a problem, as follows.

When the LZW code handles data in byte units, all 256 character types are registered in the dictionary in advance as initial values so that the code words are simple and can be expressed by reference numbers alone. If this initial registration is applied to the method of FIG. 10, 256 dictionaries indexed by all 256 character types are used and the initial registration is performed for each dictionary, so 256 × 256 (64K) entries must be registered in advance. In practice many of these 64K entries are never used, so if the initial values are registered as they are, the reference numbers grow by the number of unused characters and encoding efficiency falls.

The inefficiency of registering initial values could be avoided by not registering them in advance and instead registering a single-character string in the dictionary when it first appears. With this approach, however, not all code words can be expressed as reference numbers. Code words must be divided into those for single-character strings, which encode the raw data, and those for strings of two or more characters, which encode a reference number, and the algorithm becomes complicated.

The present invention has been made in view of these problems. Its object is to provide a data compression and restoration system in which, when one of a plurality of dictionaries is selected for encoding and restoration by an index based on the dependency on the last character of the immediately preceding encoded string, the dependency relations are merged so that the initial registration of the plural dictionaries is simplified and encoding efficiency is improved.

[Means for Solving the Problem] FIG. 1 is a diagram explaining the principle of the present invention.

The present invention first concerns a data compression system that encodes an input character string into an LZW code by specifying the reference number of the longest matching substring among the already encoded substrings registered in the dictionary 10.

In such a data compression system, according to the present invention, the dictionary 10 is organized as a dictionary group consisting of a predetermined number of dictionaries 10-1 to 10-N, fewer than the number of all character types to be processed, and all character types are initially registered in each of the dictionaries 10-1 to 10-N, one character at a time, each with a reference number.

When an input character string is encoded, a specific dictionary 10-i in the dictionary group is selected according to index information indicating the dependency (history) on the previously encoded character strings, and encoding is performed with it. If the input string is not found in the selected dictionary 10-i, the string formed by appending the next character to the reference number of the previously encoded string is registered with a new reference number.

Here, when an input character string is encoded, the specific dictionary 10-i in the dictionary group is selected according to index information obtained from part of the character code of the last character of the string encoded immediately before. More specifically, the specific dictionary 10-i is selected according to index information given by the upper bits of the character code of the last character of the string encoded immediately before.

Alternatively, when an input character string is encoded, the specific dictionary 10-i in the dictionary group may be selected according to index information obtained by consulting a lookup table with the character code of the last character of the string encoded immediately before. Specifically, the specific dictionary 10-i is selected according to index information obtained by consulting the lookup table with the upper bits of that character code.

The present invention also concerns a data restoration system that restores the original character string from code words produced by encoding an input character string with the reference number of the longest matching substring among the already encoded substrings registered in the dictionary 10. The dictionary 10 is organized as a dictionary group consisting of a predetermined number of dictionaries 10-1 to 10-N, fewer than the number of all character types to be processed, and all character types are initially registered in each of the dictionaries 10-1 to 10-N, one character at a time, each with a reference number. When an input code word is restored, a specific dictionary 10-i in the dictionary group is selected according to index information indicating the dependency on the previously restored character strings, and restoration is performed with it. At each restoration, the string formed by appending the first character of the string just restored to the reference number of the previously restored string is registered with a new reference number.
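The compression and restoration schemes described above can be sketched together as a round trip. This is a sketch under stated assumptions, not the claimed implementation: the function names, the choice of N = 16 with the upper 4 bits of the last byte as the index, and the use of full strings as dictionary keys are all assumptions made for illustration. The decoder registers each entry one step later than the encoder, so a code may refer to an entry that is still pending; that can only happen when the previous and current steps use the same dictionary.

```python
NUM_DICTS = 16                          # assumed N; must cover every index value


def history_index(last_byte):
    """Assumed index rule: upper 4 bits of the last byte of the previous string."""
    return last_byte >> 4


def encode(data):
    # Every dictionary starts with all 256 single bytes as reference numbers 1..256.
    dicts = [{bytes([b]): b + 1 for b in range(256)} for _ in range(NUM_DICTS)]
    codes, pk, pos = [], 0, 0           # pk = 0: no history at the start
    while pos < len(data):
        d = dicts[pk]                   # dictionary selected by the history
        omega = data[pos:pos + 1]
        pos += 1
        while pos < len(data) and omega + data[pos:pos + 1] in d:
            omega += data[pos:pos + 1]  # extend the longest match
            pos += 1
        codes.append(d[omega])
        if pos < len(data):             # register omega+K in the selected dictionary
            d[omega + data[pos:pos + 1]] = len(d) + 1
        pk = history_index(omega[-1])
    return codes


def decode(codes):
    tables = [{b + 1: bytes([b]) for b in range(256)} for _ in range(NUM_DICTS)]
    out, pk, prev = bytearray(), 0, None    # prev = (dictionary index, string)
    for code in codes:
        t = tables[pk]
        if code in t:
            entry = t[code]
        elif prev is not None and prev[0] == pk and code == len(t) + 1:
            entry = prev[1] + prev[1][:1]   # exception: entry still pending
        else:
            raise ValueError(f"invalid code {code}")
        if prev is not None:                # register in the PREVIOUS dictionary
            pt = tables[prev[0]]
            pt[len(pt) + 1] = prev[1] + entry[:1]
        out += entry
        prev = (pk, entry)
        pk = history_index(entry[-1])
    return bytes(out)


sample = b"aaaa"
print(encode(sample))                   # prints [98, 98, 257]
print(decode(encode(sample)) == sample) # prints True
```

Because the decoder derives the index from the string it has just restored, no index information needs to be transmitted with the codes.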

The specific dictionary 10-i used during restoration is selected in the same way as during encoding.

[Operation] The data compression and restoration system of the present invention configured in this way operates as follows.

The history indicating the dependency on the last character of the immediately preceding string has 256 possible states if used as is. The occurrence of characters is uneven, however, and some of the 256 states never occur. The present invention therefore merges the histories of the last character and reduces them to a small number of significant states, for example 8 to 16, thereby reducing the number of dictionaries.

Because the number of history states is small, the number of entries registered as initial values, with all 256 character types in each dictionary, is the number of histories, that is, the number of dictionaries, × 256, so no large waste arises.
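The saving in initial registrations can be checked with a short calculation. The helper name `initial_entries` is an assumption introduced for this sketch; the figures 256, 8, and 16 are taken from the text.

```python
def initial_entries(num_dictionaries, char_types=256):
    """Entries pre-registered when every dictionary holds all single characters."""
    return num_dictionaries * char_types


per_character = initial_entries(256)    # one dictionary per possible last character
for n in (8, 16):
    merged = initial_entries(n)         # histories merged into n states
    print(f"N={n}: {merged} entries instead of {per_character}")
```

With one dictionary per last character, 65,536 entries are needed before any data is seen; merging the history into 16 states reduces this to 4,096.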

One way to merge the histories is, for example, to take the upper 4 bits of the last character of the immediately preceding encoded string, which groups the histories into 16 states. To use the dictionaries effectively, it is desirable to group the histories into states that occur with roughly equal frequency. It is not always necessary to use the raw data bits of the character, however. A lookup table (LUT) that maps the last character of the immediately preceding encoded string to a history state may be prepared to suit the general nature of the data, and the history state of the preceding character, that is, the dictionary index, may be specified through it.
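The two ways of deriving the dictionary index can be sketched as follows. The upper-4-bit rule comes from the text; the lookup table contents are purely hypothetical, since the text only says that the table should suit the general nature of the data. The grouping into digits, uppercase letters, lowercase letters, and everything else is an assumption chosen for ASCII-like text.

```python
def index_from_upper_bits(last_char):
    """Index from the upper 4 bits of an 8-bit character code: 16 history states."""
    return (last_char >> 4) & 0x0F


def build_example_lut():
    """Hypothetical 256-entry LUT mapping a byte to one of 4 history states."""
    lut = [0] * 256                          # state 0: everything else, including 0
    for b in range(ord("0"), ord("9") + 1):
        lut[b] = 1                           # state 1: digits
    for b in range(ord("A"), ord("Z") + 1):
        lut[b] = 2                           # state 2: uppercase letters
    for b in range(ord("a"), ord("z") + 1):
        lut[b] = 3                           # state 3: lowercase letters
    return lut


lut = build_example_lut()
print(index_from_upper_bits(ord("a")))       # prints 6, since ord("a") is 0x61
print(lut[ord("a")], lut[ord("7")], lut[0])  # prints 3 1 0
```

A table like this lets the number of dictionaries and the grouping be tuned to the data, whereas the bit-field rule fixes both.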

[Embodiment] FIG. 2 is a block diagram showing the configuration of an embodiment of the present invention.

In FIG. 2, reference numeral 12 denotes a CPU serving as control means; a program memory 14 and a data memory 26 are connected to the CPU 12.

The program memory 14 holds control software 16, maximum match length search software 18 that searches for the longest match using the LZW code, encoding software 20 that converts an input character string into LZW codes, decoding software 22 that restores the codes produced by the encoding software 20 to the original character string, and dictionary initial value creation software 24 that initially registers all character types to be processed, for example 256 character types.

The data memory 26 contains a data buffer 28, which stores the character string about to be encoded or the code string about to be decoded, and the dictionary 10, which is built up progressively as it is used during encoding and decoding with the LZW code.

Taking as an example the case where the dictionary 10 is partitioned by index information indicating the dependency given by the upper 4 bits of the last character code of the encoded string, the dictionary 10 consists of 16 dictionaries 10-1 to 10-16 for all 256 character types. The dictionary index may be specified directly by the upper 4 bits of the last character code of the encoded string, but the following description takes as an example the case where the dictionary index is read from a lookup table (LUT).

An outline of data compression and restoration according to the present invention in the embodiment of FIG. 3 is as follows.

Under the control of the control software 16, the CPU 12 starts the dictionary initial value creation software 24 and performs dictionary initial value creation. Specifically, the dictionary initial value creation software 24 registers all 256 character types, one character at a time with a reference number, in each of the 16 dictionaries 10-1 to 10-16 that make up the dictionary.
All of the character types 256 are registered in each of the 16 dictionaries 10-1 to 10-16 constituting the dictionary by assigning a reference number to each character.

The data buffer 28 of the data memory 26 takes in the data to be encoded from outside, a fixed length of several characters at a time, and hands it over one character at a time as requested by the encoding software 20. Each time the data buffer 28 becomes empty, it takes in another group of characters from outside in the same way.

Next, the encoding algorithm of the present invention is described with reference to the flowchart of FIG. 3. In FIG. 3, the following processing is first performed in S1.

(1) In each of the N dictionaries Di (i = 1, ..., N), which are selected by the final character of the immediately preceding character string, all character strings consisting of a single character are registered in advance as initial values. In the present invention, the total number of dictionaries is N = 16, which is small compared with the 256 character types.

(2) The total number of reference numbers in each dictionary Di is managed by a counter ni; at initialization, ni = (number of character types + 1) is set for each of the N counters.

(3) The history from the immediately preceding character string, that is, the upper 4 bits of the final character code of the immediately preceding character string, is denoted PK, and PK = 0 is set as its initial value.

(4) The first character is input and converted to a reference number (prefix string) ω.

(5) The LUT that maps the final character K1 of the immediately preceding character string to a history state is set up. Since there is no immediately preceding character string at the start, K1 is set to K1 = 0, and the LUT is set so that the index PK obtained from the LUT for K1 = 0 is PK = 0.

When the processing of S1 is finished, encoding is performed according to the procedure of S4 to S7. This procedure of S4 to S7 is basically the same as the conventional one shown in FIG. 5.

The difference is as follows. Conventional LZW encoding uses only one dictionary. In the present invention, a specific dictionary DPK is selected from the plural dictionaries by the history state LUT(K1) = PK, which is obtained by referring to the LUT with the final character K1 of the already encoded character string (in S1 at first, and in S6 thereafter). The input is matched against the character strings registered in the selected dictionary DPK to find the longest matching string, and the string ωK, which extends the longest match by one character, is registered in the selected dictionary DPK.

After the registration in the dictionary DPK in S8, the counter nPK, which manages the reference numbers of the dictionary DPK, is incremented by one: nPK = nPK + 1. Then, as described above, a new history state PK is obtained from the final character K1 by using the LUT in order to select the dictionary for the next character string.
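The encoding procedure described above can be sketched in Python as follows. This is a simplified illustration, not the flowchart of FIG. 3 itself: dictionary keys are whole byte strings rather than (ω, K) pairs, reference numbers are assumed to start at 1 so that each counter starts at 257, the LUT is assumed to hold the upper 4 bits, and the function name `encode` is hypothetical.

```python
NUM_CHAR_TYPES = 256
NUM_DICTS = 16


def encode(data: bytes) -> list:
    """Multi-dictionary LZW encoding; returns the emitted reference numbers."""
    # Initialization: every dictionary holds all single characters (refs 1..256).
    dicts = [{bytes([c]): c + 1 for c in range(NUM_CHAR_TYPES)}
             for _ in range(NUM_DICTS)]
    n = [NUM_CHAR_TYPES + 1] * NUM_DICTS           # n_i: next free reference number
    lut = [c >> 4 for c in range(NUM_CHAR_TYPES)]  # assumed LUT contents
    pk = 0                                         # initial history state PK = 0
    w = b""                                        # current prefix string
    codes = []
    for k in data:
        wk = w + bytes([k])
        if wk in dicts[pk]:
            w = wk                  # keep extending the match in dictionary pk
            continue
        codes.append(dicts[pk][w])  # emit reference number of the longest match
        dicts[pk][wk] = n[pk]       # register the extended string in the same dictionary
        n[pk] += 1
        pk = lut[w[-1]]             # new history state from the final character
        w = bytes([k])              # next string starts with k, searched in dictionary pk
    if w:
        codes.append(dicts[pk][w])
    return codes


print(encode(b"ababab"))
```

For the input `b"ababab"`, this sketch encodes the first string in dictionary 0 and all later strings in dictionary 6, because the codes of both `a` (0x61) and `b` (0x62) have upper 4 bits equal to 6.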

Next, the decoding algorithm of the present invention is explained with reference to FIG. 4.

Decoding is the reverse operation of encoding. First, the initialization of the dictionaries shown in S1 is the same as for encoding.

The procedure of S1 to S8 is basically the same as the conventional method of FIG. 7.

The decoding of the present invention differs in that, after the reference number ω is decoded from the input code CODE in S4, the dictionary DPK is selected using the history state PK obtained from the final character of the immediately preceding character string, and the character string corresponding to the reference number ω is looked up in the selected dictionary DPK.

A new character string is registered in the dictionary in the same way as in LZW encoding, but one step later than during encoding. That is, during encoding, the string ωK extended by one character (the string of interest plus the next character) is registered in the dictionary as soon as the string of interest has been encoded. During decoding, extending the string of interest ω by one character requires the first character of the next string, so the registration is performed when the restoration of the next string has finished.

Specifically, as shown in S7, the pair consisting of the reference number OLDω of the immediately preceding character string and the first character K1 of the restored character string is registered in the dictionary DPK1, which was selected by the history state PK1 derived from the final character of the string before the immediately preceding one. For this purpose, the current history state PK is moved to PK1 so that it is available when the restored string is later extended and registered, and a new history state is obtained from the final character K2 of the restored string.
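The delayed registration can be illustrated with the following Python sketch of a decoder. It is a simplified model under stated assumptions rather than the flowchart of FIG. 4: reference numbers start at 1, the LUT holds the upper 4 bits, the names `decode`, `pk1` and `old` are hypothetical stand-ins for PK1 and OLDω, and the branch for a reference number that is not yet registered is the usual LZW special case, which this passage does not describe.

```python
NUM_CHAR_TYPES = 256
NUM_DICTS = 16


def decode(codes) -> bytes:
    """Multi-dictionary LZW decoding with registration delayed by one string."""
    tables = [{c + 1: bytes([c]) for c in range(NUM_CHAR_TYPES)}
              for _ in range(NUM_DICTS)]
    n = [NUM_CHAR_TYPES + 1] * NUM_DICTS           # next free reference number
    lut = [c >> 4 for c in range(NUM_CHAR_TYPES)]  # assumed LUT contents
    pk = 0       # history state selecting the dictionary for the current code
    pk1 = 0      # history state under which the previous string was decoded
    old = None   # previously restored string
    out = bytearray()
    for code in codes:
        table = tables[pk]
        if code in table:
            s = table[code]
        elif old is not None and pk == pk1 and code == n[pk]:
            # Usual LZW special case: the code refers to the entry that is
            # about to be registered, so it must be old + first char of old.
            s = old + old[:1]
        else:
            raise ValueError(f"invalid reference number {code}")
        if old is not None:
            # Delayed registration: previous string + first character of the
            # current string goes into the dictionary selected by pk1.
            tables[pk1][n[pk1]] = old + s[:1]
            n[pk1] += 1
        out += s
        pk1 = pk              # keep the current history state for the next registration
        pk = lut[s[-1]]       # new history state from the final character
        old = s
    return bytes(out)


print(decode([98, 99, 98, 257, 99]))
```

The code list used here is the one a matching encoder under the same assumptions would emit for `b"ababab"`.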

The above embodiment takes as an example the case where 16 dictionaries, selected according to the history state, are provided for all 256 character types. However, any number of dictionaries may be used as required, as long as it does not exceed the total number of character types, and the number of character types may likewise be set as required.
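One way to parameterize the number of dictionaries is to generate the lookup table from the two counts. The sketch below is an assumption for illustration only: the function name `make_lut` and the rule of dividing the character codes into equal contiguous ranges are not taken from the specification, although for 16 dictionaries and 256 character types the rule reduces to the upper 4 bits.

```python
def make_lut(num_dicts: int, num_char_types: int = 256) -> list:
    """Map each character code to a dictionary index in 0..num_dicts-1.

    Assumed rule: contiguous, near-equal ranges of character codes share
    one dictionary.
    """
    if not 1 <= num_dicts <= num_char_types:
        raise ValueError("number of dictionaries must be between 1 and the number of character types")
    return [(c * num_dicts) // num_char_types for c in range(num_char_types)]


lut16 = make_lut(16)
lut4 = make_lut(4)
print(lut16[0x61], lut4[0x61])
```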

[Effects of the Invention]

As explained above, according to the present invention, the initial values of plural dictionaries corresponding to the history state of character strings can be registered simply and without waste. Because the history of previously appearing character strings is also taken into account for the character string being encoded, redundancy between character strings is reduced and a higher compression ratio than that of the conventional LZW code is obtained. In addition, because the code is expressed by reference numbers only, it can be executed with a simple algorithm.

[Brief Description of the Drawings]

FIG. 1 is a diagram explaining the principle of the present invention.
FIG. 2 is a block diagram of an embodiment of the present invention.
FIG. 3 is a flowchart of the encoding algorithm of the present invention.
FIG. 4 is a flowchart of the decoding algorithm of the present invention.
FIG. 5 is a flowchart of the conventional LZW encoding algorithm.
FIG. 6 is a flowchart of the conventional LZW decoding algorithm.
FIG. 7 is an explanatory diagram of a specific example of conventional LZW encoding.
FIG. 8 is an explanatory diagram of a dictionary configuration example.
FIG. 9 is an explanatory diagram of a specific example of conventional LZW decoding.
FIG. 10 is an explanatory diagram of encoding with substring decomposition and incorporation of the history between character strings, as already proposed by the present inventors.

[Explanation of Reference Numerals]

10, 10-1 to 10-N: dictionary
12: CPU
14: program memory
16: control software
18: maximum match length search software
20: encoding software
22: decoding software
24: dictionary initial value creation software
26: data memory
28: data buffer

Patent applicant: Fujitsu Ltd

Claims

[Claims]

(1) A data compression method in which an input character string is encoded by specifying it with the reference number of the longest matching substring among the already encoded substrings registered in a dictionary (10), characterized in that the dictionary (10) is composed of a dictionary group consisting of a predetermined number of dictionaries (10-1 to 10-N) smaller than the number of all character types to be processed; at least all character types are initially registered in each dictionary (10-1 to 10-N), one character at a time and each with a reference number; when an input character string is encoded, a specific dictionary (10-i) in the dictionary group is specified, and the string is encoded, according to index information indicating the dependency relationship with the previously encoded character string; and when the input character string is not in the specified dictionary (10-i), a character string obtained by adding the next character to the reference number of the previously encoded character string is registered with a new reference number.

(2) The data compression method according to claim 1, characterized in that, when an input character string is encoded, a specific dictionary (10-i) in the dictionary group is specified according to index information obtained from a part of the final character code of the previously encoded character string.

(3) The data compression method according to claim 2, characterized in that, when an input character string is encoded, a specific dictionary (10-i) in the dictionary group is specified according to index information indicated by the upper bits of the final character code of the previously encoded character string.

(4) The data compression method according to claim 1, characterized in that, when an input character string is encoded, a specific dictionary (10-i) in the dictionary group is specified according to index information obtained by referring to a lookup table with the final character code of the previously encoded character string.

(5) The data compression method according to claim 4, characterized in that, when an input character string is encoded, a specific dictionary (10-i) in the dictionary group is specified according to index information created from the upper bits of the final character code of the previously encoded character string.

(6) A data restoration method for restoring the original character string from code words that were encoded by specifying an input character string with the reference number of the longest matching substring among the already encoded substrings registered in a dictionary (10), characterized in that the dictionary (10) is composed of a dictionary group consisting of a predetermined number of dictionaries (10-1 to 10-N) smaller than the number of all character types to be processed; at least all character types are initially registered in each dictionary (10-1 to 10-N), one character at a time and each with a reference number; when an input code word is restored, a specific dictionary (10-i) in the dictionary group is specified, and the code word is restored, according to index information indicating the dependency relationship with the previously restored character string; and at each restoration, a character string obtained by adding the first character of the currently restored character string to the reference number of the previously restored character string is registered with a new reference number.

(7) The data restoration method according to claim 6, characterized in that, when an input code word is restored, a specific dictionary (10-i) in the dictionary group is specified according to index information obtained from a part of the final character code of the previously restored character string.

(8) The data restoration method according to claim 7, characterized in that, when an input code word is restored, a specific dictionary (10-i) in the dictionary group is specified according to index information indicated by the upper bits of the final character code of the previously restored character string.

(9) The data restoration method according to claim 6, characterized in that, when an input code word is restored, a specific dictionary (10-i) in the dictionary group is specified according to index information obtained by referring to a lookup table with the final character code of the previously restored character string.

(10) The data restoration method according to claim 9, characterized in that, when an input code word is restored, a specific dictionary (10-i) in the dictionary group is specified according to index information obtained by referring to the lookup table with the upper bits of the final character code of the previously restored character string.