JPH04100322A

JPH04100322A - Data compression and decoding system

Info

Publication number: JPH04100322A
Application number: JP2066303A
Authority: JP
Inventors: Shigeru Yoshida; 茂吉田; Yasuhiko Nakano; 泰彦中野; Yoshiyuki Okada; 佳之岡田; Hirotaka Chiba; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-03-16
Filing date: 1990-03-16
Publication date: 1992-04-02

Abstract

PURPOSE: To reduce the redundancy between character strings and obtain a high compression rate by expressing the index that indicates a registration position in the dictionary as a whole as an index within the group of divided dictionaries arranged in order of appearance probability. CONSTITUTION: The registration dictionary is split into several divided dictionaries DD1-DD5, and the index indicating a registration position in the dictionary as a whole is expressed as the index within the divided dictionary group when the divided dictionaries DD1-DD5 are arranged in descending order of appearance probability. For example, when a character string is registered at index 7 in the divided dictionary DD4, the index to be coded is the value 25, the sum of the index 7 within DD4 and the total number (n1+n2+n3=18) of character strings in the divided dictionaries DD1-DD3 that precede DD4. Because the dependency between coded character strings is thus taken into account when the index is coded, the redundancy between character strings is reduced and a high compression rate is realized.

Description

[Detailed Description of the Invention] [Summary] The invention relates to a data compression and decompression method based on LZW coding, an improvement of incremental parsing coding, which is itself a kind of universal coding, and aims to raise the compression ratio by reducing the redundancy between character strings. The registration dictionary is held as several divided dictionaries. When a new subsequence is registered, the divided dictionary into which it is classified by its dependency on already registered subsequences is designated and the subsequence is registered there. The registration position in the dictionary as a whole is expressed by a reference number indicating the registration position within the group of divided dictionaries, and this reference number is encoded. On decompression, the divided dictionary classified by the dependency on already registered subsequences is likewise designated for registration, and the code designated by the reference number indicating the registration position within the group of divided dictionaries is restored.

[Industrial Application Field] The present invention relates to an image data compression method using the LZW code, which is known as an improvement of the incremental parsing type of universal code.

In recent years, computers have come to handle many kinds of data, such as character codes, vector information, and images, and the amount of data handled is increasing rapidly. When large amounts of data are handled, it is desirable to compress the data by removing its redundant parts, thereby reducing the required storage capacity and speeding up transmission.

Universal coding has been proposed as a way to compress such varied data with a single method.

The invention is not limited to the compression of character codes and can be applied to various kinds of data. In what follows, however, the terminology of information theory is adopted: one word of data is called a character, and a sequence of several words is called a character string.

A representative universal coding method is the Ziv-Lempel code (for details see, for example, Munakata, "Ziv-Lempel data compression method", Joho Shori (Information Processing), Vol. 26, No. 1, 1985).

Two algorithms have been proposed for the Ziv-Lempel code: ① the universal type and ② the incremental parsing type.

As an improvement of the universal-type algorithm there is the LZSS code (see T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986).

As an improvement of the incremental-parsing-type algorithm there is the LZW (Lempel-Ziv-Welch) code (see T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984).

Among these coding methods, the LZW code has come to be used for file compression in storage devices and similar applications because it can be processed at high speed and its algorithm is simple.

[Prior Art] FIG. 7 shows the encoding flow of the conventional LZW code, and FIG. 8 shows the decoding flow.

LZW encoding uses a rewritable dictionary. The input character string is divided into mutually different character strings (subsequences), which are registered in the dictionary with reference numbers assigned in order of appearance, and the character string currently being input is encoded as the reference number of the longest matching character string registered in the dictionary.

FIG. 9 is an explanatory diagram of LZW encoding, FIG. 11 is an explanatory diagram of LZW decoding, and FIG. 10 shows an example of the dictionary built during encoding and decoding.

To keep the explanation simple, FIGS. 9, 10, and 11 take as their example the compression and restoration of data made up of combinations of the three characters a, b, and c.

In the LZW encoding process of FIG. 9, first, in step S1 (the word "step" is omitted below), character strings consisting of one character each, for all characters, are registered in the dictionary in advance as initial values, and then encoding starts. Also in S1, the dictionary is searched with the first input character K to obtain its reference number ω, which becomes the prefix string.

Next, in S2 the next character K of the input data is read, and in S3 it is checked whether all characters have been input. The process then goes to S4, which searches the dictionary for the character string (ωK) formed by appending the character K read in S2 to the prefix string ω obtained in S1.

If the character string (ωK) is not in the dictionary in S4, the process goes to S6: the reference number ω obtained in S1 is output as the code word code(ω), the character string (ωK) is registered in the dictionary under a new reference number, the input character K of S2 becomes the new reference number ω, the dictionary address n is incremented, and the process returns to S2 to read the next character K.

If, on the other hand, the character string (ωK) is in the dictionary in S4, then in S5 the character string (ωK) is replaced by the reference number ω and the process returns to S2, continuing the longest-match search until the character string (ωK) can no longer be found in the dictionary in S4.

LZW encoding is now described concretely with reference to FIGS. 9 and 10.

First, the input data (INPUT) in FIG. 9 is read from left to right.

When the first character a is input, no character string in the dictionary other than a matches, so OUTPUT CODE 1 (the reference number ω) is output as the code word. The extended character string ab is then registered in the dictionary under reference number 4. In the actual dictionary it is registered as the character string 1b, as shown on the right-hand side of FIG. 10 (ALTERNATE TABLE).

Next, the second character b becomes the head of a character string. No character string in the dictionary other than b matches, so reference number 2 is output as the code word. At the same time, the extended character string ba is not in the dictionary either, so it is expressed as 2a and registered in the dictionary under reference number 5. The third character a then becomes the head of the next character string, and the process continues in the same way.
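The encoding flow and the a/b/c example above can be sketched in Python as follows. This is a minimal illustration, not the flow chart of FIG. 7 itself: the names `lzw_encode`, `table`, and `next_ref` are assumptions, and for brevity the dictionary stores whole strings rather than (ω, K) pairs such as 1b.

```python
def lzw_encode(data, alphabet):
    """Conventional LZW encoding; returns the list of reference numbers."""
    # S1: register every single character; reference numbers start at 1.
    table = {ch: i for i, ch in enumerate(alphabet, start=1)}
    next_ref = len(table) + 1
    if not data:
        return []
    codes = []
    omega = data[0]                 # current prefix string
    for k in data[1:]:              # S2: read the next character K
        if omega + k in table:      # S4 -> S5: the match can be extended
            omega += k
        else:                       # S4 -> S6: emit code(omega), register omega+K
            codes.append(table[omega])
            table[omega + k] = next_ref
            next_ref += 1
            omega = k               # K becomes the head of the next string
    codes.append(table[omega])      # S3: input exhausted, flush the last match
    return codes


print(lzw_encode("abab", "abc"))
```

With the three-character alphabet a, b, c, the first two output codes are 1 and 2 and the strings ab and ba receive reference numbers 4 and 5, as in the walkthrough.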

The decoding process of FIG. 8 performs the reverse of the encoding process of FIG. 7.

In the LZW decoding of FIG. 8, as in encoding, character strings consisting of one character each, for all characters, are registered in the dictionary in advance as initial values before decoding starts.

First, in S1 the first code (reference number) is read and the current CODE is set as OLDcode. Since the first code corresponds to one of the single-character reference numbers already registered in the dictionary, the character code(K) that matches the input code CODE is looked up and the character K is output.

The output character K is also set in FINchar for the exception handling described later.

Next, in S2 the next code is read and set in CODE as INcode. S3 checks whether there is a new code, that is, whether the code input has ended, and the process goes to S4, which checks whether the input code CODE is defined (registered) in the dictionary.

Normally the input code word has already been registered in the dictionary by the preceding processing, so the process goes to S5, where the character string code(ωK) corresponding to the code CODE is read from the dictionary. In S6 the character K is temporarily pushed onto the stack, the reference number CODE(ω) is taken as the new CODE, and the process returns to S5. Steps S5 and S6 are repeated recursively until the reference number ω reaches a single character K. Finally, in S7, the characters stacked in S6 are popped in LIFO (last in, first out) order and output. Also in S7, the character string expressed as the pair (ω, K) of the previously used code ω and the first character K of the character string just restored is registered in the dictionary under a new reference number.
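The S5/S6 loop and the LIFO output of S7 can be illustrated with a short Python sketch. It assumes a hypothetical table layout in which a single-character entry holds the character itself and every other entry holds a pair (ω, K); the function name `expand` is likewise an assumption.

```python
def expand(code, table):
    """Expand one reference number into its character string via a stack."""
    stack = []
    while isinstance(table[code], tuple):   # entry is a pair (omega, K)
        omega, k = table[code]              # S5: read the entry for CODE
        stack.append(k)                     # S6: stack the character K
        code = omega                        # the prefix becomes the new CODE
    stack.append(table[code])               # a single-character entry was reached
    out = []
    while stack:
        out.append(stack.pop())             # S7: pop in last-in, first-out order
    return "".join(out)


# Dictionary state matching the a/b/c example: 4 = 1b, 5 = 2a, 6 = 4c.
table = {1: "a", 2: "b", 3: "c", 4: (1, "b"), 5: (2, "a"), 6: (4, "c")}
print(expand(6, table))
```

The stack is needed because following the prefix references yields the characters in reverse order.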

The LZW decoding process is now described concretely with reference to FIG. 11.

In FIG. 11 the first input code (INPUT CODE) is 1. The single characters a, b, and c are already registered in the dictionary under reference numbers 1, 2, and 3, as shown in FIG. 10, so the dictionary is consulted and code 1 is replaced by the character string a of the matching reference number, which is output.

The next code, 2, is replaced in the same way by the character b, which is output. At this point the character string (1b), which combines the previously processed code 1 with the first character b of the string just decoded, is registered in the dictionary under the new reference number 4.

The third code, 4, is looked up in the dictionary to obtain the character string 1b, which is expanded to ab and output. At the same time, the character string 2a (= ba), which combines the previously processed code 2 with the first character a of the string just decoded, is registered in the dictionary under the new reference number 5.

The process is repeated in the same way.

The LZW decoding of FIG. 11 involves the following exception handling.

This exception occurs when the sixth input code, 8, is decoded. Code 8 is not yet defined in the dictionary at the time of decoding and so cannot be decoded directly. In this case the exception handling forms the character string 5b by appending the first character b of the previously decoded character string ba to the previously processed code 5, expands it as 2ab = bab, and outputs it. After the output, the character string 5b, formed from the previous code word 5 and the first character b of the character string just decoded, is registered in the dictionary under reference number 8.

This exception handling is carried out through steps S4 and S8 of the decoding flow of FIG. 8, and finally S7 outputs the character string and registers the new character string in the dictionary under a new reference number.
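A compact Python sketch of the decoding procedure, including the exception for a code that is not yet defined, is given below. It is an illustration under assumptions, not a transcription of FIG. 8: the names `lzw_decode`, `table`, and `next_ref` are hypothetical, and the dictionary stores whole strings so that no stack is needed.

```python
def lzw_decode(codes, alphabet):
    """Conventional LZW decoding; rebuilds the dictionary while decoding."""
    table = {i: ch for i, ch in enumerate(alphabet, start=1)}
    next_ref = len(table) + 1
    if not codes:
        return ""
    prev = table[codes[0]]          # the first code is always a single character
    out = [prev]
    for code in codes[1:]:
        if code in table:           # normal case: the code is already registered
            entry = table[code]
        elif code == next_ref:      # exception: the code is not yet defined
            entry = prev + prev[0]  # previous string plus its own first character
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        # Register (previous string, first character of the current string).
        table[next_ref] = prev + entry[0]
        next_ref += 1
        prev = entry
    return "".join(out)


print(lzw_decode([1, 2, 4, 3, 5, 8], "abc"))
```

In this code sequence the sixth code, 8, triggers the exception branch: the previous string ba is extended by its first character b, giving bab, which is then registered under reference number 8.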

Although the LZW decoding of FIGS. 8 and 11 has been described for the case where the decoding side builds the dictionary in real time while interpreting the codes, the dictionary created during encoding may instead be copied to the decoding side and used for decoding as it is. In that case no exception handling is needed on the decoding side.

[Problems to be Solved by the Invention] In data compression with the conventional LZW code as described above, the index (reference number) in the dictionary that holds the already encoded subsequences used for encoding is, as shown in FIG. 12, the number indicating the order in which the character strings were registered in the dictionary.

Consequently, a dictionary index used as a code word merely indicates the order in which a character string was registered. The correlation between indexes is small, and each index is used as a code word almost at random, so the index must be expressed with a fixed number of bits whose length is determined by the dictionary size. The index therefore cannot be encoded efficiently.
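As a small illustration of the fixed-length cost, the number of bits needed per index can be computed from the dictionary size. The helper name `fixed_index_bits` and the sample sizes are assumptions chosen for the example.

```python
def fixed_index_bits(dictionary_size):
    """Bits needed to address every entry of a dictionary with a fixed-length code."""
    if dictionary_size < 1:
        raise ValueError("dictionary must contain at least one entry")
    # (size - 1).bit_length() equals ceil(log2(size)) without floating point.
    return max(1, (dictionary_size - 1).bit_length())


for size in (256, 4096, 4097):
    print(size, fixed_index_bits(size))
```

Every code word costs this many bits regardless of how often the corresponding string occurs.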

The present invention has been made in view of these problems of the prior art, and its object is to provide a data compression and restoration method that can raise the compression ratio by reducing the redundancy between character strings.

[Means for Solving the Problems] FIG. 1 is a diagram explaining the principle of the present invention.

As shown in FIG. 1(a), the present invention concerns a data compression method in which already encoded data is divided into mutually different subsequences, each of which is registered in a dictionary TD under its own reference number, and input data is encoded by designating the reference number of the longest matching subsequence in the dictionary TD, and a data restoration method that restores character strings from the code words obtained by that data compression method.

In the data compression method of the present invention, as shown in FIG. 1(b), when a new encoded subsequence is registered, the divided dictionary DDi into which it is classified by its dependency on already registered subsequences is designated and the subsequence is registered there. The registration position in the dictionary as a whole is designated by the number obtained by adding the reference number within the divided dictionary DDi to which the encoded subsequence belongs to the total number n of subsequences in the divided dictionaries DD1 to DDi-1, whose appearance probability is higher than that of DDi, and this number is encoded.

In the data decoding method of the present invention, when a new decoded subsequence is registered in the dictionary, the divided dictionary DDi into which it is classified by its dependency on already decoded subsequences is designated and the subsequence is registered there. The code to be decoded is the one designated by the number obtained by adding the reference number within the divided dictionary DDi to which the decoded subsequence belongs to the total number of subsequences in the divided dictionaries DD1 to DDi-1, whose appearance probability is higher than that of DDi.

[Operation] Data compression and restoration based on the LZW code with the above configuration operate as follows.

In the conventional LZW code, when the input character string is divided into mutually different character strings and encoded, each character string being encoded is treated as appearing independently of the character strings encoded before it.

This is adequate for encoding a memoryless information source. However, much real data, such as actual text, can be regarded as coming from an information source with memory. The conventional LZW code does not make sufficient use of the history of character string occurrences, so that even after compression redundancy remains in the dependency between successive character strings.

FIG. 1(a) shows the search tree of the dictionary in conventional LZW encoding, where the whole dictionary is treated as one. The root of the search tree (root = index 0) is empty, which shows that the LZW code takes no account of the history of character strings that appeared before the character string being encoded.

The present invention therefore focuses on the fact that taking the dependency between encoded character strings into account when the index is encoded reduces the redundancy between character strings and raises the compression ratio. As shown in FIG. 1(b), the registration dictionary is held as several divided dictionaries DD1 to DD5, and the index indicating the registration position in the dictionary as a whole is expressed as the index within the group of divided dictionaries when the divided dictionaries DD1 to DD5 are arranged in descending order of appearance probability, thereby improving the compression ratio.

For example, when the character string at index 7, shown by the black dot, is registered in the divided dictionary DD4 among the divided dictionaries DD1 to DD5 arranged in order of appearance probability as illustrated, the index to be encoded is the value 25, obtained by adding the index within the registering divided dictionary, here index 7 of DD4, to the total n1 + n2 + n3 = 18 of character strings in the divided dictionaries DD1 to DD3 that precede DD4.

[Embodiment] FIG. 2 is a block diagram showing one embodiment of the present invention.

In FIG. 2, reference numeral 10 denotes a CPU, which performs data compression by encoding character strings according to the present invention in conformity with the LZW code, and restores character strings from the code words obtained by the compression.

The CPU 10 is provided with a variable-length encoder/decoder 12, which is implemented as encoding software based on the encoding algorithm and decoding software based on the decoding algorithm. A stack memory 14 for reordering the restored characters is used in the encoding and decoding performed by the variable-length encoder/decoder 12.

The CPU 10 is further provided with a whole dictionary (TD) 16, divided dictionaries (DD) 18, and an appearance-order storage memory 20 for the divided dictionary elements. Dictionary registration is performed in the whole dictionary 16 in the same way as in conventional LZW encoding and decoding. The divided dictionaries 18 are specific to the present invention and are divided into N parts in advance. Taking encoding as an example, when a new encoded character string is registered, the divided dictionary DDi into which it is classified by its dependency on already registered character strings is designated, and the index (address) indicating the registration position in the corresponding whole dictionary 16 is registered there. When a character string address is registered in this way in any one of the divided dictionaries 18, the appearance order of the divided dictionary elements is stored in the appearance-order storage memory 20. The order of appearance of the divided dictionaries stored in the memory 20 is used when generating the reference number that indicates the position, in the dictionary as a whole, of the character string registered in the divided dictionary 18, that is, the index to be encoded.

The principle of the present invention is now explained, taking the encoding process as an example.

First, during encoding, the reference dictionary is built as several subsets, that is, divided dictionaries DDi (i = 1 to N), as shown for example in FIG. 3. When a new character string is registered, the divided dictionary in which it is to be registered is designated on the basis of its dependency on already registered character strings.

The data registered in a divided dictionary is the address of the character string registered in the whole dictionary.
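The relationship between the whole dictionary and the divided dictionaries can be sketched as a small Python data structure. The class and method names are hypothetical, and the choice of the divided dictionary (`part`) is left to the caller, because the classification rule based on the dependency between strings is not spelled out at this point in the text.

```python
class SplitDictionary:
    """Whole dictionary TD plus N divided dictionaries holding addresses into TD."""

    def __init__(self, num_parts):
        self.whole = []                              # TD: registered strings
        self.parts = [[] for _ in range(num_parts)]  # DD_i: addresses into TD

    def register(self, string, part):
        """Register a string and return its 1-based position k inside DD_part."""
        address = len(self.whole)
        self.whole.append(string)                    # conventional registration
        self.parts[part].append(address)             # DD entry is only the address
        return len(self.parts[part])

    def lookup(self, part, k):
        """Return the string registered k-th in DD_part."""
        return self.whole[self.parts[part][k - 1]]


d = SplitDictionary(5)
d.register("ab", 3)
d.register("ba", 0)
k = d.register("abc", 3)
print(k, d.lookup(3, k))
```

Storing addresses rather than strings keeps a single copy of each string in the whole dictionary while still giving every string a position inside its divided dictionary.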

For example, suppose that the dictionary is divided into five divided dictionaries DD1 to DD5, as shown in FIG. 3, that the character string to be newly registered is classified into the divided dictionary DD4 by its dependency on already registered character strings, and that it is the seventh registration in DD4. The character string is then registered in the divided dictionary (as a whole-dictionary address) at index 7, shown by the black dot in DD4.

Once this registration in the divided dictionary is complete, the divided dictionaries DD1 to DD5 are arranged in the order in which the character strings belonging to them are likely to appear at that time. In FIG. 3 the divided dictionaries DD1 to DD5 are arranged in this order: DD1 is the most likely to appear, DD5 is the least likely, and DD4, in which the registration was just made, is in the fourth position.

With the divided dictionaries DD1 to DD5 arranged in the order in which their character strings are likely to appear, the registration position in the dictionary as a whole of index 7, shown by the black dot in the divided dictionary DD4 where the registration was made, is obtained as an index i.

Suppose, for example, that the character string of interest being encoded belongs to the divided dictionary DD_A and is the k-th character string registered in DD_A. The index i indicating its registration position in the dictionary as a whole is then given by the following equation.

$$ i = \sum_{p(j) > p(A)} n(j) + k \qquad (1) $$

where n(x) is the number of nodes in divided dictionary x, p(x) is the appearance probability of divided dictionary x, A is the divided dictionary to which the character string belongs, and k is its registration order within A.

Taking the registration of the character string at index 7 in the divided dictionary DD4 of FIG. 3 as an example, the index i indicating the registration position in the dictionary as a whole is calculated as follows. First, Σ n(j) is the total number of nodes of the character strings registered in the divided dictionaries DD1 to DD3 that precede DD4, where the registration was made, so that

$$ \sum n(j) = n_1 + n_2 + n_3 = 6 + 8 + 4 = 18. $$

Next, k is the index indicating the registration order within DD4, here k = 7. The index i for the dictionary as a whole is therefore i = 18 + 7 = 25.
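The index calculation and its inverse, which a decoder would need, can be sketched in Python. The function names, the explicit `order` list (divided dictionaries sorted from most to least likely), and the sizes assumed for DD4 and DD5 (7 and 3) are illustrative assumptions; only n1 = 6, n2 = 8, n3 = 4 and k = 7 come from the example.

```python
def global_index(sizes, order, part, k):
    """Index i: strings in all more likely divided dictionaries, plus k."""
    rank = order.index(part)
    return sum(sizes[j] for j in order[:rank]) + k


def locate(sizes, order, i):
    """Inverse mapping: recover (part, k) from the whole-dictionary index i."""
    for part in order:
        if i <= sizes[part]:
            return part, i
        i -= sizes[part]
    raise ValueError("index exceeds the number of registered strings")


sizes = [6, 8, 4, 7, 3]      # n1..n5; the last two values are assumed
order = [0, 1, 2, 3, 4]      # DD1 most likely ... DD5 least likely (0-based)
i = global_index(sizes, order, part=3, k=7)
print(i, locate(sizes, order, i))
```

Passing an explicit order rather than raw probabilities avoids ambiguity when two divided dictionaries have equal probability, a case the text does not address.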

In the conventional LZW encoding method the order of registration in the dictionary is used as the index, so the correlation between indexes is small. In the present invention, by contrast, the index is obtained from equation (1), so that an index with a smaller number than in the conventional method can be assigned. Moreover, the index given by equation (1) takes a smaller value the more likely the character string is to appear. A high compression ratio can therefore be obtained by not using a fixed-length code for the index and instead encoding it so that the code is shorter than the bit length of the index.
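The text does not name a specific variable-length code at this point. Purely as an illustration of how small indexes can receive short code words, the sketch below uses the Elias gamma code, which is an assumption and not necessarily the code used by the variable-length encoder 12.

```python
def elias_gamma(i):
    """Elias gamma code of a positive integer, returned as a bit string."""
    if i < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(i)[2:]                       # binary digits without the '0b' prefix
    return "0" * (len(binary) - 1) + binary   # unary length prefix, then the value


for index in (1, 5, 25):
    print(index, elias_gamma(index))
```

Under this code index 1 costs one bit and index 25 costs nine bits, so the scheme pays off only when frequently used strings are indeed given small indexes.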

The index i given by equation (1) follows the order of the cumulative appearance probability of the divided dictionaries. The order in which the divided dictionaries are likely to appear is determined using either ① or ② below.

① Order of the appearance probabilities of the divided dictionaries: If the number of indexes in each divided dictionary is n(j), the appearance probability of the j-th divided dictionary DDj is approximated by

$$ p(j) \approx \frac{n(j)}{n} \qquad (2) $$

so the order of the appearance probabilities of the divided dictionaries is given by the order of the number of indexes n(j) in each divided dictionary. Here n in equation (2) is the total number of indexes in the dictionary as a whole, defined by

$$ n = \sum_{j=1}^{N} n(j) $$

where N is the number of divided dictionaries.
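Ordering by the approximate appearance probability n(j)/n can be sketched as follows. The function name and the tie-breaking rule (lower dictionary number first, via a stable sort) are assumptions; the text does not say how ties are resolved.

```python
def order_by_appearance(sizes):
    """Return (order, probabilities) for divided dictionaries of the given sizes."""
    total = sum(sizes)
    probabilities = [n / total if total else 0.0 for n in sizes]
    # Larger n(j) means higher approximate probability; sorted() is stable,
    # so equal sizes keep their original relative order.
    order = sorted(range(len(sizes)), key=lambda j: -sizes[j])
    return order, probabilities


order, probabilities = order_by_appearance([6, 8, 4])
print(order, probabilities)
```

Because the sizes change with every registration, an implementation would need to recompute or incrementally maintain this order identically on the encoding and decoding sides.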

② Order of transition probability of the divided dictionaries: The probability p(K|j) of a transition from the last character K of the immediately preceding character string to the character string group j of the divided dictionary to which the character string of interest (the one about to be encoded) belongs is obtained and updated each time a character string of interest is encoded or decoded. The divided dictionaries are then ordered by the size of p(K|j).

Using these transition probabilities of the divided dictionaries makes it possible to reduce the redundancy between character strings efficiently.
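The two ordering rules can be sketched as follows. This is only an illustration under stated assumptions: the names `order_by_size` and `TransitionOrder` are hypothetical, raw counts stand in for the probabilities (dividing by a common total does not change the order), and ties are broken by the lower dictionary number, which the text does not specify.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def order_by_size(sizes: Sequence[int]) -> List[int]:
    """Method 1: rank divided dictionaries by entry count n(j), largest first."""
    return sorted(range(len(sizes)), key=lambda j: (-sizes[j], j))


class TransitionOrder:
    """Method 2: rank divided dictionaries by how often each one followed
    a string ending in character K (an adaptive count, updated per string)."""

    def __init__(self, num_dicts: int) -> None:
        self.counts: Dict[int, List[int]] = defaultdict(lambda: [0] * num_dicts)

    def update(self, last_char: int, j: int) -> None:
        # Called once per encoded or decoded string.
        self.counts[last_char][j] += 1

    def order(self, last_char: int) -> List[int]:
        c = self.counts[last_char]
        return sorted(range(len(c)), key=lambda j: (-c[j], j))


print(order_by_size([6, 8, 4, 8, 3]))  # [1, 3, 0, 2, 4]

t = TransitionOrder(3)
t.update(ord("a"), 2)
t.update(ord("a"), 2)
t.update(ord("a"), 0)
print(t.order(ord("a")))  # [2, 0, 1]
```

Because the encoder and decoder must arrive at the same order, whatever tie-breaking rule is chosen has to be applied identically on both sides.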

Next, the encoding process and decoding process according to the present invention are described in detail with reference to the operation flowcharts of FIGS. 4 and 5.

First, the encoding process of the present invention is described with reference to the operation flowchart of FIG. 4. In FIG. 4, the underlined portions of each step are those that differ from the conventional encoding process shown in FIG. 7.

In FIG. 4, initialization is first performed in S1. This initialization consists of the following ① to ④.

① Initialize the overall dictionary TD so that it contains the first (single) characters. Specifically, since addresses n = 0 to 255 are allocated to the initially registered single characters, the starting address n of the overall dictionary TD is set to n = 256.

② Set the starting address n(i) of every divided dictionary to n(i) = 0, where i ranges from 0 to the number of divisions.

③ Input the first character K and take it as the prefix string ω.

④ Set d to the divided dictionary number f(K) to which the input character K belongs, that is, f(K) = d.

When the initialization of S1 is complete, the process proceeds to S2 and reads the next input character K. S3 checks whether all input characters have been read. If not, the process proceeds to S4 and searches the overall dictionary TD for the character string (ωK), formed by appending the input character K read in S2 to the prefix string ω set in S1. If (ωK) is in TD, the process proceeds to S5, takes the index ω under which (ωK) is stored in TD as the next prefix string ω, returns to S2, and reads the next input character K.

While S2 to S5 are being repeated, if the character string (ωK) is not found in the overall dictionary TD in S4, the process proceeds to S6.

In S6, the following processes ① to ⑦ are performed.

① Calculate the index i based on equation (1) and output it as the variable-length code code(g(ω)).

That is, since the number of the divided dictionary in which the character string (ωK) is to be registered is d, the appearance probabilities p(x) of all the divided dictionaries are obtained, and the total number of substrings in the divided dictionaries whose appearance probability is higher than the appearance probability p(x) of the d-th divided dictionary, in which registration is now to be made, is calculated. The index i is obtained by adding to this total the index value indicating the registration order within the divided dictionary numbered d, and it is encoded and output as g(ω). The appearance rank h(x) of each divided dictionary obtained while calculating the index i is stored in the appearance order h storage memory 20, as shown in FIG. 2.

② Register the character string (ωK) at address n of the overall dictionary TD.

③ Register the address n of the overall dictionary TD, at which the character string (ωK) was registered, at address n(d) of the divided dictionary DD numbered d.

④ Take the last character K of the character string (ωK) as the new prefix string ω.

⑤ Set d to the divided dictionary number f(K) to which the character K belongs, that is, f(K) = d.

⑥ Increment the address n of the overall dictionary TD by one.

⑦ Increment the address n(d) of the divided dictionary DD by one.

When the processing of S6 shown in ① to ⑦ above is complete, the process returns to S2 and reads the next input character K.

When S3 determines, through repetition of the above processing, that all input characters have been read, the process proceeds to S7, calculates the index i in the same way as in S6, outputs the code code(g(ω)) obtained by variable-length encoding of the calculated index i, and ends the series of processes.
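The encoding flow of S1 to S7 can be sketched roughly as below. This is a reading of the text under several assumptions, not the patented implementation: integers are emitted instead of variable-length codes; the classifier `f` is supplied by the caller and is hypothetical; single characters keep their own codes 0 to 255 and registered strings are numbered from 256 in ranked order; dictionaries are ranked by entry count with ties going to the lower number; and the counter that is incremented is the one of the dictionary just written to.

```python
from typing import Callable, Dict, List, Tuple


def encode(data: bytes, f: Callable[[int], int], num_dicts: int
           ) -> Tuple[List[Tuple[int, int]], Dict[Tuple[int, int], int]]:
    """Return (codes, td); each code is (TD address, ranked index)."""
    td: Dict[Tuple[int, int], int] = {}    # (omega, K) -> TD address
    slot: Dict[int, Tuple[int, int]] = {}  # TD address -> (dictionary d, local index)
    n = 256                                # S1: addresses 0..255 are single characters
    n_d = [0] * num_dicts                  # S1: every divided dictionary starts empty
    codes: List[Tuple[int, int]] = []

    def ranked_index(omega: int) -> int:
        if omega < 256:                    # assumption: single characters are not re-ranked
            return omega
        d, local = slot[omega]
        # Total size of the dictionaries ranked ahead of dictionary d.
        ahead = sum(c for j, c in enumerate(n_d)
                    if c > n_d[d] or (c == n_d[d] and j < d))
        return 256 + ahead + local

    if not data:
        return codes, td
    omega = data[0]                        # S1: first character becomes the prefix
    d = f(data[0])                         # S1: dictionary chosen from that character
    for K in data[1:]:                     # S2 / S3
        hit = td.get((omega, K))           # S4: is (omega K) registered?
        if hit is not None:
            omega = hit                    # S5: extend the match
            continue
        codes.append((omega, ranked_index(omega)))  # S6: emit the index
        td[(omega, K)] = n                 # S6: register in the overall dictionary
        slot[n] = (d, n_d[d])              # S6: and in divided dictionary d
        n_d[d] += 1
        n += 1
        omega = K                          # S6: restart the prefix at K
        d = f(K)
    codes.append((omega, ranked_index(omega)))      # S7: flush the last prefix
    return codes, td


codes, _ = encode(b"abababab", lambda k: k % 2, 2)
print(codes)  # [(97, 97), (98, 98), (256, 257), (258, 257), (98, 98)]
```

In this run two different TD addresses (256 and 258) are both sent as 257 because the ranking changed between the two outputs, so a matching decoder would have to keep the same counters in step with the encoder.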

FIG. 5 is an operation flowchart of the decoding process of the present invention. The underlined portions of each step are those that differ from the conventional process of FIG. 8.

In FIG. 5, the initialization shown in ① to ⑧ below is first performed in S1.

① Initialize the overall dictionary TD so that it contains the first (single) characters. That is, since addresses n = 0 to 255 of the overall dictionary TD are allocated to the single characters, the starting address n of the overall dictionary TD is set to n = 256.

② Set the starting address n(i) of every divided dictionary DD to n(i) = 0, where i ranges from 0 to the number of divisions N.

③ Read the first code CODE.

④ Obtain the index ω of the overall dictionary TD as g⁻¹(code⁻¹(CODE)). Here code⁻¹ is a table that decodes a variable-length code into a fixed-length code, and g⁻¹ is a table that converts between the index of the overall dictionary TD and the index of the divided dictionary DD.

⑤ Place the index ω of the overall dictionary TD obtained in ④ in OLDω.

⑥ Obtain the first character K from the overall dictionary index ω.

⑦ Output the first character K thus obtained.

⑧ Set the first character K as K1.

Next, the process proceeds to S2, reads the next input code CODE, obtains the overall dictionary index CODE in the same way as in ④ of S1, and places this index CODE in INω.

Next, the process proceeds to S3 and checks whether all input codes have been read, then proceeds to S4 and checks whether CODE is defined in the overall dictionary TD. Normally the input code word has already been registered in the dictionary by the preceding processing, so the process proceeds to S5 and reads the corresponding character string (ωK) from the overall dictionary using the overall dictionary index corresponding to CODE. In S6, the character K is temporarily pushed onto the stack, the reference number ω is taken as the new CODE, and the process returns to S5. S5 and S6 are repeated recursively until the overall dictionary index ω reaches a single character K. The process finally proceeds to S7, where the characters stacked in S6 are popped in LIFO (Last In First Out) order and output.
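The S5 to S7 loop, which follows the (ω, K) pairs back to a single character and then reverses them with a stack, might look like the sketch below. The function name `expand` and the representation of the overall dictionary as a Python `dict` from address to (ω, K) are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def expand(td: Dict[int, Tuple[int, int]], code: int) -> bytes:
    """Recover the string stored under an overall-dictionary index.

    td maps an address >= 256 to its (omega, K) pair; addresses below 256
    stand for the single characters themselves.
    """
    stack: List[int] = []
    while code >= 256:
        code, K = td[code]   # S5: read (omega K); omega becomes the new CODE
        stack.append(K)      # S6: push the last character
    stack.append(code)       # the chain has reached a single character
    out = bytearray()
    while stack:             # S7: pop in LIFO order to restore left-to-right order
        out.append(stack.pop())
    return bytes(out)


td = {256: (97, 98), 257: (256, 97)}   # 256 = "ab", 257 = "ab" + "a"
print(expand(td, 257))  # b'aba'
```

The stack is needed because each dictionary entry stores only its last character and a pointer to its prefix, so the characters come out in reverse order.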

The processing of S7 consists of the following ① to ⑧.

① Pop the stack until it is empty, outputting the stored characters.

② Set the first character of the restored string as K1.

③ Determine the number d of the next divided dictionary from the first character K1 of the restored string.

④ Store the pair (OLDω, K) at address n of the overall dictionary TD.

⑤ Store the current address n of the overall dictionary TD at address (j, n(d)) of the conversion table from overall dictionary index to divided dictionary index.

⑥ Increment the address n of the overall dictionary by one.

⑦ Increment the address n(d) of the divided dictionary by one.

⑧ Place the overall dictionary index INω in OLDω.

Exception handling similar to that of the conventional example of FIG. 8 is also provided. If CODE is not defined in S4, the process branches to the exception-handling step and performs the following ① to ③.

① Output the character FINchar that was set as K1 = K in ⑧ of S1.

② Set CODE to the OLDω that was set in ⑤ of S1.

③ Place the overall dictionary index obtained from the pair TD(ωK1) in INω.

Next, the conversion from the overall dictionary index to the divided dictionary index, which is performed with the table g⁻¹ in S1 and S2 of FIG. 5, is carried out by the following ① to ⑥.

① Set the number j, which indicates the position in the order of the divided dictionaries arranged by appearance frequency, to j = 0.

② While the divided dictionary index x is positive, repeat the following loop.

③ Increment j by one: j = j + 1.

④ Compute x = x − n(h(j)), where n(h(j)) is the number of character strings (number of nodes) in the j-th divided dictionary.

⑤ When the divided dictionary index x becomes negative, the index x calculated at j = j − 1, that is, at the preceding divided dictionary, is the value of the divided dictionary index.

⑥ Output the pair (j, x) of the dictionary number j of the divided dictionary obtained in ⑤ and the index x within the j-th divided dictionary.

For example, suppose the overall dictionary index 25 is given in FIG. 2. After j = 0 is set, j is incremented to j = 1 and x = 25 − n1 = 25 − 6 = 19 is obtained; then with j = 2, x = 19 − 8 = 11; then with j = 3, x = 11 − 4 = 7; and with j = 4, x = 7 − 8 = −1. Since x = −1 is negative at j = 4, the value x = 7 obtained at the preceding j = 3 is taken as the divided dictionary index. That is, index 7, shown by the black dot in the divided dictionary DD4, is obtained as the converted value.
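The conversion and the worked example can be checked with a short sketch. It assumes 0-based positions inside each divided dictionary and stops just before the subtraction that would make x negative, which gives the same result as subtracting and then stepping back one dictionary. The name `split_index` and the list argument are hypothetical.

```python
from typing import Sequence, Tuple


def split_index(x: int, sizes_in_order: Sequence[int]) -> Tuple[int, int]:
    """Convert an overall index into (rank of divided dictionary, local index).

    sizes_in_order[r] is the entry count of the dictionary ranked r-th
    (0-based) in order of appearance.
    """
    for rank, size in enumerate(sizes_in_order):
        if x < size:
            # Subtracting this size would go negative: x lies in this dictionary.
            return rank, x
        x -= size
    raise ValueError("index lies beyond the registered entries")


# FIG. 2 example: 25 - 6 = 19, 19 - 8 = 11, 11 - 4 = 7, and 7 < 8.
print(split_index(25, [6, 8, 4, 8]))  # (3, 7)
```

Rank 3 in 0-based counting is the fourth dictionary in the order, which corresponds to index 7 of DD4 in the example.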

In the encoding and decoding operation flows of FIGS. 4 and 5 described above, the index indicating the position in the dictionary as a whole can be encoded and decoded with a variable-length code, such as the well-known Huffman code, that assigns shorter codes to smaller indexes. To retain the universal property, so that various kinds of data can be compressed, a code that artificially assigns short code lengths to small numbers, such as the Elias code, may be used.

As shown in FIG. 6, the Elias code consists of the γ code and the δ code. The γ code is a binary number preceded by a prefix that indicates its number of significant digits. The δ code is the γ code with its prefix itself expressed in the γ code. Because the number of binary digits is known from the prefix, Elias code words can be decoded unambiguously even when they are packed bit by bit.
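For reference, the textbook forms of the Elias γ and δ codes can be sketched as below. FIG. 6 is not reproduced here, so the exact prefix convention used in the patent may differ; in this common form the prefix is a run of zeros whose length is one less than the number of binary digits. The function names are assumptions, and since these codes are defined for integers from 1 upward, an index that can be 0 would have to be sent as index + 1.

```python
from typing import Tuple


def elias_gamma(n: int) -> str:
    """Gamma code: (digits - 1) zeros, then the binary digits of n."""
    if n < 1:
        raise ValueError("Elias codes are defined for n >= 1")
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b


def elias_delta(n: int) -> str:
    """Delta code: the digit count in gamma code, then n without its leading 1."""
    if n < 1:
        raise ValueError("Elias codes are defined for n >= 1")
    b = bin(n)[2:]
    return elias_gamma(len(b)) + b[1:]


def decode_gamma(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Read one gamma code word starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


print(elias_gamma(5))   # 00101
print(elias_delta(8))   # 00100000

packed = elias_gamma(1) + elias_gamma(5) + elias_gamma(3)  # no separators
value, pos = decode_gamma(packed)
print(value, decode_gamma(packed, pos))  # 1 (5, 6)
```

The last lines show the bit-packing property mentioned in the text: code words are concatenated without separators and can still be split apart unambiguously.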

[Effects of the Invention]

As described above, according to the present invention, the registration dictionary is held divided into several parts, and the index indicating the registration position in the dictionary as a whole is expressed as an index within the numbering obtained by arranging the divided dictionaries in order of appearance. This reduces the redundancy between character strings, and a higher compression rate can be obtained than with the conventional LZW code.

[Brief explanation of drawings]

FIG. 1 is a diagram explaining the principle of the present invention; FIG. 2 is a configuration diagram of an embodiment; FIG. 3 is a detailed explanatory diagram of the present invention; FIG. 4 is a flowchart of the encoding process of the present invention; FIG. 5 is a flowchart of the decoding process of the present invention; FIG. 6 is an explanatory diagram of the Elias code used in the present invention; FIG. 7 is a flowchart of the conventional LZW encoding process; FIG. 8 is a flowchart of the conventional LZW decoding process; FIG. 9 is an explanatory diagram of LZW encoding; FIG. 10 is an explanatory diagram of an example dictionary configuration; FIG. 11 is an explanatory diagram of LZW decoding.

In the figures, 10: CPU; 12: variable-length encoder/decoder; 14: stack for reordering restored characters; 16: overall dictionary (TD); 18: divided dictionaries (DD).

Claims

[Claims]

(1) A data compression method in which data to be encoded is divided into mutually different subsequences, each subsequence is registered in a dictionary with a distinct reference number, and input data is encoded by specifying the reference number of the subsequence in the dictionary that gives the longest match, characterized in that, when a newly encoded subsequence is registered in the dictionary, it is registered in a designated divided dictionary classified according to its dependency relationship with already registered subsequences, and the registration position in the dictionary as a whole is specified by a number obtained by adding the reference number within the divided dictionary to which the registered subsequence belongs to the total number of subsequences in the group of divided dictionaries having a higher probability of appearance than that divided dictionary, and this number is encoded.

(2) A data restoration method in which decoded data is divided into mutually different subsequences, each subsequence is registered in a dictionary with a distinct reference number, and input data to be restored is decoded by specifying the reference number of the subsequence in the dictionary that gives the longest match, characterized in that, when a newly decoded subsequence is registered in the dictionary, it is registered in a designated divided dictionary classified according to its dependency relationship with already decoded subsequences, and the code is decoded whose registration position in the dictionary as a whole is specified by a number obtained by adding the reference number within the divided dictionary to which the decoded subsequence belongs to the total number of subsequences in the group of divided dictionaries having a higher probability of appearance than that divided dictionary.