JPH04145726A

JPH04145726A - Data compression and restoring system

Info

Publication number: JPH04145726A
Application number: JP2269930A
Authority: JP
Inventors: Hirotaka Chiba; 広隆千葉; Yoshiyuki Okada; 佳之岡田; Shigeru Yoshida; 茂吉田; Yasuhiko Nakano; 泰彦中野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-10-08
Filing date: 1990-10-08
Publication date: 1992-05-19
Anticipated expiration: 2015-09-18
Also published as: JP3088740B2

Abstract

PURPOSE: To improve the compression ratio of LZW data compression by registering multiple character strings in a dictionary in advance, taking the frequency of use of character strings into account, and using that dictionary for compression and decoding. CONSTITUTION: Based on initial registration software 24, a CPU 12 assigns a registration number to each single character of the character set being processed and initially registers it in a dictionary 10. In addition, registration numbers 4 to 12 are assigned to frequently used character strings of 2 to 10 characters, which are also initially registered in the dictionary 10. When initial registration is finished, the desired input data or code data is stored in a data buffer 28 of a data memory 26, and encoding by encoding software 20 or decoding by decoding software 22 is carried out. Registration numbers are thus assigned according to the length of the character strings when the dictionary is initialized, and when a registered character string appears, its registration number is used in compression, achieving high compression.

Description

[Detailed Description of the Invention] [Summary] The invention concerns a data compression and restoration system using the LZW code, a form of universal coding in which already-encoded data is divided into mutually distinct substrings that are registered in a dictionary, input data is encoded by specifying the registration number of the longest matching substring in the dictionary, and codewords are decoded using the dictionary. The object is to raise the compression ratio through initial registration in the dictionary. To this end, registration numbers are assigned in advance not only to single characters but also to a set of frequently occurring character strings of one or more characters, and these strings are initially registered in the dictionary.

[Field of Industrial Application] The present invention relates to a data compression and restoration system using the LZW code, which is known as an improvement of the incremental parsing type of universal code.

In recent years, computers have come to handle many kinds of data, such as character codes, vector information, and images, and the volume of data handled is growing rapidly. When handling large amounts of data, removing the redundant parts of the data to compress its volume reduces the required storage capacity and allows faster transmission.

Universal coding has been proposed as a way to compress many kinds of data with a single method.

The field of the present invention is not limited to the compression of character codes and applies to many kinds of data. In what follows, however, we adopt the terminology used in information theory: one word of data is called a character, and any number of words of data joined together is called a character string.

A representative universal coding method is the Ziv-Lempel code (for details see, for example, Munakata, "Ziv-Lempel data compression method," Information Processing, Vol. 26, No. 1, 1985).

Two algorithms have been proposed for the Ziv-Lempel code: (1) a universal type and (2) an incremental parsing type.

Furthermore, the LZSS code is an improvement of the universal-type algorithm (see T. C. Bell, "Better OPM/L Text Compression," IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986).

The LZW (Lempel-Ziv-Welch) code is an improvement of the incremental parsing algorithm (see T. A. Welch, "A Technique for High-Performance Data Compression," Computer, June 1984).

Of these codes, the LZW code has come into use for file compression in storage devices and similar applications because it can be processed at high speed and its algorithm is simple.

[Prior Art] The encoding algorithm of the conventional LZW code is shown in FIG. 8, and the decoding algorithm in FIG. 9.

LZW encoding uses a rewritable dictionary. It divides the input character code data into mutually distinct character strings, numbers these strings in order of appearance, and registers them in the dictionary. The character string currently being input is then encoded as the number of the longest matching character string registered in the dictionary.

In the LZW encoding process of FIG. 8, step S1 (the word "step" is omitted below) first registers in the dictionary, as initial values, the one-character string for every character, and then encoding begins. Also in S1, the dictionary is searched with the first input character K to obtain its reference number ω, which becomes the prefix string.

Next, S2 reads the next character K of the input data, and S3 checks whether character input has ended. The process then advances to S4 and searches the dictionary for the string ωK, formed by appending the character K read in S2 to the prefix string ω obtained in S1.

If the string ωK is not in the dictionary in S4, the process advances to S6. There the reference number ω obtained in S1 is output as the codeword code(ω), the string ωK is given a new reference number and registered in the dictionary, the input character K from S2 replaces the reference number ω, and the dictionary address n is incremented. The process then returns to S2 to read the next character K.

If, on the other hand, the string ωK is in the dictionary in S4, S5 makes the string ωK the new reference number ω and the process returns to S2. The search for the maximum match length continues in this way until ωK can no longer be found in the dictionary.

LZW encoding is explained concretely below with reference to FIGS. 10 and 11.

The input data INPUT in FIG. 10 is read from left to right. When the first character a is input, the dictionary contains no matching string other than the character a itself, so OUTPUT CODE 1 (the reference number ω) is output as the codeword. The character a then becomes the new prefix string ω.

Next, the second character b is input. The string ωK = ab, formed by appending this input character b to the prefix string ω, is not in the dictionary, so OUTPUT CODE 2 for the character b is output as the codeword. The extended string ab is then given reference number 4 and registered in the dictionary. As shown on the right side of FIG. 11, it is actually registered in the form 1b. The second input character b becomes the new prefix string ω.

When the third character a is then input, the extended string ωK = ba = 2a, formed by appending the input character a to the prefix string ω, is not in the dictionary. OUTPUT CODE 2 for the character b is therefore output as the codeword, and the extended string ωK = ba is expressed as 2a and registered in the dictionary with reference number 5.

The third input character a then becomes the new prefix string ω.

For the fourth input character b, the extended string ωK = ab is already registered in the dictionary as codeword 4, so the string ωK becomes the new prefix string ω. The fifth character c is then input to form the extended string ωK = 4c = abc. Because ωK = abc is not registered in the dictionary, OUTPUT CODE 4 for the string ab = 1b is output as the codeword, and the extended string ωK = abc is registered in the dictionary in the form 4c with reference number 6. Processing continues in the same way.
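The conventional encoding loop described above can be sketched in Python as follows. This is only an illustrative sketch, not the flow of FIG. 8 itself: the function name `lzw_encode`, the three-character alphabet, and the sample input `"ababcbabab"` are assumptions chosen to be consistent with the reference numbers mentioned in the text (ab = 4, ba = 5, abc = 6); the full input of FIG. 10 is not reproduced in the text.

```python
def lzw_encode(text, alphabet="abc"):
    """Conventional LZW encoding with 1-based reference numbers (sketch)."""
    # Initial registration: every single character gets a reference number.
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
    next_code = len(dictionary) + 1
    codes = []
    if not text:
        return codes
    prefix = text[0]                            # prefix string (omega)
    for k in text[1:]:                          # next input character K
        if prefix + k in dictionary:            # omega+K is known: keep extending
            prefix += k
        else:
            codes.append(dictionary[prefix])    # output code(omega)
            dictionary[prefix + k] = next_code  # register omega+K
            next_code += 1
            prefix = k                          # K becomes the new prefix
    codes.append(dictionary[prefix])            # flush the final prefix
    return codes


print(lzw_encode("ababcbabab"))  # [1, 2, 4, 3, 5, 8]
```

Unlike the dictionary of FIG. 11, which stores each entry as a (reference number, character) pair such as 1b, this sketch stores whole strings as dictionary keys for brevity.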

The decoding process of FIG. 9 performs the reverse of the encoding of FIG. 8.

As in encoding, decoding in FIG. 9 begins after the one-character string for every character has been registered in the dictionary as initial values.

First, S1 reads the first code (reference number) and sets the current CODE as OLDcode. Because the first code always corresponds to one of the single-character reference numbers already registered in the dictionary, the character code(K) matching the input code CODE is found and the character K is output. The output character K is also set in FINchar for the exception handling described later.

Next, the process advances to S2, reads the next code, and sets it in CODE as INcode. S3 checks whether there is a new code, that is, whether code input has ended, and the process advances to S4, which checks whether the input code CODE is defined (registered) in the dictionary. Normally the input codeword has already been registered in the dictionary by earlier processing, so the process advances to S5 and reads the string code(ωK) corresponding to CODE from the dictionary. S6 temporarily pushes the character K onto a stack and makes the reference number code(ω) the new CODE, and the process returns to S5. Steps S5 and S6 are repeated recursively until the reference number ω reaches a single character. Finally, the process advances to S7, pops the characters stacked in S6 in LIFO (Last In, First Out) order, and outputs them. Also in S7, the string formed by pairing the previously used code ω with the first character K of the string just restored, written (ω, K), is given a new reference number and registered in the dictionary.

The decoding process is explained concretely below with reference to FIG. 12.

In FIG. 12 the first input code is 1. The characters a, b, and c are already registered in the dictionary with reference numbers 1, 2, and 3, as shown in Table 2, so a dictionary lookup replaces code 1 with the character string a of the matching reference number, and a is output. The next code, 2, is likewise replaced with the character b and output. At this point the string 1b (= ab), which combines the previously processed code with the first character b of the string just decoded, is given the new reference number 4 and registered in the dictionary.

The third code, 4, is found in the dictionary as 1b, replaced with ab, and the string ab is output. At the same time the string 2a (= ba), which combines the previously processed code 2 with the first character a of the string just decoded, is given the new reference number 5 and registered in the dictionary.

This processing is repeated in the same way.

Decoding in FIG. 12 involves the following exception handling.

The exception arises when decoding the sixth input code, 8. Code 8 is not yet defined in the dictionary at the time of decoding and so cannot be decoded directly. In this case the string 5b is formed by appending the first character b of the previously decoded string ba to the previously processed code 5; 5b is then expanded to 2ab and to bab, which is output. After the string is output, the string 5b, formed by appending the character b of the string just decoded to the previous codeword 5, is given reference number 8 and registered in the dictionary.

This exception handling is carried out through S4 and S8 of the decoding flow in FIG. 9; finally, in S7, the string is output and the new string is given a reference number and registered in the dictionary.

Note that the encoding and decoding processes of FIGS. 8 and 9 build up identical dictionaries as they run.
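The conventional decoding procedure, including the exception for a code that is not yet defined, can be sketched as below. This is a hedged illustration rather than the exact flow of FIG. 9: the function name `lzw_decode` and the code sequence `[1, 2, 4, 3, 5, 8]` are assumptions, the latter chosen so that the sixth code is 8 and follows code 5, as in the exception described in the text. The sketch stores whole strings instead of (ω, K) pairs, so the stack-based expansion of S5 to S7 is not needed.

```python
def lzw_decode(codes, alphabet="abc"):
    """Conventional LZW decoding with 1-based reference numbers (sketch)."""
    # Same initial registration as the encoder: one entry per single character.
    dictionary = {i: ch for i, ch in enumerate(alphabet, start=1)}
    next_code = len(dictionary) + 1
    if not codes:
        return ""
    previous = dictionary[codes[0]]       # the first code is always one character
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Exception: the code is not defined yet. It must be the previous
            # string followed by that string's own first character.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid code: {code}")
        output.append(entry)
        # Register (previous string, first character of the current string).
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(output)


print(lzw_decode([1, 2, 4, 3, 5, 8]))  # ababcbabab
```

When code 8 arrives, the dictionary holds entries only up to 7, so the sketch rebuilds it as ba + b = bab and then registers that string as 8, which is the same entry the encoder would have created.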

[Problems to Be Solved by the Invention] The conventional LZW code registers previously occurring data in a dictionary and compresses using the registered information. However, registering everything that has occurred in the past cannot be said to be an effective registration method. With image data, for example, all-white and all-black patterns are known to occur frequently; if such data is registered only incrementally, a high compression ratio cannot be obtained until dictionary registration has progressed to some extent.

The present invention was made in view of this conventional problem, and its object is to provide a data compression and restoration system using the LZW code that can raise the compression ratio through initial registration in the dictionary.

[Means for Solving the Problems] FIG. 1 is a diagram explaining the principle of the present invention.

The present invention first addresses a data compression system using the LZW code, provided with encoding means 100 that divides already-encoded data into mutually distinct substrings, registers these substrings in a dictionary 10, and encodes input data by specifying the registration number of the longest matching substring in the dictionary 10.

For such a data compression system, the present invention is characterized in that registration numbers are assigned in advance to a set of character strings of one or more characters, which are initially registered in the dictionary 10, and the encoding means 100 compresses and encodes the input data using the dictionary 10.

Specifically, registration numbers are assigned in advance to character strings containing characters that occur frequently in the data to be compressed, and these strings are initially registered in the dictionary 10. Registration numbers are likewise assigned in advance to strings in which a single character is repeated, for example aa, aaa, aaaa, and aaaaa, and these are initially registered in the dictionary 10.

The present invention further addresses a data restoration system provided with decoding means 200 that uses the dictionary 10 to restore the original data from codewords produced by dividing already-encoded data into mutually distinct substrings, registering these substrings in the dictionary 10, and encoding the input data by specifying the registration number of the longest matching substring in the dictionary 10.

For such a data restoration system as well, the present invention is characterized in that registration numbers are assigned in advance to a set of character strings of one or more characters, which are initially registered in the dictionary 10, and the decoding means 200 restores the input data using this dictionary 10.

For decoding too, registration numbers are assigned in advance to character strings containing characters that occur frequently in the data to be compressed, and these strings are initially registered in the dictionary 10.

Likewise, registration numbers are assigned in advance to strings in which a single character is repeated in the data to be compressed, for example aa, aaa, aaaa, and aaaaa, and these are initially registered in the dictionary 10.

[Operation] With the data compression and restoration system of the present invention configured in this way, frequently occurring character strings, for example runs of the same character, are assigned registration numbers according to their length when the dictionary is initialized. When a registered character string λ appears, it is compressed using its registration number, which achieves a high compression ratio.

[Embodiment] FIG. 2 is a configuration diagram showing one embodiment of the present invention.

In FIG. 2, reference numeral 12 denotes a CPU serving as control means, to which a program memory 14 and a data memory 26 are connected.

The program memory 14 holds control software 16; maximum match length search software 18, which performs the maximum match length search used in the LZW code; encoding software 20, which converts an input character string into LZW code; decoding software 22, which restores the code produced by the encoding software 20 to the original character string; and initial registration software 24, which initially registers, with registration numbers, each single character to be processed and, in addition, frequently used character strings λ. In this embodiment the strings λ are runs of 2 to 15 consecutive occurrences of the same character a.

The data memory 26 contains a data buffer 28, which stores the character string about to be encoded or the code string about to be decoded, and the dictionary 10, which is built up and used during LZW encoding and decoding.

In the embodiment of FIG. 2, initial registration of the dictionary according to the present invention is performed as follows.

First, under the control of the control software 16, the CPU 12 starts the initial registration software 24 and performs initial registration of the dictionary 10. That is, based on the initial registration software 24, the CPU 12 assigns a registration number (reference number) to each single character of the character set to be processed and initially registers it in the dictionary 10.

To simplify the explanation, consider the three characters a, b, and c as the processing targets. As shown in the dictionary configuration diagram of FIG. 5, the characters a, b, and c are initially registered with reference numbers 1, 2, and 3.

In addition, in the present invention, runs of 2 to 10 consecutive occurrences of the character a are treated as frequently occurring character strings λ. As shown in the dictionary configuration of FIG. 5, the strings aa, aaa, ..., aaaaaaaaaa are given registration numbers 4 to 12 and initially registered in the dictionary. In FIG. 5 they are written a×2, a×3, ..., a×10.
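The initial registration described above (single characters a, b, c as 1 to 3, then runs of a of length 2 to 10 as 4 to 12) can be sketched as follows. The function name `build_initial_dictionary` and its parameters are hypothetical names introduced only for this illustration.

```python
def build_initial_dictionary(alphabet="abc", run_char="a", max_run=10):
    """Build the initial dictionary: single characters, then runs of run_char."""
    dictionary = {}
    code = 1
    # Each single character of the processing target gets a registration number.
    for ch in alphabet:
        dictionary[ch] = code
        code += 1
    # Runs of the frequent character (length 2..max_run) get the next numbers.
    for length in range(2, max_run + 1):
        dictionary[run_char * length] = code
        code += 1
    return dictionary


initial = build_initial_dictionary()
print(initial["aa"], initial["a" * 7], initial["a" * 10])  # 4 9 12
```

With these defaults a run of n characters a (2 ≤ n ≤ 10) receives registration number n + 2, which matches the range 4 to 12 given in the text.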

Once initial registration of the dictionary 10 is complete, the desired input data or code data is stored in the data buffer 28 of the data memory 26, and encoding by the encoding software 20 or decoding by the decoding software 22 is performed.

The encoding algorithm of the LZW code according to the present invention is shown in FIG. 3, and the decoding algorithm in FIG. 4.

LZW encoding in FIG. 3 is explained using the input characters of FIG. 5. First, in S1, in addition to the initial registration that registers each character code i, one character at a time, at dictionary address i, a character known in advance to occur frequently is designated λ, and not just the single character λ but several strings of λ, one for each run length, are registered in the dictionary. In the dictionary configuration example of FIG. 6, which covers the characters a, b, and c, every run of the character a from 1 to 10 characters long is registered.

Once the initial registration of S1 is complete, S2 to S7 handle the case where the initially registered character λ occurs several times in succession. S3 tests whether K = λ, and S4 increments by one the λ counter, which holds the number of consecutive λ characters; this repeats until the input character K is no longer λ. The first input character a in FIG. 5 is the character λ, so the process advances from S3 to S4 and increments the λ counter by one. The next input character b is not λ, so the process advances to S5. Because the λ counter is then 1, the process advances to S6 and outputs the code for the λ counter value as CODE(λ) = CODE 2, and S7 makes the second input character b the new prefix string ω.

The subsequent steps S8, S9, S11, S12, S13, and S22 are the same as the conventional LZW encoding process shown in FIG. 8, and the corresponding step numbers of FIG. 8 are given in parentheses.

Within this otherwise conventional LZW encoding process, the present invention adds a branch at S10 leading to steps S14 to S21, which handle the frequently occurring character λ.

Specifically, after S7 makes the second input character b the prefix string ω, S8 reads the third character a and, via S9, S10 determines that K = λ, so the process advances to S14.

In S14, the current prefix string ω = b is output as the code CODE(ω) = CODE 2. S15 then increments the λ counter by one, S16 reads the next character b, and, via S18, S19 tests whether K = λ. Here the character b is not λ, so S20 outputs code(λ) = code 1 as the code, the counter is reset to λ = 0, and the process returns to S7.

By contrast, starting from the 13th input character in FIG. 5, the character a designated as λ occurs seven times in succession. In this case repeating S15 to S19 brings the λ counter to λ = 7, and S20 outputs code 9 as the code.

A run of 10 or more consecutive a characters, for example a run of 15, is represented by two codes, reference numbers 12 and 7.
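How a long run maps onto run codes can be sketched as below. The text gives only the single example 15 → 12, 7; the greedy split (emit the longest registered run first) and the names `run_codes`, `first_run_code`, and `single_code` are assumptions made for this illustration.

```python
def run_codes(run_length, max_run=10, first_run_code=4, single_code=1):
    """Split a run of the frequent character into registered run codes.

    Assumes the run of length 2 has code first_run_code (4 in the text),
    longer runs follow consecutively, and a single character has single_code.
    """
    codes = []
    while run_length > 0:
        chunk = min(run_length, max_run)       # longest registered run that fits
        if chunk == 1:
            codes.append(single_code)          # a lone character uses its own code
        else:
            codes.append(first_run_code + chunk - 2)
        run_length -= chunk
    return codes


print(run_codes(15))  # [12, 7]
print(run_codes(7))   # [9]
```

A run of 15 is split into a run of 10 (code 12) followed by a run of 5 (code 7), in agreement with the example in the text.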

Also, in the data encoding system of the present invention, when a character corresponding to λ appears partway through encoding, S14 outputs the string ω accumulated so far as a code, and the string ωK = ωλ is not registered in the dictionary.
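Putting these rules together, a run-aware encoder might look like the sketch below. It is a simplified reading of the behaviour described for FIG. 3, not the flowchart itself: the function name `encode_with_runs`, the greedy splitting of long runs, and the sample inputs are assumptions. The sketch emits code(ω) without registering ωλ when λ interrupts a prefix, then emits run codes for the λ run, and otherwise behaves like conventional LZW.

```python
def encode_with_runs(text, alphabet="abc", run_char="a", max_run=10):
    """LZW encoding with pre-registered runs of run_char (illustrative sketch)."""
    # Initial registration: single characters, then runs of length 2..max_run.
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
    for length in range(2, max_run + 1):
        dictionary[run_char * length] = len(dictionary) + 1
    next_code = len(dictionary) + 1
    codes = []
    prefix = ""                                   # current prefix string (omega)
    i = 0
    while i < len(text):
        k = text[i]
        if k == run_char:
            if prefix:
                codes.append(dictionary[prefix])  # output omega; omega+lambda is NOT registered
                prefix = ""
            run = 0                               # the lambda counter
            while i < len(text) and text[i] == run_char:
                run += 1
                i += 1
            while run > 0:                        # assumed greedy split of long runs
                chunk = min(run, max_run)
                codes.append(dictionary[run_char * chunk])
                run -= chunk
            continue
        if prefix + k in dictionary:              # conventional LZW from here on
            prefix += k
        else:
            codes.append(dictionary[prefix])
            dictionary[prefix + k] = next_code
            next_code += 1
            prefix = k
        i += 1
    if prefix:
        codes.append(dictionary[prefix])          # flush the final prefix
    return codes


print(encode_with_runs("abab"))             # [1, 2, 1, 2]
print(encode_with_runs("bcbc" + "a" * 15))  # [2, 3, 13, 12, 7]
```

Because ωλ is never registered, no dynamically added entry in this sketch contains the character λ; runs of λ are always expressed through the pre-registered run codes.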

Next, the decoding process according to the present invention shown in FIG. 4 is described.

In the decoding process of FIG. 4, steps S1 to S4, S8, and S9 are provided to decode the codes that indicate the number of consecutive occurrences of the frequently occurring character λ. The remaining processing is the same as the conventional decoding process shown in FIG. 9, and the correspondence is indicated by the step numbers in parentheses.

Taking the restoration of the input codes shown in FIG. 7 as an example, decoding up to the 12th code is the same as conventional decoding, except that for code 1, S4 or S9 outputs a single λ = a.

For the 13th code in FIG. 7, code 9, S8 determines that it is a λ code and the process advances to S9, where a dictionary lookup with the code value 9 outputs seven a characters.
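A matching run-aware decoder can be sketched as follows. As with any such sketch, it is an assumption-laden illustration rather than the flow of FIG. 4: the function name `decode_with_runs` is hypothetical, and the rule that no dictionary entry is registered across a λ code is inferred from the encoder side, where ωλ is not registered.

```python
def decode_with_runs(codes, alphabet="abc", run_char="a", max_run=10):
    """LZW decoding with pre-registered runs of run_char (illustrative sketch)."""
    # Same initial registration as the encoder.
    dictionary = {i: ch for i, ch in enumerate(alphabet, start=1)}
    for length in range(2, max_run + 1):
        dictionary[len(dictionary) + 1] = run_char * length
    # Codes whose string consists only of run_char are treated as lambda codes.
    lambda_codes = {c for c, s in dictionary.items() if set(s) == {run_char}}
    next_code = len(dictionary) + 1
    output = []
    previous = None                       # previously decoded ordinary string
    for code in codes:
        if code in lambda_codes:
            output.append(dictionary[code])   # output the run directly
            previous = None                   # nothing is registered across a run
            continue
        if code in dictionary:
            entry = dictionary[code]
        elif previous is not None and code == next_code:
            entry = previous + previous[0]    # conventional exception case
        else:
            raise ValueError(f"invalid code: {code}")
        output.append(entry)
        if previous is not None:
            dictionary[next_code] = previous + entry[0]
            next_code += 1
        previous = entry
    return "".join(output)


print(decode_with_runs([2, 9]))  # baaaaaaa
```

Code 9 is looked up in the initial dictionary and expands to seven a characters, as in the example from FIG. 7; the sequence `[2, 3, 13, 12, 7]` decodes to bcbc followed by fifteen a characters.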

In the embodiment above, the initially registered character strings were runs of a single repeated character, but the present invention is not limited to this; any suitable character strings may be initially registered provided they occur frequently.

[Effects of the Invention] As described above, according to the present invention, character strings representing multiple run lengths are registered in the dictionary in advance, taking the frequency of use of character strings into account, and compression and restoration are performed using this dictionary. This raises the compression ratio of LZW data compression further.

[Brief explanation of drawings]

FIG. 1 is a diagram explaining the principle of the present invention; FIG. 2 is a configuration diagram of an embodiment of the present invention; FIG. 3 is a processing flow diagram of LZW encoding according to the present invention; FIG. 4 is a processing flow diagram of LZW decoding according to the present invention; FIG. 5 is an explanatory diagram of LZW encoding according to the present invention; FIG. 6 is an explanatory diagram of the dictionary configuration according to the present invention; FIG. 7 is an explanatory diagram of LZW restoration according to the present invention; FIG. 8 is a flow diagram of the conventional LZW encoding process; FIG. 9 is a flow diagram of the conventional LZW decoding process; FIG. 10 is an explanatory diagram of conventional LZW encoding; FIG. 11 is an explanatory diagram of a conventional dictionary configuration example; and FIG. 12 is an explanatory diagram of conventional LZW decoding.

In the figures, 10: dictionary; 12: CPU; 14: program memory; 16: control software; 18: maximum match length search software; 20: encoding software; 22: decoding software; 24: initial registration software; 26: data memory; 28: data buffer; 100: encoding means; 200: decoding means.

Claims

[Claims]

(1) A data compression system provided with encoding means (100) that divides already-encoded data into mutually distinct substrings, registers the substrings in a dictionary (10), and encodes input data by specifying the registration number of the longest matching substring in the dictionary (10), characterized in that registration numbers are assigned in advance to a set of character strings of one or more characters, which are initially registered in the dictionary (10), and the input data is compressed and encoded by the encoding means (100) using the dictionary (10).

(2) The data compression system according to claim 1, characterized in that registration numbers are assigned in advance to character strings containing characters that occur frequently in the data to be compressed, and the character strings are initially registered in the dictionary (10).

(3) The data compression system according to claim 1, characterized in that registration numbers are assigned in advance to character strings in which a single character is repeated in the data to be compressed, and the character strings are initially registered in the dictionary (10).

(4) A data restoration system provided with decoding means (200) that uses a dictionary (10) to restore the original data from codewords produced by dividing already-encoded data into mutually distinct substrings, registering the substrings in the dictionary (10), and encoding input data by specifying the registration number of the longest matching substring in the dictionary (10), characterized in that registration numbers are assigned in advance to a set of character strings of one or more characters, which are initially registered in the dictionary (10), and the input data is restored by the decoding means (200) using the dictionary (10).

(5) The data restoration system according to claim 1, characterized in that registration numbers are assigned in advance to character strings containing characters that occur frequently in the data to be compressed, and the character strings are initially registered in the dictionary (10).

(6) The data restoration system according to claim 1, characterized in that registration numbers are assigned in advance to character strings in which a single character is repeated in the data to be compressed, and the character strings are initially registered in the dictionary (10).