JPH0683573A

JPH0683573A - Data compression system

Info

Publication number: JPH0683573A
Application number: JP3056704A
Authority: JP
Inventors: Hirotaka Chiba; 広隆千葉; Yoshiyuki Okada; 佳之岡田; Shigeru Yoshida; 茂吉田; Yasuhiko Nakano; 泰彦中野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-03-20
Filing date: 1991-03-20
Publication date: 1994-03-25
Anticipated expiration: 2014-09-20
Also published as: JP2952067B2

Abstract

PURPOSE: To achieve coding that suitably meets the demands of high-speed processing and high compression by generating a hash address from the reference number of a substring together with information taken from the elements of the input character. CONSTITUTION: An encoding means 2 divides the already-encoded character string into mutually different substrings, registers each substring in a dictionary 1 with its own reference number, and compresses data by encoding the input character string as the reference number of the longest matching substring in the dictionary 1. A dictionary search means 3 uses external hashing (chaining) to search for substrings: it generates a hash address by adding information Km, extracted from the elements of the input character K, to the reference number i of a substring registered in the dictionary 1, which produces linked lists split into a number determined by the bit count of the additional information Km, and searches the dictionary 1 through them. Dynamic data compression suited either to a high compression ratio or to high-speed processing can thus be carried out appropriately.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a data compression system using LZW coding, an improvement of the incremental parsing type of universal coding. In recent years, computers have come to handle many kinds of data, such as character codes, vector information, and images, and the volume of data handled is growing rapidly. When large amounts of data are handled, removing the redundant parts of the data and compressing it reduces the storage capacity required and allows faster transmission.

[0002] Universal coding has been proposed as a way to compress such varied data with a single scheme. The field of the present invention is not limited to the compression of character codes and applies to many kinds of data. In what follows, however, the terminology of information theory is adopted: one word of data is called a character, and a sequence of several words of data is called a character string.

[0003] A representative universal code is the Ziv-Lempel code (for details see, for example, Munakata, "Ziv-Lempel Data Compression Method", Information Processing, Vol. 26, No. 1, 1985). Two algorithms have been proposed for the Ziv-Lempel code: the universal type and the incremental parsing type.

[0004] The LZSS code is an improvement of the universal type algorithm (see T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986). The LZW (Lempel-Ziv-Welch) code is an improvement of the incremental parsing type algorithm (see T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984).

[0005] Of these codes, the LZW code has come to be used for purposes such as file compression in storage devices, because it can be processed at high speed and its algorithm is simple.

[0006]

[Prior Art] FIG. 9 shows the encoding flow of the conventional LZW code, and FIG. 10 shows the decoding flow. LZW encoding keeps a rewritable dictionary, divides the input character string into mutually different strings (substrings), and registers these strings in the dictionary with reference numbers assigned in order of appearance. At the same time, it encodes the string currently being input as the reference number of the longest matching string registered in the dictionary.

[0007] FIG. 11 illustrates LZW encoding, FIG. 13 illustrates LZW decoding, and FIG. 12 shows an example of the dictionary built during encoding and decoding. To keep the explanation simple, FIGS. 11, 12, and 13 use the example of compressing and restoring data made up of combinations of the three characters a, b, and c. In the LZW encoding process of FIG. 9, step S1 first registers in the dictionary, as initial values, a single-character string for every character, and encoding then begins.

[0008] In step S1, the dictionary is searched with the first input character K to obtain its reference number ω, which becomes the prefix string. Step S2 then reads the next character K of the input data, and step S3 checks whether the character input has ended. Processing then proceeds to step S4, which searches the dictionary for the extended string (ωK) formed by appending the character K read in step S2 to the prefix string ω obtained in step S1.

[0009] If the string (ωK) is not in the dictionary in step S4, processing proceeds to step S6. Step S6 outputs the reference number ω obtained in step S1 for the character K as the codeword code(ω), and registers the string (ωK) in the dictionary with a new reference number. It then makes the input character K of step S2 the new reference number ω, increments the dictionary address n, and returns to step S2 to read the next character K.

[0010] If, on the other hand, the string (ωK) is in the dictionary in step S4, step S5 makes the string (ωK) the new reference number ω, and processing returns to step S2. The search for the longest match continues in this way until the string (ωK) can no longer be found in the dictionary in step S4.

[0011] LZW encoding is described concretely below with reference to FIGS. 11 and 12. The input data (input) of FIG. 11 is read from left to right. When the first character a is input, the dictionary contains no matching string other than the character a, so OUTPUT CODE 1 (reference number ω) is output as the codeword. The character a then becomes the prefix string ω.

[0012] Next, when the second character b is input, the extended string ωK = ab, formed by appending this input character to the prefix string ω, is not in the dictionary, so OUTPUT CODE 2 of the character b is output as the codeword. The extended string ωK = ab is then registered in the dictionary with reference number 4. The actual dictionary entry is registered as the string 1b, as shown on the right side of FIG. 12. The character b then becomes the prefix string ω.

[0013] Next, when the third character a is input, the extended string ωK = ba = 2a, formed by appending it to the prefix string ω (the character b), is not in the dictionary. OUTPUT CODE 1 of the character a is therefore output as the codeword, and the extended string ωK = ba is expressed as 2a and registered in the dictionary with reference number 5. The character a then becomes the new prefix string ω. For the fourth input character b, the extended string ωK = ab is already registered in the dictionary as 1b with codeword 4, so the string ωK becomes the new prefix string ω, and the fifth character c is input to form the extended string ωK = 4c = abc. Since this extended string ωK = abc is not registered in the dictionary, OUTPUT CODE 4 of the string ab = 1b is output as the codeword, and the extended string ωK = abc is registered in the dictionary in the form 4c with codeword 6. Processing continues in the same way.
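The encoding flow of steps S1 to S6 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `lzw_encode` is hypothetical, the dictionary stores whole strings rather than (reference number, character) pairs such as 1b, and the input string is an assumption chosen so that the registered entries 4 = ab, 5 = ba, and 6 = abc agree with the walk-through. A code is emitted only in step S6, that is, when the extended string ωK is not found.

```python
def lzw_encode(text, alphabet="abc"):
    """Sketch of LZW encoding; returns (codes, dictionary)."""
    # S1: register every single character; reference numbers start at 1.
    dictionary = {ch: n for n, ch in enumerate(alphabet, start=1)}
    if not text:
        return [], dictionary
    codes = []
    prefix = text[0]                        # prefix string ω
    for k in text[1:]:                      # S2: read the next character K
        if prefix + k in dictionary:        # S4: is ωK registered?
            prefix += k                     # S5: ωK becomes the new ω
        else:
            codes.append(dictionary[prefix])              # S6: output code(ω)
            dictionary[prefix + k] = len(dictionary) + 1  # register ωK
            prefix = k                      # K becomes the new ω
    codes.append(dictionary[prefix])        # input exhausted: output the last ω
    return codes, dictionary


# Assumed sample input over the three characters a, b, c.
codes, dictionary = lzw_encode("ababcbababaaaaaaa")
print(codes)  # [1, 2, 4, 3, 5, 8, 1, 10, 11, 1]
print(dictionary["ab"], dictionary["ba"], dictionary["abc"])  # 4 5 6
```

Using a Python `dict` hides the search cost entirely; the sketch is meant to show the parsing logic only, not the dictionary search method discussed later in the text.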

[0014] The decoding process of FIG. 10 performs the reverse of the encoding of FIG. 9. As in encoding, the LZW decoding of FIG. 10 first registers in the dictionary, as initial values, a single-character string for every character, and decoding then begins. Step S1 reads the first code (reference number) and saves the current CODE as OLDcode. Because the first code always corresponds to one of the single-character reference numbers already registered in the dictionary, the character code(K) matching the input code CODE is looked up and the character K is output.

[0015] The character K that was output is also set in FINchar for the exception processing described later. Step S2 then reads the next code and sets it in CODE as INcode. Step S3 checks whether there is a new code, that is, whether the code input has ended, and processing proceeds to step S4, which checks whether the code CODE just read is defined (registered) in the dictionary. Normally the input codeword has already been registered in the dictionary by earlier processing, so processing proceeds to step S5, which reads the string code(ωK) corresponding to the code CODE from the dictionary. Step S6 temporarily pushes the character K onto a stack, takes the reference number CODE(ω) as the new code CODE, and returns to step S5. Steps S5 and S6 are repeated recursively until the reference number ω reduces to a single character K. Finally, step S7 pops the characters stacked in step S6 in LIFO (Last In First Out) order and outputs them. Also in step S7, the string formed by pairing the previously used code ω with the first character K of the string just restored, written (ωK), is registered in the dictionary with a new reference number.

[0016] The LZW decoding process is described concretely below with reference to FIG. 13. In FIG. 13 the first input codeword (INPUT CODE) is 1. The single characters a, b, and c are already registered in the dictionary with reference numbers 1, 2, and 3, as shown in FIG. 12, so the dictionary is consulted and codeword 1 is replaced by the string a of the matching reference number and output.

[0017] The next codeword 2 is likewise replaced by the character b and output. At this point the string ωK = 1b, combining the previously processed codeword 1 with the first character b of the string just decoded, is registered in the dictionary with the new reference number 4. The third codeword 4 is replaced by the string ab, obtained from the string 1b found by searching the dictionary, and the string ab is output. At the same time, the string ωK = 2a (= ba), combining the previously processed codeword 2 with the first character a of the string just decoded, is registered in the dictionary with the new reference number 5.

[0018] This processing is repeated in the same way.

[0019] The LZW decoding of FIG. 13 involves the following exception processing, which arises when decoding the sixth input codeword, 8. Codeword 8 is not yet defined in the dictionary at the time of decoding and so cannot be decoded directly. In this case, the exception processing forms the string 5b by appending the first character b of the previously decoded string ba to the previously processed codeword 5, then expands it as 5b = 2ab = bab and outputs the result. After the string is output, the string 5b, formed by appending the first character b of the string just decoded to the previous codeword 5, is registered in the dictionary with reference number 8.
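The decoding flow, including the exception processing for a codeword that is not yet defined, can be sketched in Python as follows. This is a hedged illustration: the names `lzw_decode`, `entries`, and `expand` are hypothetical, and the code sequence is an assumed sample in which the sixth codeword is 8 and the codeword before it is 5, as in the description. Entries are kept as (prefix reference number, character) pairs, so reference number 8 is stored in the form 5b and expanded through a stack.

```python
def lzw_decode(codes, alphabet="abc"):
    """Sketch of LZW decoding with entries stored as (prefix code, character)."""
    # Initial dictionary: single characters, prefix 0 meaning "no prefix".
    entries = {n: (0, ch) for n, ch in enumerate(alphabet, start=1)}

    def expand(code):
        stack = []
        while code != 0:                    # S5/S6: push K, continue with ω
            code, ch = entries[code]
            stack.append(ch)
        return "".join(reversed(stack))     # S7: pop in LIFO order

    if not codes:
        return ""
    old = codes[0]                          # S1: first code is a single character
    out = [expand(old)]
    for code in codes[1:]:
        if code in entries:                 # S4: code already registered
            text = expand(code)
            entries[len(entries) + 1] = (old, text[0])
        else:                               # exception: code not yet defined
            if code != len(entries) + 1:
                raise ValueError(f"invalid code {code}")
            entries[code] = (old, expand(old)[0])   # e.g. 8 = 5b = bab
            text = expand(code)
        out.append(text)
        old = code
    return "".join(out)


# Assumed sample: the sixth code, 8, triggers the exception path.
print(lzw_decode([1, 2, 4, 3, 5, 8, 1, 10, 11, 1]))  # ababcbababaaaaaaa
```

In this sample the codes 10 and 11 also take the exception path, which shows that the case is not rare for inputs containing runs of a repeated character.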

[0020] This exception processing is carried out through steps S4 and S8 of the decoding flow of FIG. 10. Finally, step S7 outputs the string and registers the new string in the dictionary with a reference number. The LZW decoding of FIGS. 10 and 13 has been described for the case where the decoding side builds the dictionary in real time while interpreting the codes. Decoding may instead use a copy of the dictionary built during encoding as it is, in which case the exception processing on the decoding side becomes unnecessary.

[0021] However, when LZW encoding follows the procedure of the flowchart of FIG. 9, each dictionary lookup of a single string may in the worst case require searching the entire dictionary, so the dictionary search is slow. Conventional dictionary search schemes therefore use external hashing (open hashing, or chaining) to raise the processing speed.

[0022] In dictionary search by ordinary hashing, given a set S of strings, the address of the storage location of a string x in S can be computed directly from x itself, which allows fast dictionary search. Suppose the storage locations for strings, that is, the hash table, are addressed from 0 to m-1. Hashing defines one function h: S → {0, 1, ..., m-1} and obtains the address of a string x in S as h(x). The function h is called the hash function, and the value h(x) is called the hash address of the string x.

[0023] Hashing is normally used when the size of the set S is much larger than the number of addresses m. However the hash function h is chosen, though, two different strings x1 and x2 in S may have the same hash address, h(x1) = h(x2). This is called a collision, and external hashing (open hashing, or chaining) is one of the measures used to deal with it.

[0024] As shown in FIG. 14, external hashing prepares a linked list for each hash address i indicated by the index (directory), and strings x whose hash address h(x) = i collides are stored in order from the head of that linked list. Each linked list of entries sharing the same hash address h(x) is called a bucket.
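A small Python sketch may help picture external hashing. It is illustrative only: the class name `ChainedHashTable`, the table size m = 8, and the hash function (sum of character codes modulo m) are assumptions, and Python lists stand in for the linked lists. The `lookup` method also reports how many entries were compared, which is the sequential cost that grows with bucket length.

```python
class ChainedHashTable:
    """Sketch of external hashing: a directory of buckets, one per hash address."""

    def __init__(self, m=8):
        self.m = m
        self.directory = [[] for _ in range(m)]   # bucket for each address 0..m-1

    def h(self, x):
        # Hypothetical hash function: sum of character codes modulo m.
        return sum(map(ord, x)) % self.m

    def insert(self, x):
        bucket = self.directory[self.h(x)]
        if x not in bucket:
            bucket.append(x)                      # colliding strings are stored in order

    def lookup(self, x):
        """Return (found, number of comparisons made in the bucket)."""
        bucket = self.directory[self.h(x)]
        for steps, name in enumerate(bucket, start=1):
            if name == x:
                return True, steps
        return False, len(bucket)


table = ChainedHashTable()
table.insert("ab")
table.insert("ba")                 # same character sum, so it collides with "ab"
print(table.h("ab") == table.h("ba"))  # True
print(table.lookup("ba"))              # (True, 2)
```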

[0025] FIG. 15 shows the flow of LZW encoding that uses the list structure of external hashing for dictionary search. FIG. 16 shows an example of a conventional dictionary structure, and FIG. 17 shows the corresponding layout in dictionary memory. In FIG. 17, the dictionary memory consists of a first memory (first) 100, a next memory (next) 200, and an extension memory (extension, abbreviated ext) 300. The first memory 100 corresponds to the index (directory) of the external hashing shown in FIG. 14, the next memory 200 corresponds to "next" in the linked list of FIG. 14, and the extension memory 300 corresponds to "name" in FIG. 14.

[0026] In the dictionary structure of FIG. 16, each node carries the following information, as shown in the detail drawn at the lower right:
(1) Inside the node: the symbol registered in the extension memory
(2) Upper left of the node: the address
(3) Lower left of the node: the address of the next first memory
(4) Lower right of the node: the address of the next memory
A value of 0 indicates that the memory content is empty.

[0027] The LZW encoding process of FIG. 15 is described below, taking for simplicity the case of the three characters A, B, and C. Step S1 first performs the following initialization.
(1) Initialize the dictionary so that it contains the first characters (the single-character strings). Because the three letters A, B, and C are the target here, the character codes of A, B, and C are used directly as hash addresses and registered at addresses 1, 2, and 3 of the dictionary memory of FIG. 16.

[0028] (2) Set the number n of characters currently registered in the dictionary to the number of characters registered in (1) above. For the three letters of the alphabet, n = 3.
(3) Take the first input character K as the prefix string i. Here the first input character is "A", so the prefix string is i = 1.
(4) Initialize the dictionary search arrays to 0. The search arrays for the first, next, and extension memories are written first[1,Nmax], next[1,Nmax], and EXT[1,Nmax], and these are initialized to 0.

[0029] Once the initialization of step S1 is complete, processing moves on to step S2 and the steps after it. Assume that, as a result, the dictionary has now reached the state shown in FIGS. 16 and 17. The processing for inputting and encoding the string "AAAA" in this state is described below.

[0030] The initialization of step S1 is already complete, so step S1 sets the prefix string to ω = 1 for the first input character "A", and step S2 reads the second input character "A". Step S3 then determines that unprocessed characters remain, and processing proceeds to the dictionary search steps S5 to S9.

[0031] In the dictionary search steps, step S5 first sets the prefix string ω = 1 in counter i as i = 1 and sets counter j to j = 0. Counter i is the dictionary memory address designated by a value stored in the first memory, and counter j is the dictionary memory address designated by a value stored in the next memory.

[0032] Next, step S6 reads the content of address 1 of the dictionary memory of FIG. 17, designated by counter i. It reads "A" as the symbol from the extension memory 300, reads the next first address "4" from the first memory 100, and sets counter i to i = 4. Step S7 then checks whether i = 0 in order to decide whether to move to the dictionary registration steps. Since i = 4 here, processing proceeds to step S8, which tests whether the symbol "A", obtained by referring to the extension memory 300 at address 1 in step S6, matches the first input character "A". Since the two match, processing returns to step S2 and reads the third input character "A".

[0033] Processing then passes through step S3 to step S5, which sets the dictionary memory address ω to the current value of counter i, i = 4, and refers to address 4 of the dictionary memory. Step S6 next reads the content of address 4 of the dictionary memory: it reads "B" as the symbol stored in the extension memory 300, reads the next first address "6" from the first memory 100, and sets counter i to i = 6.

[0034] Step S7 then checks whether i = 0. Since i = 6 here, processing proceeds to step S8, which tests whether the symbol "B", obtained from the extension memory 300 at address 4 in step S6, matches the input character "A" obtained in step S2. Since the two do not match, processing proceeds to step S9. Step S9 first sets counter i to the value j = 10 obtained from the next memory 200 when address 4 of the dictionary memory was referenced, giving i = 10. Counter i is replaced by counter j in this way because the test in step S7 is made only on counter i, and the replacement lets the same test cover counter j as well.

[0035] Next, address 10 of the dictionary memory, designated by the replaced counter i, is referenced. The symbol "A" stored in the extension memory 300 at address 10 is read, and the address value 11 of the next first memory, stored in the first memory 100 at address 10, is set in counter i. Processing then returns to step S7. Since i = 11 at this point, the symbol "A" of address 10 obtained in step S9 is compared with the input character "A". Since they match, processing proceeds to step S2 and on to the third character.

[0036] The third and fourth input characters "A" are processed in the same way as the first input character, advancing from address 10 of the dictionary memory to address 11 and then from address 11 to address 12. Once address 12 has been processed, step S3 finds no more characters to process, so processing proceeds to step S16, which outputs the final address ω = 12 as the codeword code(ω) and ends the series of operations.

[0037] The dictionary registration steps S11 to S15 are described next. Dictionary registration takes place when i = 0 results from searching the first memory 100 or the next memory 200 in the dictionary search steps. That is, when step S7 finds i = 0, the dictionary search can go no further, so step S10 outputs the dictionary address ω at that point as the codeword code(ω), and processing enters the dictionary registration steps.

[0038] In the dictionary registration steps, step S11 first sets the number n of characters currently registered in the dictionary memory in counter i and then increments n by one. Step S12 then checks whether j = 0. If j is not 0, then i = 0, so processing proceeds to step S13 and performs registration in the first memory 100. If j = 0, processing proceeds to step S14 and performs registration in the next memory.

[0039] The registration in the first memory 100 in step S13 does the following:
(1) It stores, in the first memory 100 at the memory address n designated by counter i, the value (n + 1) indicating the next registration destination.
(2) It registers the input character K as a symbol in the extension memory 300 at the next memory address (n + 1).
As a concrete example, take the case of registering the input character "A" after address 11 in FIGS. 16 and 17. The address value 12 indicating the next registration destination is stored in the first memory 100 at memory address 11, designated by counter i, and the input character "A" is registered as a symbol in the extension memory 300 at the next memory address 12.

[0040] The registration in the next memory 200 in step S14, on the other hand, does the following:
(1) It stores, in the next memory 200 at the memory address designated by counter i, the value (n + 1) indicating the next registration destination.
(2) It registers the input character K as a symbol in the extension memory 300 at the next memory address (n + 1).

[0041] As a concrete example, take the case of registering the input character "A" after address 11 in FIGS. 16 and 17. The address value 10 indicating the next registration destination is stored in the next memory 200 at memory address 11, designated by counter i, and the input character "A" is registered as a symbol in the extension memory 300 at the next memory address 10. When this registration is complete, the registered character K is set in counter i, and processing returns to the dictionary search steps starting at step S2.
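The search and registration steps over the first, next, and extension memories can be sketched in Python as follows. This is a simplified illustration rather than a transcription of FIG. 15: the function name `lzw_encode_lists`, the array size `nmax`, and the input string are assumptions, and the roles of counters i and j are reduced to "current candidate" and "last sibling visited". Here `first[i]` points to the first entry that extends entry i, `nxt[i]` points to the next alternative in the same list, and `ext[i]` holds the symbol registered at address i.

```python
def lzw_encode_lists(text, alphabet="ABC", nmax=4096):
    """Sketch of LZW encoding with first/next/ext arrays; returns (codes, first, nxt, ext)."""
    first = [0] * (nmax + 1)     # first memory: 0 means empty
    nxt = [0] * (nmax + 1)       # next memory: 0 means empty
    ext = [""] * (nmax + 1)      # extension memory: registered symbol
    for addr, ch in enumerate(alphabet, start=1):
        ext[addr] = ch           # single characters at addresses 1, 2, 3
    n = len(alphabet)            # number of entries currently registered
    codes = []
    if not text:
        return codes, first, nxt, ext
    omega = alphabet.index(text[0]) + 1
    for k in text[1:]:
        i, j = first[omega], 0
        while i != 0 and ext[i] != k:    # sequential comparison along the list
            j, i = i, nxt[i]
        if i != 0:                       # match: extend the current string
            omega = i
            continue
        codes.append(omega)              # no match: output code(ω)
        if n < nmax:                     # register K at the next free address
            n += 1
            ext[n] = k
            if j == 0:
                first[omega] = n         # first entry extending ω
            else:
                nxt[j] = n               # append to the end of the list
        omega = alphabet.index(k) + 1
    codes.append(omega)
    return codes, first, nxt, ext


codes, first, nxt, ext = lzw_encode_lists("ABABCBABABAAAAAAA")
print(codes)                                        # [1, 2, 4, 3, 5, 8, 1, 10, 11, 1]
print(first[1], first[4], nxt[4], first[10], first[11])  # 4 6 10 11 12
```

With this assumed input, the links produced (address 1 to 4, address 4 to 6 in the first memory and to 10 in the next memory, then 10 to 11 and 11 to 12) coincide with the addresses quoted in the walk-through of the string "AAAA". The `while` loop is the part whose cost grows with list length, since each candidate is compared one at a time.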

[0042]

[Problems to Be Solved by the Invention] When this conventional LZW code is encoded in software, the dictionary search takes a large share of the time, so external hashing is used to speed up the dictionary search. In dictionary search by external hashing, however, the input character is compared with the candidate characters sequentially, so the dictionary search accounts for about 80% of the total time, and further speed-up is difficult.

[0043] Meanwhile, the present inventors have proposed a coding method that speeds up the dictionary search by using information on already-encoded input characters to divide the linked list into several lists before searching. In actual coding, however, the usable memory capacity is fixed in advance, and depending on the size of the input data, coding may finish without using all of the dictionary memory. Depending on the application, it may also be desirable to give processing time priority over the compression ratio.

[0044] With conventional coding methods, however, it is difficult to reconcile the demand for high speed with the demand for high compression. The present invention has been made in view of this problem, and its object is to provide a data compression method capable of coding that responds appropriately to demands for both high speed and high compression.

[0045]

[Means for Solving the Problems] FIG. 1 is a diagram explaining the principle of the present invention. The present invention is directed to a data compression method comprising: an encoding means 2 that divides an already-encoded character string into mutually different substrings, assigns a different reference number to each substring, registers them in a dictionary 1, and compresses data by encoding the input character string as the reference number of the longest matching substring in the dictionary 1; and a dictionary search means 3 that uses the external hash method to search for substrings and generates a hash address by adding information Km, extracted from the elements of the input character K, to the reference number i of a substring registered in the dictionary 1, thereby generating linked lists whose number of divisions corresponds to the number of bits of the additional information Km, and searching the dictionary 1.

[0046] In such a data compression method, the present invention is characterized by a division number designating means 4 that designates the number of divisions of the linked list by varying, as appropriate, the number of bits of the information Km extracted from the elements of the input character K and added to the reference number i of the substring. The division number designating means 4 sets the number of bits of Km either to a number suited to high-speed processing (see FIG. 3(b)) or to a number suited to high-compression processing (see FIG. 3(a)).

[0047] Specifically, the number of bits is designated on the basis of the number of divisions that the search division number determination means 5 determines from information added to the head of the input character string data. Alternatively, the division number designating means 4 may set the number of bits of the information Km extracted from the elements of the input character K to the number of bits corresponding to a number of divisions designated in advance.

[0048]

[Operation] The data compression method of the present invention configured as described above operates as follows. The processing conditions for data compression vary: in some cases a high compression ratio is required even at the cost of time, while in others processing time takes priority over the compression ratio and high-speed processing is desired.

[0049] High compression and high speed are conflicting requirements. In the present invention, the hash address used in the external hash method is formed by adding information Km on the elements of the input character K, that is, a certain number of bits of the input character code, to the reference number of the substring, that is, the address i. The linked list is thereby divided according to the number of bits of the additional information Km. Because the number of effective bits Km of the character code that determines the number of divisions can be designated arbitrarily, dynamic data compression suited to either a high compression ratio or high-speed processing can be performed appropriately.
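The formation of a hash address from the pair (Km, i) can be illustrated with a short sketch. It assumes 8-bit character codes and takes Km as the top m bits of the code. The function names, the `index_bits` width of the reference number, and the choice to place Km above i in the address are assumptions made for illustration, not details confirmed by the text.

```python
def extract_km(char_code: int, m: int, code_bits: int = 8) -> int:
    """Return the top m bits of a code_bits-wide character code (0 if m == 0)."""
    if not 0 <= m <= code_bits:
        raise ValueError("m must be between 0 and code_bits")
    return char_code >> (code_bits - m)


def hash_address(i: int, char_code: int, m: int, index_bits: int = 12) -> int:
    """Place Km above the index_bits-wide reference number i (assumed layout)."""
    return (extract_km(char_code, m) << index_bits) | i


# With m = 1, codes below 0x80 keep address i; codes from 0x80 up get the upper half.
low = hash_address(1, 0x61, 1)
high = hash_address(1, 0xE2, 1)
```

With m bits of Km there are 2**m distinct addresses per reference number, which is what splits one linked list into 2**m shorter ones.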

[0050]

[Embodiment] FIG. 2 is a block diagram showing an embodiment of the data compression method of the present invention with its dictionary search function. In FIG. 2, the original data 10 to be processed (character data or code word data) is input via a DMA (Direct Memory Access) control circuit 12. The MPU 14, which serves as the control means, sets one character K of the input original data 10, together with a hash address formed by adding the element bits Km of the character code of K to the reference number i of the character string matched so far, in the multiple-character reading circuit 18 of the dictionary search circuit 16, and then starts the dictionary search circuit 16. The element bits of the character code of K determine the number of search divisions of the linked list in the external hash method. If the character code is 8 bits and the effective bits of Km are taken, for example, from the most significant bit downward, nine different numbers of divisions are available: using 0 to 8 bits gives 1, 2, 4, 8, 16, 32, 64, 128, or 256 divisions. When the number of divisions is 256, matching all 256 character types to be processed, the hash is a perfect hash and a single dictionary search suffices.
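The relationship between the number of Km bits and the number of divisions can be tabulated with a few lines of code. This is only an arithmetic illustration, assuming 8-bit character codes; the function name `division_table` is hypothetical.

```python
def division_table(code_bits: int = 8):
    """List (bits of Km, number of sub-lists, character codes per sub-list)."""
    return [(m, 1 << m, 1 << (code_bits - m)) for m in range(code_bits + 1)]


table = division_table()
for m, divisions, codes_per_list in table:
    print(f"Km bits={m}  divisions={divisions:3d}  codes per sub-list={codes_per_list:3d}")
```

At 8 bits each sub-list can hold only one character code, which is the perfect-hash case in which one lookup is enough.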

[0051] In the present invention, coding is performed with the number of divisions, chosen from these nine, that suits the designated processing condition. The number of divisions is designated by choosing how many element bits Km of the next character K are added to the reference number (dictionary address) i of the immediately preceding, already-encoded character string. The function of the division number designating means 4 shown in the principle diagram of FIG. 1 is implemented by program control of the MPU 14. FIG. 3 shows the number of search divisions of the linked list as a function of input data size: FIG. 3(a) shows the characteristic suited to high-compression processing, and FIG. 3(b) shows the characteristic suited to high-speed processing.

[0052] That is, when a high compression ratio is desired even at the cost of time, the number of divisions is designated in inverse proportion to the input data size, as in FIG. 3(a). In this case, the larger the input data, the fewer the divisions of the linked list and the longer the maximum matching substring in the dictionary can become, so a high compression ratio is obtained. However, longer matching substrings mean more dictionary searches and a longer processing time. When the memory capacity is fixed, memory that would otherwise go unused can be used effectively.

[0053] Conversely, when a shorter processing time is desired even at the cost of compression ratio, the number of divisions is designated in proportion to the input data size, as shown in FIG. 3(b). In this case the number of divisions increases with the input data size, and at the maximum of 256 divisions the hash is perfect, so encoding requires only a single dictionary search. When the memory capacity is fixed, this means that all data can be processed in a constant time regardless of input data size, at the expense of the compression ratio.

[0054] The number of divisions based on input data size under either condition of FIGS. 3(a) and 3(b) may be designated directly to the MPU 14 by an operator who knows the size of the data to be processed. Alternatively, a value indicating the data size can be placed at the head of the input data in advance; the MPU 14 reads this size and automatically converts it to the optimum number of divisions for that input size according to the division characteristic of FIG. 3(a) or FIG. 3(b). This function of the MPU 14, determining the size of the input data, corresponds to the search division number determination means 5 shown in the principle diagram of FIG. 1.
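The automatic conversion from input data size to a number of Km bits might look like the following sketch. The actual curves of FIG. 3 are not reproduced in the text, so the mapping here is purely an assumption: in "fast" mode the number of divisions grows roughly in proportion to the size, and in "compress" mode it shrinks roughly in inverse proportion. The function name `select_km_bits`, the mode strings, and the `max_size` scale are all hypothetical.

```python
def select_km_bits(data_size: int, mode: str,
                   max_size: int = 1 << 16, code_bits: int = 8) -> int:
    """Pick the number of Km bits from the input size (assumed mapping, not FIG. 3)."""
    max_div = 1 << code_bits
    size = min(max(data_size, 1), max_size)      # clamp to 1..max_size
    scaled = max(1, size * max_div // max_size)  # 1..max_div, grows with size
    fast_bits = scaled.bit_length() - 1          # floor(log2(scaled))
    if mode == "fast":
        return fast_bits                # more divisions for larger inputs
    if mode == "compress":
        return code_bits - fast_bits    # fewer divisions for larger inputs
    raise ValueError("mode must be 'fast' or 'compress'")


bits_small_fast = select_km_bits(1024, "fast")
bits_small_compress = select_km_bits(1024, "compress")
```

Only the direction of each trend is taken from the description; a real implementation would follow whatever characteristic FIG. 3 actually specifies.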

[0055] As shown in FIG. 4(a), the input data format for this purpose has a value indicating the size of the input data at its head, followed by the data sequence proper. As shown in FIG. 4(b), the encoded data has at its head the size of the dictionary used for encoding, that is, the used dictionary size (the number of divisions and the used size of each divided dictionary), followed by the encoded data.

[0056] At decoding time, therefore, the maximum dictionary size needed for decoding can be determined from the used dictionary size at the head of the data. Referring again to FIG. 2, the dictionary search circuit 16 then reads from the dictionary memory 20 the candidate characters for the character string extended by one character, the match checking circuit 22 checks (collates) the input character against each candidate character, and the connection detection circuit 24 detects whether any candidate characters remain.

[0057] The pipeline control circuit 26 issues the read of the next candidate character from the dictionary memory 20 in parallel with the collation of the input character and candidate character by the match checking circuit 22 and the detection of candidate characters by the connection detection circuit 24. With this pipelining, the search and collation of multiple candidate characters can proceed at the cycle time of the dictionary memory 20.

[0058] The dictionary search circuit 16 is further provided with a continuous address circuit 28, which generates consecutive addresses so that the hash addresses and candidate characters registered at consecutive addresses of the dictionary memory 20 are read into the multiple-character reading circuit 18. LZW encoding seeks the longest matching character string in the dictionary memory 20. The character string is therefore extended one character at a time by appending input characters, and the point at which no candidate character remains marks the longest match. At that point the longest matching string is represented by a reference number, the address ω, and this reference number ω is output from the input/output port 30 as the compressed code word code(ω). FIG. 5 is a block diagram showing details of the dictionary search circuit 16 of FIG. 2.
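The longest-match principle of LZW encoding described here can be sketched in software. This is a textbook-style illustration, not the circuit of FIG. 2: it uses Python's built-in `dict` in place of the dictionary memory, assumes byte input with the 256 single-byte strings pre-registered as codes 0 to 255, and the function name `lzw_encode` is hypothetical.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Emit one code per longest dictionary match, registering match + next byte."""
    dictionary = {bytes([c]): c for c in range(256)}  # single bytes pre-registered
    next_code = 256
    codes = []
    w = b""                          # current matched string
    for c in data:
        wc = w + bytes([c])
        if wc in dictionary:
            w = wc                   # extend the match by one character
        else:
            codes.append(dictionary[w])      # no candidate left: emit code(w)
            dictionary[wc] = next_code       # register the extended string
            next_code += 1
            w = bytes([c])           # restart matching from the current byte
    if w:
        codes.append(dictionary[w])  # flush the final match
    return codes


codes = lzw_encode(b"abab")
```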

[0059] In FIG. 5, the MPU 14 first sets in the address register 18-1 the pair (Km, i) consisting of the reference number i of the first character of the character string and, for example, the most significant bit Km of the 8-bit character code of the second character K, and sets the input second character K in the register 22-1. It then instructs the pipeline control circuit 26 to start the dictionary search circuit 16.

[0060] The pipeline control circuit 26 first resets the FF 28-1 to Km = 0 and then issues a read to the dictionary memory 20. The FF 24-2 supplies the most significant bit (MSB) of the dictionary memory 20 address and the contents of the address register 18-1 supply the lower address bits, so that the area corresponding to the array first of the dictionary memory 20 is read. FIG. 6 shows an example of the configuration of the dictionary memory 20, and FIG. 7 shows the corresponding array layout of the dictionary memory 20. To keep the explanation simple, FIGS. 6 and 7 take the encoding of three characters a, b, and c as an example.

[0061] In this memory array, first0 and first1 are selected by the most significant bit Km of the next character, which is added to the original hash address i: Km = 0 designates first0, and Km = 1 designates first1.

[0062] Accordingly, when the dictionary memory 20 is accessed in FIG. 5 using as the address the address (reference number) i of the first character and the most significant bit Km of the second character, Km = 0 because the FF 28-1 has been reset, so the areas corresponding to the array first0 (the first first-memory) and the array extension (the extension memory) in the dictionary memory of FIG. 7 are read. Of the one word read from the dictionary memory 20, the part corresponding to the linked-list address (first0) is set in the address register 18-1, and the part corresponding to the candidate character K' (extension) is set in the register 18-2.

[0063] At the same time, the part i of the previous contents of the address register 18-1, excluding the most significant bit Km of the character K, is moved to the register 18-3, and the contents Km of the FF 28-1 are moved to the FF 24-2. In parallel with the read of the dictionary memory 20, the match comparison circuit 22-2 compares the input character in the register 22-1 with the candidate character in the register 18-2.

[0064] When the comparison shows that the input character K matches the candidate character K', the pipeline control circuit 26 sets the next input character in the register 22-1. Because the FF 28-1 is still in its reset state, Km = 0, so the areas corresponding to the arrays first0 and extension are read from the dictionary memory 20 at the address designated by the address register 18-1. Of the one word read from the dictionary memory 20, the part corresponding to the linked-list address (first0) is set in the address register 18-1 and the part corresponding to the candidate character K' (extension) is set in the register 18-2, the comparison is performed, and the same steps are repeated.

[0065] During each comparison, the NOR circuit 24-1 simultaneously determines whether the contents of the address register 18-1, read from the dictionary memory 20, are all zeros; if they are, it detects that no candidate characters remain. The output of the NOR circuit 24-1 at that time is supplied to the MPU 14 and the pipeline control circuit 26, and the MPU 14 outputs as the code word the memory address at which the comparison last matched, then moves on to the search for the next input character. For the next character, likewise, the pair consisting of the contents i of the register 18-3 (the reference number encoded immediately before) and the most significant bit Km of the input character K is set in the address register 18-1, the input character is set in the register 22-1, and the search for the second and subsequent characters proceeds.

[0066] If, on the other hand, the comparison is a mismatch, the area of the array next at the same address is read and set in the address register 18-1, the dictionary memory 20 is read again, and reading of the array next is repeated until a matching candidate character is found. The dictionary search of the present invention is described concretely below with reference to FIGS. 6 and 7.

[0067] Suppose that the character string "aaaa" is to be encoded with the dictionary configured as in FIGS. 6 and 7. Let the most significant bit of the characters "a" and "c" be Km = 0 and that of the character "b" be Km = 1. The reference number of the first character "a" is i = 1, and the most significant bit (MSB) of the character code of the second character "a" is Km = 0, so the content 10 of first0, designated by Km = 0, and the candidate character "a" of extension are read at address 1 of the dictionary memory of FIG. 7. Since the input character "a" matches the candidate character "a", the dictionary memory is next accessed at the address 10 obtained from first0 to read the candidate character "a", and the content 11 of the array first0 designated by the most significant bit Km = 0 of the third character "a" is read. The third character "a" also matches its candidate character "a", and the fourth and fifth characters are processed in the same way. Because the content of the array first0 for the last, fifth character "a" is 0, it is determined that no candidate characters remain, and the final address 12 is output as the code word for the input character string "aaaa".

[0068] For the character string "abc", on the other hand, the most significant bit of the second character is Km = 1, so the content 4 of the array first1 at address 1 is read. Provided the candidate character there matches, the candidate character at address 5 is read, a match with the candidate character at address 6 is finally obtained, and the code word 6 is output for the input character string "abc". When no match with a candidate character is obtained, the search along next is the same as in the conventional method.

[0069] FIG. 8 is a flowchart of the encoding algorithm according to the present invention, which is basically the same as the conventional method of FIG. 15. The differences are: (1) in step S5, the memory address l is formed from the pair consisting of the reference number i of the character string encoded immediately before and the element bits of the next character K, for example its most significant bit Km; (2) in step S6, the divided first entry for the next memory address i is obtained by reading the dictionary memory first[l] determined by the pair l = (Km, i); and (3) when a candidate character is registered at address i in step S13, the most significant bit Km of the candidate character determines whether the address i is registered in first0 or first1 of the preceding address i-1.
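The encoding flow with divided linked lists can be sketched in software as follows. This is an illustrative model, not the flowchart of FIG. 8 itself: it assumes byte input, reserves address 0 as "no entry", pre-registers the 256 single bytes at addresses 1 to 256, and does not model a fixed memory budget. The class and function names, and the `probes` counter that counts candidate comparisons, are hypothetical.

```python
class SplitListDictionary:
    """LZW dictionary whose sibling lists are split by the top m bits of the next character."""

    def __init__(self, m: int, code_bits: int = 8):
        self.m = m
        self.code_bits = code_bits
        n_chars = 1 << code_bits
        # Address 0 means "no entry"; byte value b lives at address b + 1.
        self.extension = [0] + list(range(n_chars))   # symbol stored at each address
        self.next = [0] * (n_chars + 1)               # next sibling in the same sub-list
        self.first = [[0] * (1 << m) for _ in range(n_chars + 1)]  # first[i][Km]
        self.probes = 0                               # candidate comparisons performed

    def _km(self, k: int) -> int:
        return k >> (self.code_bits - self.m)

    def find(self, i: int, k: int) -> int:
        """Return the address of string(i) + k, or 0 if it is not registered."""
        j = self.first[i][self._km(k)]
        while j:
            self.probes += 1
            if self.extension[j] == k:
                return j
            j = self.next[j]
        return 0

    def register(self, i: int, k: int) -> int:
        """Register string(i) + k at a new address and link it into its sub-list."""
        n = len(self.extension)
        self.extension.append(k)
        self.next.append(0)
        self.first.append([0] * (1 << self.m))
        km = self._km(k)
        j = self.first[i][km]
        if j == 0:
            self.first[i][km] = n     # first entry of this sub-list
        else:
            while self.next[j]:       # walk to the end of the sub-list
                j = self.next[j]
            self.next[j] = n
        return n


def encode(data: bytes, m: int):
    """Return (code words, number of candidate comparisons) for the given split."""
    d = SplitListDictionary(m)
    codes = []
    if not data:
        return codes, 0
    i = data[0] + 1
    for k in data[1:]:
        j = d.find(i, k)
        if j:
            i = j                     # match extended by one character
        else:
            codes.append(i)           # longest match found: emit its address
            d.register(i, k)
            i = k + 1                 # restart from the single character k
    codes.append(i)
    return codes, d.probes


sample = b"a\x01a\x81a\x81"
codes_unsplit, probes_unsplit = encode(sample, 0)
codes_split, probes_split = encode(sample, 1)
```

In this sketch the emitted codes are the same for every m, because m only changes how the children of a node are bucketed; what changes is the number of comparisons needed to find a child. The trade-off against compression ratio described in the text is not reproduced here, since the sketch places no limit on dictionary memory.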

[0070] In the above embodiment, the candidate characters are stored in the dictionary memory as they are and compared. To reduce memory capacity, however, each candidate character may be stored with only the bits that remain after removing the bits Km added to the hash address. As another embodiment of the present invention, it is clear that the same result can be achieved by adding to the hash address bits of information derived by processing the input character, rather than specific bits Km of the input character itself.

[0071]

[Effects of the Invention] As described above, according to the present invention, the characteristic relating the number of divisions of the dictionary memory to, for example, the input data size is selected according to whether the processing condition is high speed or high compression, and the number of divisions is designated automatically according to the input data size or manually, so that it is determined dynamically for each encoding run. This achieves data compression by coding that balances the demand for high speed against the demand for high compression.

[Brief description of drawings]

FIG. 1 is a diagram explaining the principle of the compression method of the present invention.

FIG. 2 is a block diagram of an embodiment of the present invention.

FIG. 3 is a characteristic diagram showing the number of divisions versus input data size for each processing condition.

FIG. 4 is an explanatory diagram showing the input data format and the encoded data format of the present invention.

FIG. 5 is a block diagram of an embodiment showing details of the dictionary search circuit of FIG. 2.

FIG. 6 is an explanatory diagram showing the configuration of the dictionary memory used for the encoding of FIG. 2.

FIG. 7 is an explanatory diagram of the layout of the dictionary memory corresponding to FIG. 6.

FIG. 8 is a flowchart showing the encoding algorithm of the present invention.

FIG. 9 is a flowchart of the conventional LZW encoding algorithm.

FIG. 10 is a flowchart of the conventional LZW decoding algorithm.

FIG. 11 is an explanatory diagram of a specific example of conventional LZW encoding.

FIG. 12 is an explanatory diagram of a dictionary configuration example.

FIG. 13 is an explanatory diagram of a specific example of conventional LZW decoding.

FIG. 14 is an explanatory diagram of the list structure of the external hash method.

FIG. 15 is a flowchart showing the conventional LZW encoding algorithm using the external hash method.

FIG. 16 is an explanatory diagram showing the configuration of the dictionary memory used for the encoding of FIG. 15.

FIG. 17 is an explanatory diagram of the layout of the dictionary memory corresponding to FIG. 16.

[Explanation of symbols]

1: Dictionary
2: Dictionary search means
3: Data addition means
4: Judgment means
10: Original data
12: DMA control circuit
14: MPU
16: Dictionary search means (dictionary search circuit)
18: Multiple-character reading circuit
18-1: Address register
18-2, 18-3: Registers
20: Dictionary memory
22: Match checking circuit
22-1: Register
22-2: Comparator
24: Connection detection circuit
24-1: NOR circuit
24-2: FF
26: Pipeline control circuit
28-1: FF

Continuation of front page (72) Inventor: Yasuhiko Nakano, c/o Fujitsu Limited, 1015 Kamiodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

1. A data compression method comprising: an encoding means (2) that divides an already-encoded character string into mutually different substrings, assigns a different reference number to each substring, registers them in a dictionary (1), and compresses data by encoding an input character string as the reference number of the longest matching substring among the substrings in the dictionary (1); and a dictionary search means (3) that uses the external hash method to search for the substrings and generates a hash address by adding information (Km), extracted from the elements of an input character (K), to the reference number (i) of a substring registered in the dictionary (1), thereby generating and searching linked lists whose number of divisions corresponds to the number of bits of the additional information (Km); the data compression method being characterized in that a division number designating means (4) is provided that designates the number of divisions of the linked list by varying, as appropriate, the number of bits of the information (Km) extracted from the elements of the input character (K) and added to the reference number (i) of the substring.

2. The data compression method according to claim 1, wherein the division number designating means (4) sets the number of bits of the information (Km), extracted from the elements of the input character (K) and added to the reference number (i) of the substring, to a number of bits suited to high-speed processing or a number of bits suited to high-compression processing.

3. The data compression method according to claim 1, wherein the division number designating means (4) designates the number of bits of the information (Km), extracted from the elements of the input character (K) and added to the reference number (i) of the substring, on the basis of information indicating the size of the character string data to be encoded.

4. The data compression method according to claim 1, further comprising a search division number determination means (5) that determines the number of divisions from information added to the head of the character string.

5. The data compression method according to claim 1, wherein the division number designating means (4) sets the number of bits of the information (Km), extracted from the elements of the input character (K) and added to the reference number (i) of the substring, to the number of bits corresponding to a number of divisions designated in advance.