JP3012679B2

JP3012679B2 - Data compression method

Info

Publication number: JP3012679B2
Application number: JP2281433A
Authority: JP
Inventors: 泰彦中野; 茂吉田; 佳之岡田; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-10-19
Filing date: 1990-10-19
Publication date: 2000-02-28
Anticipated expiration: 2015-02-28
Also published as: JPH04156111A

Description

[Detailed description of the invention]

[Contents]
Summary
Industrial applications
Conventional technology (FIGS. 7 to 13)
Problems to be solved by the invention
Means for solving the problem (FIG. 1)
Action
Example
(a) Description of one embodiment (FIGS. 2 to 6)
(b) Description of other embodiments
Effects of the invention

[Summary]

The invention relates to a data compression method that compresses data using LZW codes. Its object is to search the substrings registered in a dictionary at high speed and thereby shorten the encoding time. In a data compression method in which already-encoded data is divided into mutually distinct substrings that are registered in a dictionary, the input data is compared against the substrings in the dictionary according to a search list that gives the search order of the linked substrings, and the input data is encoded by specifying the reference number of the longest-matching substring in the dictionary, the search list is rewritten so that a substring found by the search is swapped with the substring immediately before it in the search list.

[Industrial applications]

The present invention relates to a data compression method that compresses data using LZW codes.

In recent years computers have come to handle many kinds of data, such as character codes, vector information, and images, and the volume of data handled is growing rapidly. When large amounts of data are handled, removing the redundant parts of the data to compress it reduces the storage capacity required and allows faster transmission.

Universal coding has been proposed as a way of compressing many kinds of data with a single scheme. The field of the present invention is not limited to the compression of character codes and extends to many kinds of data. In what follows, however, the terminology of information theory is adopted: one word of data is called a character, and any number of words joined together is called a character string.

A representative universal coding method is the Ziv-Lempel code (for details see, for example, Munakata, "Ziv-Lempel Data Compression Method", Information Processing, Vol. 26, No. 1, 1985).

Two algorithms have been proposed for the Ziv-Lempel code: a universal type and an incremental parsing type. The LZSS code is an improvement of the universal algorithm (see T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986). The LZW (Lempel-Ziv-Welch) code is an improvement of the incremental parsing algorithm (see T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984).

Of these codes, the LZW code has come into use for purposes such as file compression in storage devices, because it can be processed at high speed and its algorithm is simple.

[Conventional technology]

First, the LZW code is described with reference to FIGS. 7 to 9.

FIG. 7 is a flowchart of LZW encoding, FIG. 8 is a flowchart of LZW decoding, and FIG. 9 is an explanatory diagram of LZW encoding and decoding.

To keep the explanation simple, FIG. 9 takes the case of compressing and restoring data made up of combinations of the three characters a, b, and c.

LZW encoding maintains a rewritable dictionary. It divides the input character codes into mutually distinct character strings, numbers these strings in order of appearance, and registers them in the dictionary. The string currently being input is encoded as the number of the longest matching string registered in the dictionary.

The encoding process is described with reference to the flowchart of FIG. 7.

First, in step S1 (the word "step" is omitted below), strings consisting of a single character are registered in the dictionary as initial values for all characters, and encoding then begins. In S1, the dictionary is searched with the first input character K to obtain its reference number ω, which becomes the prefix string.

Next, S2 reads the next character K of the input data, and S3 checks whether the input has ended. The process then proceeds to S4, which searches the dictionary for the string (ωK) formed by appending the character K read in S2 to the prefix string ω obtained in S1 or S5.

If the string (ωK) is not in the dictionary at S4, the process proceeds to S6. The reference number ω is output as the codeword code(ω), and the string (ωK) is registered in the dictionary under a new reference number. The input character K of S2 then becomes the new ω, the dictionary address is incremented, and the process returns to S2 to read the next character K.

If, on the other hand, the string (ωK) is in the dictionary at S4, S5 replaces ω with the reference number of the string (ωK) and the process returns to S2. The search for the longest match continues in this way until (ωK) can no longer be found in the dictionary.
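The S1 to S7 loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than the flowchart of FIG. 7 itself: the function name `lzw_encode`, the `alphabet` parameter, and the use of a Python dict keyed by whole strings are assumptions made for readability, and reference numbers are taken to start at 1 as in the three-character example of FIG. 9.

```python
def lzw_encode(data, alphabet):
    """Encode `data` as a list of reference numbers (illustrative sketch)."""
    # S1: register every single-character string; numbering starts at 1.
    dictionary = {ch: n for n, ch in enumerate(alphabet, start=1)}
    next_ref = len(alphabet) + 1
    if not data:
        return []
    codes = []
    omega = data[0]                      # prefix string, held here as text
    for k in data[1:]:                   # S2/S3: read until the input ends
        if omega + k in dictionary:      # S4: is (omega K) registered?
            omega = omega + k            # S5: extend the prefix
        else:
            codes.append(dictionary[omega])    # S6: output code(omega)
            dictionary[omega + k] = next_ref   # register (omega K)
            next_ref += 1
            omega = k                    # K starts the next string
    codes.append(dictionary[omega])      # S7: output the final prefix
    return codes


print(lzw_encode("ababcbabab", "abc"))
```

The dict lookup hides the cost that the rest of the document is concerned with: how long it takes to decide whether (ωK) is registered.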

The encoding is illustrated concretely with reference to FIGS. 9(A) and 9(B).

First, the input data of FIG. 9(A) is read from left to right. When the first character a is input, the dictionary 10 contains no matching string other than a, so the output code (its reference number ω) is output as the codeword. The extended string ab is then registered in the dictionary 10 with reference number 4; the entry is actually stored in the form (1b). Next, the second character b becomes the head of a string. The dictionary 10 contains no matching string other than b, so reference number 2 is output as the codeword, and the extended string ba is registered in the dictionary 10 with reference number 5, actually in the form 2a. The third character a becomes the head of the next string, and the process continues in the same way.

The decoding process of FIG. 8 performs the reverse of the encoding of FIG. 7.

In the decoding of FIG. 8, as in encoding, single-character strings for all characters are registered in the dictionary as initial values before decoding begins.

First, S1 reads the first code (reference number) and sets the current CODE as OLDcode. Because the first code always corresponds to the reference number of one of the single characters already registered in the dictionary, the character code(K) matching the input code CODE is looked up and the character K is output. The output character K is also stored in FINchar for the exception handling at S8 described later.

The process then proceeds to S2, which reads the next code and sets it in CODE as INcode.

S3 checks whether there is a new code, that is, whether the code input has ended. The process then proceeds to S4, which checks whether the input code CODE is defined (registered) in the dictionary.

Normally the input codeword has already been registered in the dictionary by the preceding processing, so the process proceeds to S5, which reads the string code(ωK) corresponding to CODE from the dictionary. S6 temporarily pushes the character K onto a stack and makes the reference number code(ω) the new CODE, and the process returns to S5. Steps S5 and S6 are repeated recursively until the reference number ω reaches a single character. Finally, at S7, the characters stacked at S6 are popped and output in LIFO (last in, first out) order. Also at S7, the pair (ω, K), formed from the previously used code ω and the first character K of the string just restored, is registered in the dictionary under a new reference number.

The decoding process is illustrated concretely with reference to FIGS. 9(C) and 9(D).

First, in FIG. 9(D) the first input code is 1. The single characters a, b, and c are already registered in the dictionary 10 under reference numbers 1, 2, and 3, as shown in FIG. 9(C), so the dictionary 10 is consulted and code 1 is replaced by the string a with the matching reference number, which is output. The next code 2 is likewise replaced by the character b and output. At this point (1b), the combination of the previously processed code and the first character b just decoded, is registered in the dictionary 10 under the new reference number 4.

The third code 4 is looked up in the dictionary 10, replaced by 1b and then by ab, and the string ab is output. At the same time, the string 2a (= ba), formed by combining the previously processed code 2 with the first character a of the string just decoded, is registered in the dictionary 10 under the new reference number 5.

The process is repeated in the same way.

The decoding of FIG. 9(D) involves the following exception handling.

This exception occurs when the sixth input code 8 is decoded. Code 8 is not yet defined in the dictionary at the time of decoding, so it cannot be decoded directly. In this case the string 5b is formed by appending the first character b of the previously decoded string ba to the previously processed code 5; this is replaced by 2ab and then by bab, which is output. After the string is output, the string 5b, formed by appending the character b of the string just decoded to the previous codeword 5, is registered in the dictionary under reference number 8.

This exception handling is carried out through steps S4 and S8 of the decoding flow of FIG. 8. Finally, at S7, the string is output and the new string is registered in the dictionary with a reference number.
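The decoding steps, including the exception for a code that is not yet defined, can be sketched as follows. This is an illustrative simplification under stated assumptions: the table stores whole strings instead of (ω, K) pairs, so the stack of S5 to S7 is not needed, and the names `lzw_decode`, `table`, and `previous` are hypothetical.

```python
def lzw_decode(codes, alphabet):
    """Rebuild the text from reference numbers (illustrative sketch)."""
    # Single-character strings are pre-registered, exactly as in encoding.
    table = {n: ch for n, ch in enumerate(alphabet, start=1)}
    next_ref = len(alphabet) + 1
    if not codes:
        return ""
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_ref:
            # Exception case: the code is only being defined by this step,
            # so it must be the previous string plus its own first character.
            entry = previous + previous[0]
        else:
            raise ValueError(f"undefined code {code}")
        out.append(entry)
        # Register (previous string, first character of the current string).
        table[next_ref] = previous + entry[0]
        next_ref += 1
        previous = entry
    return "".join(out)


print(lzw_decode([1, 2, 4, 3, 5, 8], "abc"))
```

In this code sequence the sixth code, 8, arrives before entry 8 exists, which is the situation the exception handling covers; the decoder resolves it to bab.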

Note that the encoding and decoding of FIGS. 7 and 8 proceed while building identical dictionaries.

When encoding follows the procedure of the FIG. 7 flowchart, each dictionary lookup of a string may in the worst case require a search of the entire dictionary, which takes time. Conventionally, therefore, the dictionary search has been accelerated with the external hashing method (open hashing, or chaining) (see, for example, Information Processing Handbook, edited by the Information Processing Society of Japan, published by Ohmsha).

FIG. 10 is an explanatory diagram of the external hashing method.

Consider a set S of character strings. A fast search is possible if the address at which a string x of S is stored can be computed directly from x itself. The hashing method achieves this.

Suppose the storage area (hash table) has addresses from 0 to m−1. The hashing method fixes one function h: S → {0, 1, ..., m−1} and obtains the address of a string x of S as h(x). The function h is called the hash function, and the value h(x) is called the hash address of x.

Hashing is normally used when the size of S is far larger than m. However h is chosen, therefore, h(x₁) = h(x₂) can occur for distinct strings x₁ and x₂ of S. This is called a collision, and the external hashing method (open hashing, or chaining) is one way of dealing with collisions.

As shown in FIG. 10, the external hashing method prepares a list for each hash address i, and each x with h(x) = i is stored in that list in order from the head. Each list of entries sharing the same hash address is called a bucket.
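A toy version of external hashing makes the cost of a collision visible. The class below is a sketch only: the name `ChainedHashTable`, the method names, and the hash function (sum of code points modulo m) are assumptions chosen so that collisions are easy to produce, not anything specified in the text.

```python
class ChainedHashTable:
    """External hashing (chaining): one list, or bucket, per hash address."""

    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def _h(self, x):
        # Toy hash function (assumption): sum of code points modulo m.
        return sum(ord(c) for c in x) % self.m

    def insert(self, x):
        bucket = self.buckets[self._h(x)]
        if x not in bucket:
            bucket.append(x)         # stored in order from the head

    def probes(self, x):
        """Comparisons needed to find x, or None if x is not stored."""
        for count, item in enumerate(self.buckets[self._h(x)], start=1):
            if item == x:
                return count
        return None


table = ChainedHashTable(4)
for s in ("ab", "ba", "c"):          # all three collide under this toy hash
    table.insert(s)
print(table.probes("ab"), table.probes("ba"), table.probes("c"))
```

Entries that were inserted later sit deeper in their bucket and cost more comparisons on every lookup, regardless of how often they are requested.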

FIGS. 11 to 13 illustrate the prior art. FIG. 11 shows an example of a search tree, FIG. 12 shows the state of the corresponding character string storage table and external hash table, and FIG. 13 is a flowchart of LZW encoding that uses external hashing for the dictionary search (for details see Hard Disk Cookbook, written and edited by AP-Labo, published by Shoeisha).

In FIG. 12, the character string storage table 10b stores, for each index i, the code and the character ext[i]. The array first 100 stores, for each index (address) i, the first link index first[i], and the array next 101 stores, for that index (address) i, the next link index next[i].

The array first 100 corresponds to the index dictionary of the external hashing method of FIG. 10, and the array next 101 corresponds to the linked list of FIG. 10.

When a new character K is input, the reference number of the string formed by appending K to the string so far, whose reference number (hash address) is i, is obtained by the external hashing method.

The string formed by appending one character to the string with reference number i is looked up with i as the hash address (index). In the linked list, the character appended to string i is stored in name. The character in name is compared with the character K, and if they do not match, the linked list is followed entry by entry, so that every one-character extension that has appeared so far can be searched. If the bucket contains no string with the character K appended, the link address 0 is eventually reached, which shows that the string in question is not registered.

Next, the conventional search method is described for the case shown in FIG. 11, where the strings "ab", "ah", "az", "abf", "abx", "ahd", "ahf", "azc", "azg", and "azw" are registered in the dictionary and the string "azw" is searched for.

FIG. 12 shows the initial state of the character string storage table 10b and of the external hash table 10a (100, 101).

The first character "a" is already registered. To search for the second character "z", the array first is consulted with "a" as the hash address, which yields P1. The string "ab" that follows "a" and is stored at P1 is thus found, but it does not match. The array next is then consulted with P1 as the hash address, which yields P2. The string "ah" stored at P2 is found, but it does not match either, so the array next is consulted with P2 as the hash address, which yields P3. Here the second character "z" matches.

The search then moves to the third character "w". The array first is consulted with P3 as the hash address, which yields P8, where the third character "c" is found. It does not match, so the array next is consulted with P8 as the hash address, which yields P9. This does not match either, so the array next is consulted with P9 as the hash address, which yields P10, where the target third character "w" is found. The search follows the route indicated in FIG. 11.
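The walk through the arrays first and next can be reproduced with a small model of the tables. The slot numbers below are assumptions made for illustration: slot 1 stands for "a", slots 2 to 11 stand in for P1 to P10, and the placement of the entries under "ab" and "ah" (P4 to P7) is a guess, since the text only gives the links it follows. The helper name `find_child` is likewise hypothetical.

```python
def build_tables():
    """Tables for the FIG. 11 strings; link value 0 means 'no entry'."""
    # Slot 1 is "a"; slots 2..11 stand in for P1..P10 (assumed numbering).
    ext = [None, "a", "b", "h", "z", "f", "x", "d", "f", "c", "g", "w"]
    first = [0] * 12
    nxt = [0] * 12
    first[1] = 2                  # a  -> P1 (ab)
    nxt[2], nxt[3] = 3, 4         # P1 -> P2 (ah) -> P3 (az)
    first[2], nxt[5] = 5, 6       # ab -> abf -> abx   (assumed slots)
    first[3], nxt[7] = 7, 8       # ah -> ahd -> ahf   (assumed slots)
    first[4] = 9                  # az -> P8 (azc)
    nxt[9], nxt[10] = 10, 11      # P8 -> P9 (azg) -> P10 (azw)
    return first, nxt, ext


def find_child(first, nxt, ext, omega, k):
    """Return (index, comparisons) for omega extended by k; index 0 if absent."""
    i, comparisons = first[omega], 0
    while i != 0:
        comparisons += 1
        if ext[i] == k:
            break
        i = nxt[i]
    return i, comparisons


first, nxt, ext = build_tables()
az, cost_z = find_child(first, nxt, ext, 1, "z")
azw, cost_w = find_child(first, nxt, ext, az, "w")
print(az, cost_z, azw, cost_w)
```

Both characters are found only at the third entry of their chain, so locating "azw" costs six comparisons, and with fixed tables it costs six on every later search as well.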

[Problems to be solved by the invention]

Conventionally, once a string has been registered, the external hash table is fixed and is never rewritten.

Consequently, even a frequently searched string, once stored at a position with a long search path, is found via that long path every time, which is inefficient. A frequently used string that happened to be registered late, and so sits toward the end of the list sharing its hash address, takes a long time to find on every search.

Accordingly, an object of the present invention is to provide a data compression method that can search the substrings registered in a dictionary at high speed and shorten the encoding time.

[Means for solving the problem]

FIG. 1 is a diagram illustrating the principle of the present invention.

As shown in FIG. 1, the present invention is a data compression method in which already-encoded data is divided into mutually distinct substrings that are registered in a dictionary 10, the input data is compared against the substrings in the dictionary 10 according to a search list 10a that gives the search order of the linked substrings, and the input data is encoded by specifying the reference number of the longest-matching substring in the dictionary 10, wherein the search list 10a is rewritten so that a substring found by the search is swapped with the substring immediately before it in the search list 10a.

Further, in the present invention, the search list 10a gives the search order from substrings with fewer characters to substrings with more characters, and the search list 10a is rewritten so that the found substring moves one position forward among the substrings of the same length in the search list 10a.

[Action]

Taking the usage frequency of strings into account, the present invention moves a string that has been found by a search one position forward among the strings sharing its hash address. From the second search onward, finding the same string takes one fewer comparison per character. Because a string is moved forward each time it is found, the more often it is searched for, the nearer it comes to the head of the strings sharing its hash address in the linked list.

Encoding with the dictionary can therefore be made faster, and the encoding time can be shortened.
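The effect of moving a found entry one position forward can be seen on an ordinary Python list before any of the table machinery is involved. The sketch below is a simplification under that assumption: the chain is a plain list, and `find_and_transpose` is a hypothetical helper name.

```python
def find_and_transpose(chain, key):
    """Search front to back; on a hit, swap the entry with its predecessor."""
    for pos, item in enumerate(chain):
        if item == key:
            if pos > 0:
                chain[pos - 1], chain[pos] = chain[pos], chain[pos - 1]
            return pos + 1        # comparisons used by this search
    return None                   # key is not in the chain


chain = ["b", "h", "z"]
costs = [find_and_transpose(chain, "z") for _ in range(4)]
print(costs, chain)
```

Each repeated search for the same entry costs one comparison fewer until the entry reaches the head, after which the cost stays at one.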

[Example]

(a) Description of one embodiment

FIG. 2 is a processing flowchart of one embodiment of the present invention, and FIGS. 3 to 6 are explanatory diagrams of its operation.

S1 to S7 are the same as S1 to S7 in FIG. 7.

S1) The processor (see FIG. 1) initializes the dictionary 10 so that it contains the single-character strings. That is, each character code l is registered at address m (= l) of the dictionary 10.

The count n of strings currently registered in the dictionary 10 is set to the number of characters.

Further, using the character string storage table 10b, the first input character K is converted into a reference number (index) i as the prefix string.

Next, first[NMAX] of the dictionary search array first 100, next[NMAX] of the array next 101, and ext[NMAX] of the character string table 10b are initialized to 0.

S2) The processor reads the next input character K.

S3) The processor checks whether there is a next character K.

S7) If there is no next character K, the code code(ω) is output and processing ends.

S4, S5) If, on the other hand, there is a next character K, the dictionary search step is entered.

First, the processor sets the search index i to the string ω, the registration index j to 0, and the position detection index lk to 0.

Next, the array first 100 is consulted to obtain the link index first[i] of index i, and this is set in index i.

The processor checks whether this index i is 0. If it is not 0, the processor checks whether the character ext[i] at index i of the character string table 10b is the input character K.

If it is not the input character K, the registration index j is saved in the position detection index lk, and the search index i is saved in the registration index j.

Then the array next 101 is consulted with index i to obtain the link index next[i], which is set in index i, and the process returns to the test of whether i = 0.

S8) If, on the other hand, ext[i] is the input character K, the processor tests whether the position detection index lk is 0.

If lk is not 0, the input character K was found at the third or a later link index, and the entry is swapped with the preceding substring.

That is, index i is moved to next[lk] of index lk, next[i] of index i in the array next 101 is moved to next[j] of index j, and index j is moved to next[i], in that order, after which the process returns.

If, on the other hand, lk = 0, the input character K was found at the second link index or earlier.

The processor therefore tests whether the registration index j is 0.

If j = 0, the input character K was found at first[i], that is, at the first link index of the chain, so no swap is needed and the process returns to step S2.

If, on the other hand, j ≠ 0, the input character K was found at the second link index, and the entry is swapped to the head of the chain.

That is, next[i] of index i in the array next 101 is moved to next[j] of the preceding index j, first[ω] of index ω in the array first 100 is moved to next[i], and index i is moved to first[ω].

As a result, the index i of the input character K is rewritten to the head of the chain.

The process then returns to step S2.
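The search of S4 and the three outcomes tested at S8 can be put together in one function over the arrays first, next, and ext. This is a sketch under stated assumptions: `search_and_promote` is a hypothetical name, the array next is spelled `nxt` to avoid shadowing Python's built-in, and the slot numbers in the demonstration (slot 1 for "a", slots 2 to 4 for P1 to P3) are illustrative.

```python
def search_and_promote(first, nxt, ext, omega, k):
    """Find the entry for omega extended by k and move it one place forward.

    Returns the index found, or 0 if the string is not registered.
    """
    j = lk = 0                    # j: previous entry, lk: the entry before j
    i = first[omega]
    while i != 0 and ext[i] != k:
        lk, j = j, i              # remember the last two entries visited
        i = nxt[i]
    if i == 0:
        return 0                  # end of chain reached: not registered
    if lk != 0:
        # Found third or later: lk -> j -> i becomes lk -> i -> j.
        nxt[lk] = i
        nxt[j] = nxt[i]
        nxt[i] = j
    elif j != 0:
        # Found second: i becomes the new head of the chain of omega.
        nxt[j] = nxt[i]
        nxt[i] = first[omega]
        first[omega] = i
    # j == 0: already at the head, nothing to rewrite.
    return i


# Chain under "a" (slot 1): P1 "b" -> P2 "h" -> P3 "z", in slots 2, 3, 4.
ext = [None, "a", "b", "h", "z"]
first = [0, 2, 0, 0, 0]
nxt = [0, 0, 3, 4, 0]
print(search_and_promote(first, nxt, ext, 1, "z"), first, nxt)
```

After the first call the chain reads P1, P3, P2, which corresponds to the interchange of "z" and "h" described for FIG. 4; a second call moves "z" to the head.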

S6) If i = 0 at step S4, the string is not in the dictionary 10, so the registration step is entered.

First, code(ω) is output as the codeword.

Next, the string count n is set in index i, n is incremented to n + 1, and the character K is stored at index i of the character string table 10b as ext[i].

Next, the processor checks whether the registration index j is 0.

If j = 0, the new string is the only one at that level, so index i is set in first[ω] of the array first.

If, on the other hand, j ≠ 0, there are two or more strings at that level, so index i is set in next[j] of the array next, at the end of the chain.

Then the character K is set in index i, and the process returns.
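Combining the search, the one-place promotion, and the registration step gives a complete encoder. The sketch below is one possible reading of steps S1 to S8, not the flowchart of FIG. 2 itself. Its assumptions: reference numbers double as table indices and start at 1, a new entry is linked at the end of its chain through next[j], the tables are sized from the input length, and the names `lzw_encode_transpose`, `nxt`, and `index_of` are hypothetical.

```python
def lzw_encode_transpose(data, alphabet):
    """LZW encoding with chained lookup and one-place promotion on each hit."""
    size = len(alphabet) + len(data) + 1       # room for every possible entry
    first, nxt, ext = [0] * size, [0] * size, [None] * size
    index_of = {}
    for n, ch in enumerate(alphabet, start=1):  # S1: single characters
        ext[n] = ch
        index_of[ch] = n
    n = len(alphabet) + 1                       # next free reference number
    if not data:
        return []
    codes = []
    omega = index_of[data[0]]
    for k in data[1:]:                          # S2/S3
        j = lk = 0
        i = first[omega]
        while i != 0 and ext[i] != k:           # S4: walk the chain of omega
            lk, j = j, i
            i = nxt[i]
        if i != 0:                              # S8: hit
            if lk != 0:                         # third or later: swap with j
                nxt[lk] = i
                nxt[j] = nxt[i]
                nxt[i] = j
            elif j != 0:                        # second: move to the head
                nxt[j] = nxt[i]
                nxt[i] = first[omega]
                first[omega] = i
            omega = i                           # S5: extend the prefix
        else:                                   # S6: miss
            codes.append(omega)                 # output code(omega)
            ext[n] = k
            if j == 0:
                first[omega] = n                # first entry at this level
            else:
                nxt[j] = n                      # append to the end of the chain
            n += 1
            omega = index_of[k]
    codes.append(omega)                         # S7
    return codes


print(lzw_encode_transpose("ababcbabab", "abc"))
```

Rewriting the chains changes only the order in which entries are examined, so the codewords are the same as those of plain LZW encoding and a standard LZW decoder can read the output unchanged.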

A concrete example is described with reference to FIGS. 3 to 6.

As in FIG. 3(A), the same example as FIG. 11 is used, and the search for the string "azw" is described.

In the case of FIG. 3(A), the character string table 10b, the array first 100, and the array next 101 are as shown in FIG. 3(B).

When the input character K = z is input and the arrays first and next are followed, next[i = P2] = P3 is obtained as index i.

At this point ext[P3] = z, and j = P2, lk = P1.

Therefore lk ≠ 0 at step S8, and "z" is swapped with the preceding "h", as in FIG. 3(A).

That is, as shown in FIG. 4, i = P3 is set in next[lk = P1], next[i = P3] = 0 is set in next[j = P2], and j = P2 is set in next[i = P3].

As a result, the link state becomes that of FIG. 4(A), with "z" and "h" of FIG. 3(A) interchanged, and the table state becomes that of FIG. 4(B).

Next, the process returns to step S2. When the next character "w" is input, the array first 100 and the array next 101 are followed as in FIG. 5(B), and next[P9] = P10 is obtained as index i.

At this point ext[P10] = w, and j = P9, lk = P8.

Therefore lk ≠ 0 at step S8, and "w" is swapped with the preceding "g", as in FIG. 5(A).

That is, i = P10 is set in next[lk = P8], next[i = P10] = 0 is set in next[j = P9], and j = P9 is set in next[i = P10].

As a result, the link state becomes that of FIG. 6(A), with "g" and "w" of FIG. 5(A) interchanged, and the table state becomes that of FIG. 6(B).

In this way, each time a search character is found, the hash table is rewritten so that the found entry is swapped with the entry immediately preceding it among the character strings having the same hash address. Consequently, the next time the same character string is searched for, the search follows a shorter path than before, and search processing is sped up. With this method, an entry moves up only one position in the linked list per search. An infrequently used character string therefore rises slowly, whereas a frequently used character string reliably rises one position each time it is searched for, so a linked-list structure that reflects search frequency can be built.
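The effect described above can be illustrated with a small sketch under stated assumptions. The function name `find_and_transpose`, the single hash bucket, and the sample characters are hypothetical. Index 0 is assumed to mark the end of a chain, and the case where the predecessor is the head of the chain is assumed to be handled by updating the array first. A real LZW dictionary would key entries on a prefix code plus a character, which is omitted here.

```python
def find_and_transpose(first, next_, ext, bucket, ch):
    """Search the chain of `bucket` for ch and return (index, steps).

    On a hit, the found entry is swapped with its predecessor so that the
    next search for the same character follows a shorter path.
    Returns index 0 when ch is not in the chain.
    """
    lk, j, i = 0, 0, first[bucket]
    steps = 0
    while i != 0:
        steps += 1
        if ext[i] == ch:
            if j != 0:                    # not already at the head of the chain
                if lk != 0:
                    next_[lk] = i
                else:
                    first[bucket] = i     # assumed handling when j was the head
                next_[j] = next_[i]
                next_[i] = j
            return i, steps
        lk, j, i = j, i, next_[i]
    return 0, steps


# One bucket whose chain is a -> b -> c -> d (index 0 is unused).
ext = [None, "a", "b", "c", "d"]
first = [1]
next_ = [0, 2, 3, 4, 0]

steps = [find_and_transpose(first, next_, ext, 0, "d")[1] for _ in range(4)]
print(steps)   # [4, 3, 2, 1]
```

Each hit shortens the path for "d" by exactly one step, which is the gradual, frequency-driven reordering the paragraph describes. A character searched only once would rise by a single position.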

Furthermore, faster encoding can be achieved if the registration of codes and characters, the registration in the hash-table arrays first and next, and the rewriting of the linked list together with the continuation of the search to the next character are each performed as pipeline stages.

(b) Description of other embodiments

In addition to the embodiment described above, the present invention can be modified as follows.

The format of the hash table is not limited to the arrays first and next, and may be another format.

The code(ω) may be further compressed using run-length coding or the like.

The input is not limited to character strings and may be an encoded data sequence.
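For the variation that further compresses code(ω) with run-length coding, one possible form is sketched below. The text does not specify how the run-length coding would be applied, so the (code, run length) pair representation, the function names, and the sample code values are all assumptions made for illustration.

```python
from itertools import groupby


def run_length_encode(codes):
    """Collapse runs of equal codes into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]


def run_length_decode(pairs):
    """Expand (code, run_length) pairs back into the original code sequence."""
    return [code for code, count in pairs for _ in range(count)]


# Hypothetical LZW output codes; the values are made up for illustration.
codes = [258, 258, 258, 97, 260, 260]
pairs = run_length_encode(codes)
print(pairs)   # [(258, 3), (97, 1), (260, 2)]
```

Such a second stage only helps when the LZW output contains repeated adjacent codes. Otherwise the pairs take more space than the codes themselves.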

Although the present invention has been described above by way of embodiments, various modifications are possible within the gist of the present invention, and these are not excluded from the scope of the present invention.

[Effects of the invention]

As described above, the present invention provides the following effects.

In LZW encoding, the dictionary is searched while the external hash table is rewritten at every step so that a searched character string is found by the shortest path. As a result, the search path for frequently used character strings becomes shorter, the search is fast, and encoding is sped up.

Because a found entry is only swapped with the one immediately before it, the search tree does not change abruptly, so search efficiency improves even for relatively random code strings.

[Brief description of the drawings]

FIG. 1 is a diagram of the principle of the present invention; FIG. 2 is a processing flowchart of an embodiment of the present invention; FIGS. 3 to 6 are diagrams explaining the operation of an embodiment of the present invention; FIG. 7 is a flowchart of LZW encoding; FIG. 8 is a flowchart of LZW decoding; FIG. 9 is a diagram explaining LZW encoding and decoding; FIG. 10 is a diagram explaining the external hashing method; and FIGS. 11 to 13 are diagrams explaining the prior art. In the figures, 10 denotes the dictionary and 10a denotes the search list.

Continuation of the front page
(72) Inventor: Hirotaka Chiba, c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki, Kanagawa
(56) References: JP-A-60-116228 (JP, A); JP-A-63-209228 (JP, A)
(58) Field searched (Int. Cl.⁷, DB name): H03M 7/42

Claims

(57) [Claims]

1. A data compression method in which encoded data is divided into mutually different substrings, the substrings are registered in a dictionary (10), input data is compared against the substrings in the dictionary (10) in accordance with a search list (10a) indicating the search order of linked substrings, and the input data is encoded by designating the reference number of the longest-matching substring in the dictionary (10), characterized in that the search list (10a) is rewritten so that the found substring is swapped with the immediately preceding substring in the search list (10a).

2. The data compression method according to claim 1, wherein the search list (10a) indicates a search order running from substrings with fewer characters to substrings with more characters, and the search list (10a) is rewritten so that the found substring moves to the position immediately preceding it among the substrings in the search list (10a) that have the same number of characters.