JPH04129429A

JPH04129429A - Dictionary retrieval system for data compressor

Info

Publication number: JPH04129429A
Application number: JP2251499A
Authority: JP
Inventors: Yoshiyuki Okada; 佳之岡田; Hirotaka Chiba; 広隆千葉; Shigeru Yoshida; 茂吉田; Yasuhiko Nakano; 泰彦中野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-09-20
Filing date: 1990-09-20
Publication date: 1992-04-30
Anticipated expiration: 2015-05-08
Also published as: JP3038234B2

Abstract

PURPOSE: To achieve fast dictionary search in LZW coding by forming the link list of the external hash method from consecutive addresses. CONSTITUTION: The system has a dictionary memory 20, in which the link addresses for an external hash address derived from the input data are formed partly from consecutive addresses of a next memory 200, and a dictionary search means 16, which generates addresses of the next memory 200 consecutively on the basis of the input data and searches an extension memory 300 for candidate data matching the input data. In other words, in the dictionary search of LZW coding, the index addresses of the dictionary memory 20, which has a list structure based on the external hash method, are formed from consecutive addresses. Because the next address can be predicted once one hash address is determined, candidate characters are accessed faster, the dictionary memory 20 can be read at high speed during coding, and the coding time is reduced.

Description

[Detailed description of the invention]

[Overview]

The invention concerns a dictionary search method for a data compression device that uses LZW coding, an improvement on the incremental-parsing type of universal coding. Its object is to shorten dictionary search time by enabling high-speed reading of a dictionary memory that uses the list structure of the external hash method. The dictionary memory is given a list structure according to the external hash method, consisting of a first memory (index memory), a next memory (link memory), and an extension memory that stores candidate characters. The index addresses of the next memory are arranged as consecutive addresses, so that the first search, based on the input character, is followed by searches at consecutive addresses, which speeds up the search.

[Industrial application field]

The present invention relates to a dictionary search method for a data compression device that uses LZW coding, an improvement on the incremental-parsing type of universal coding.

In recent years computers have come to handle many kinds of data, such as character codes, vector information, and images, and the volume of data handled is growing rapidly. When large volumes of data are handled, removing the redundant parts of the data to compress it reduces the storage capacity required and allows faster transmission.

Universal coding has been proposed as a method that can compress such varied data with a single scheme. The field of the present invention is not limited to the compression of character codes and applies to many kinds of data. In what follows, however, the terminology of information theory is adopted: one word of data is called a character, and a sequence of several words is called a character string.

A representative universal code is the Ziv-Lempel code (for details see, for example, Munakata, "Ziv-Lempel data compression method", Information Processing, Vol. 26, No. 1, 1985). Two algorithms have been proposed for the Ziv-Lempel code:
(1) the universal type, and
(2) the incremental parsing type.

The LZSS code is an improvement on the universal-type algorithm (see T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986). The LZW (Lempel-Ziv-Welch) code is an improvement on the incremental-parsing-type algorithm (see T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984).

Of these codes, the LZW code has come to be used for applications such as file compression in storage devices, because it can be processed at high speed and its algorithm is simple.

[Conventional technology]

FIG. 7 shows the encoding flow of conventional LZW coding, and FIG. 8 shows the decoding flow.

LZW encoding keeps a rewritable dictionary. It divides the input character string into mutually distinct strings (substrings), registers these strings in the dictionary with reference numbers assigned in order of appearance, and encodes the string currently being input as the reference number of the longest matching string registered in the dictionary.

FIG. 9 illustrates LZW encoding, FIG. 10 illustrates LZW decoding, and FIG. 11 shows an example of the dictionary built during decoding. To keep the explanation simple, FIGS. 9, 10 and 11 take the example of compressing and restoring data made up of the three characters a, b and c.

In the LZW encoding of FIG. 7, step S1 (the word "step" is omitted below) first registers in the dictionary, as initial values, a one-character string for every character, and encoding then begins. In S1 the dictionary is searched with the first input character K to obtain its reference number ω, which becomes the prefix string.

Next, S2 reads the next character K of the input data, S3 checks whether character input has finished, and S4 checks whether the extended string (ωK), formed by appending the character K read in S2 to the prefix string ω obtained in S1, is in the dictionary.

If the string (ωK) is not in the dictionary at S4, the flow proceeds to S6: the reference number ω is output as the codeword code(ω), the string (ωK) is registered in the dictionary with a new reference number, the input character K of S2 becomes the new ω, the dictionary address n is incremented, and the flow returns to S2 to read the next character K.

If the string (ωK) is in the dictionary at S4, S5 replaces ω with the reference number of (ωK) and the flow returns to S2. The search for the longest match continues in this way until the string (ωK) can no longer be found in the dictionary at S4.

LZW encoding is now explained concretely with reference to FIGS. 9 and 10.

The input data (INPUT) in FIG. 9 is read from left to right. When the first character a is input, the dictionary contains no matching string other than the character a itself, so a, with reference number 1, becomes the prefix string ω.

When the second character b is input, the extended string ωK = ab is not in the dictionary, so the reference number of the prefix, OUTPUT CODE 1, is output as the codeword. The extended string ωK = ab is given reference number 4 and registered in the dictionary; the actual dictionary entry is the string 1b, as shown on the right of FIG. 10. The character b then becomes the prefix string ω.

When the third character a is input, the extended string ωK = ba = 2a is not in the dictionary, so OUTPUT CODE 2 is output as the codeword, and the extended string ωK = ba is represented as 2a and registered in the dictionary with reference number 5. The character a becomes the new prefix string ω.

For the fourth input character b, the extended string ωK = ab is already registered in the dictionary as codeword 4 (entry 1b), so the string ωK becomes the new prefix string ω. The fifth character c is then input to form the extended string ωK = 4c = abc. This string is not registered in the dictionary, so OUTPUT CODE 4, the code for the string ab = 1b, is output as the codeword, and the extended string ωK = abc is registered in the dictionary in the form 4c as codeword 6. Processing continues in the same way.

The decoding of FIG. 8 is the reverse of the encoding of FIG. 7.

In the LZW decoding of FIG. 8, as in encoding, a one-character string for every character is first registered in the dictionary as an initial value, and decoding then begins.

S1 reads the first code (reference number) and saves the current CODE as OLDcode. The first code always corresponds to the reference number of one of the single characters already registered in the dictionary, so the character code(K) matching the input CODE is looked up and the character K is output. The output character K is also set in FINchar for the exception handling described later.

S2 then reads the next code into CODE and saves it as INcode. S3 checks whether there is a new code, that is, whether code input has finished, and S4 checks whether the input code CODE is defined (registered) in the dictionary.

Normally the input codeword has been registered in the dictionary by the preceding processing, so the flow proceeds to S5, which reads the string code(ωK) corresponding to CODE from the dictionary. S6 temporarily pushes the character K onto a stack, takes the reference number CODE(ω) as the new CODE, and returns to S5. Steps S5 and S6 are repeated recursively until the reference number ω reaches a single character K. Finally, S7 pops the characters stacked in S6 in LIFO (Last In First Out) order and outputs them. S7 also registers in the dictionary, under a new reference number, the string (ωK) formed from the previously used code ω and the first character K of the string just restored.

LZW decoding is now explained concretely with reference to FIG. 11.

In FIG. 11 the first input codeword (INPUT CODE) is 1. The single characters a, b and c are already registered in the dictionary as reference numbers 1, 2 and 3, as shown in FIG. 10, so codeword 1 is looked up in the dictionary, replaced by the string a with the matching reference number, and output.

The next codeword, 2, is likewise replaced by the character b and output. At this point the string ωK = 1b, formed from the previously processed codeword 1 and the first character b of the string just decoded, is registered in the dictionary with the new reference number 4.

For the third codeword, 4, the dictionary lookup yields the string 1b, which is replaced by the string ab and output. At the same time the string ωK = 2a (= ba), formed from the previously processed codeword 2 and the first character a of the string just decoded, is registered in the dictionary with the new reference number 5. Processing continues in the same way.

The LZW decoding of FIG. 11 requires the following exception handling, which arises when the sixth input codeword, 8, is decoded. Codeword 8 is not yet defined in the dictionary at the time of decoding and therefore cannot be decoded directly. In this case the string 5b is formed by appending the first character b of the previously decoded string ba to the previously processed codeword 5; this is expanded as 5b = 2ab = bab and output. After the string has been output, the string 5b, formed from the previous codeword 5 and the first character b of the string just decoded, is registered in the dictionary with reference number 8.

This exception handling is carried out through S4 and S8 of the decoding flow of FIG. 8, and S7 finally outputs the string and registers the new string in the dictionary with its reference number.

The LZW decoding of FIGS. 8 and 11 has been described for the case in which the decoder builds the dictionary in real time while interpreting the codes. Alternatively, the dictionary built during encoding may be copied to the decoding side and used as it is for decoding, in which case no exception handling is needed on the decoding side.

When LZW encoding is performed by the procedure of FIG. 7, every dictionary search for a string may, in the worst case, have to scan the entire dictionary, so dictionary search takes a long time. Conventional dictionary search methods therefore use the external hash method (open hashing, or chaining) to raise the processing speed.

In dictionary search by the ordinary hash method, given a set S of character strings, the address at which a string x of S is stored can be computed directly from the string x itself, which makes high-speed dictionary search possible.

Suppose the storage locations for the strings, that is, the hash table, are assigned addresses 0 to m-1. The hash method fixes one function h: S → {0, 1, ..., m-1} and obtains the address of a string x of S as h(x). The function h is called the hash function, and the value h(x) is called the hash address of the string x.

The hash method is normally used when the set S is far larger than the number of addresses m. However, no matter how the hash function h is chosen, two different strings x1 and x2 of S may have the same hash address, h(x1) = h(x2). This is called a collision, and the external hash method (open hashing, or chaining) is one countermeasure against collisions.

As shown in FIG. 12, the external hash method provides a linked list for each hash address i given by the index (directory), and the strings x that collide at hash address h(x) = i are stored in order from the head of that linked list. Each linked list of strings sharing the same hash address h(x) is called a bucket.

FIG. 13 shows the flow of LZW encoding in which the list structure of the external hash method is used for dictionary search. FIG. 14 shows the configuration of a dictionary memory according to the external hash method, and illustrates the search and registration procedures of LZW encoding concretely, taking as an example the tree structure of encoded strings shown in FIG. 15.

In FIG. 14 the dictionary memory consists of a first memory 100, a next memory 200, and an extension memory 300 attached to the next memory 200. The first memory 100 corresponds to the index (directory) of the external hash method shown in FIG. 12, the next memory 200 corresponds to "next" in the linked list of FIG. 12, and the extension memory 300 corresponds to "name" in FIG. 12.

The tree structure of FIG. 15 shows the case in which the characters K10, K21, K22, ..., K41 are already registered and K42, drawn with a broken line, is newly registered. In the processing of FIG. 13 the level within this tree structure is indicated by the i counter, and the number of characters at the same level is indicated by the j counter. The registration address of each character is therefore written ωij.

The following explains LZW encoding and registration by dictionary search, following the flow of FIG. 13, when the string "K10, K22, K32, K42" contained in the registered tree structure of FIG. 15 is input.

In FIG. 13, S1 first performs the following initialization.

(1) The dictionary is initialized so that it contains the first characters. For the 26 letters of the alphabet, for example, each character code is registered as it is as a hash address in the first memory of FIG. 14. In FIG. 15 this corresponds to the state in which the character K10 at the top of the tree is registered at address ω10.

(2) The current number n of characters registered in the dictionary is set to the number of characters registered in (1). For the 26 letters of the alphabet, n = 26.

(3) The first input character K is taken as the prefix string i. In FIG. 15 the first input character is K10, so the prefix string is i = 1. In the processing flow below, the prefix string i is treated as the i counter.

(4) The arrays used for dictionary search are initialized to 0. The search arrays of the first, next and extension memories are written first[1, Nmax], next[1, Nmax] and EXT[1, Nmax], and these are initialized to 0.

When the initialization of S1 is complete, the flow proceeds to S2 and reads the next character "K22". S3 then checks whether any unprocessed characters remain. When all processing has finished, the flow proceeds to S16, outputs the codeword code(ω), and ends. Here unprocessed characters remain, so the flow proceeds to the dictionary search steps S5 to S9.

In the dictionary search steps, S5 first sets the value of the current prefix string, i = 1, in the address ωij and sets the j counter to j = 0. This generates the first-memory address ωij = ω10.

Next, S6 reads the contents of address ω10 of the first memory 100, which yields the address ωij = ω21, so the i counter is set to i = 2.

The flow then proceeds to S7, which checks whether i = 0. Here i = 2, so the flow proceeds to S8, which refers to the extension memory 300 at the address ω21 obtained from the first memory 100 in S6, reads the character "K21", and tests whether it matches the input character "K22" obtained in S2. The two do not match, so the flow proceeds to S9, which copies the current i counter value, i = 2, into the j counter (j = 2) and sets the i counter (i = 2) from the address ωij = ω22 stored at address ω21 of the next memory 200. This produces the new address ωij = ω22.

The flow then returns to S7 and checks whether i = 0. Here i = 2, so it proceeds again to S8, reads the registered character "K22" from address ω22 of the extension memory 300, and tests whether it matches the input character "K22".

This time the two match, so the flow returns to S2 and reads the next character "K32". Steps S5 to S9 are repeated in the same way, so that the dictionary is searched in the order shown by the solid arrows in FIG. 14, up to the already registered character "K41".

When the search of the registered character "K41" has finished and S8 finds that it does not match the last input character "K42", S9 sets j = 2 and, because the contents of the next memory 200 at address ω41 are 0, sets i = 0. On returning to S7, i = 0 is detected, so the flow leaves the dictionary search steps and proceeds to S10, outputs the address ω32 representing the string "K10, K22, K32" matched so far as the codeword code(ω), and proceeds to the dictionary registration steps S11 to S14.

In the dictionary registration steps, S11 first sets the current registration count n to n = i, that is, n = 4, and then increments n by one. The character "K42" is then registered at address ωij = ω42 of the extension memory 300.

Next, S12 checks whether j = 0. Here j = 2, so the flow proceeds to S14 and writes the address ω42, at which the character "K42" was registered, into address ω41 of the next memory 200. If, on the other hand, j = 0 at S12, that is, if the registration is to be made in the first memory 100, the character registration address in the extension memory 300 is stored in the first memory 100, as shown at addresses ω10, ω22 and ω32 of the first memory 100 in FIG. 14.

As a result of registering the character "K42" in these registration steps, the next memory 200 and the extension memory 300 of FIG. 14 hold the entries at addresses ω41 and ω42 shown in the lower part, set off by a broken line, and the address ω42 of the new character "K42" has been added to the tree structure of FIG. 15. For convenience of explanation, FIG. 14 shows address ω41 twice, once for the search and once for the registration.

When the dictionary registration steps S11 to S14 are complete, S15 sets the registered character "K42" as the new prefix string i, that is, as the value of the i counter, and the flow returns to S2 and moves on to the dictionary search for the string that follows, with the character "K42" as the top of the tree.
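The basic encoding and decoding procedures of FIGS. 7 and 8 can be sketched as follows. This is a minimal illustration, not the flowcharts themselves: the function names, the use of Python strings as dictionary entries (instead of the "reference number plus character" entries and the LIFO stack of S5 to S7), and the sample input "ababcbabab" are assumptions made for the example. The input was chosen so that the codes begin 1, 2, 4 and the sixth code is 8, as in the description above.

```python
def lzw_encode(text, alphabet="abc"):
    """Encode text into LZW reference numbers (a sketch of the FIG. 7 flow)."""
    # Reference numbers 1..len(alphabet) are the initial one-character entries.
    dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    codes = []
    if not text:
        return codes
    omega = text[0]                      # prefix string
    for k in text[1:]:
        if omega + k in dictionary:      # extended string already registered
            omega += k
        else:
            codes.append(dictionary[omega])      # output code(omega)
            dictionary[omega + k] = next_code    # register the extended string
            next_code += 1
            omega = k                    # the input character becomes the prefix
    codes.append(dictionary[omega])      # flush the last prefix
    return codes


def lzw_decode(codes, alphabet="abc"):
    """Decode LZW reference numbers, rebuilding the dictionary on the fly."""
    dictionary = {i + 1: ch for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    if not codes:
        return ""
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # Exception case: the code is not defined yet, so it must be the
            # previous string followed by that string's own first character.
            entry = previous + previous[0]
        out.append(entry)
        # Register previous string + first character of the string just decoded.
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)
```

With the assumed input, `lzw_encode("ababcbabab")` gives the codes 1, 2, 4, 3, 5, 8, and decoding that sequence exercises the exception case on the final code 8, which expands to bab.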

[Problem to be solved by the invention]

As described above, when conventional LZW encoding is carried out in software by executing the flow of FIG. 7, the dictionary search takes a long time, so the external hash method is used to speed up the dictionary search with the flow of FIG. 13.

In dictionary search using the external hash method, however, the reading of a candidate character, the comparison of the candidate character with the input character, and the match or mismatch decision are performed sequentially. Dictionary search therefore accounts for about 80% of the total processing time, and a further increase in speed is required.

Moreover, because a list structure based on the external hash method is used to read the candidate characters, there is little relation between the storage address of the current candidate character and that of the next candidate character. Each candidate can only be read as its address becomes known, addresses cannot be issued in advance, and the performance of the devices that make up the dictionary memory cannot be fully exploited.

For example, when DRAM is used as the dictionary memory, the addresses are not consecutive, so it is difficult to use high-speed read modes such as page mode, in which the row address is held fixed and only the column address is changed.

In the case of FIG. 14, for example, the addresses ω32 and ω33 of the next memory 200 are not consecutive, so, as shown in FIG. 16, the ordinary read mode is used, in which the row address and the column address are each specified for every access, and no speed-up can be achieved.

The present invention was made in view of these problems of the prior art. Its object is to provide a dictionary search method for a data compression device that shortens dictionary search time by enabling high-speed reading of a dictionary memory that uses the list structure of the external hash method.
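The list-structured search that this section identifies as the bottleneck can be sketched as follows. The array names `first`, `nxt` and `ext` follow the search arrays first, next and EXT named in the description of FIG. 13; the function names, the integer character encoding, and the sample input are assumptions made for the illustration, and the loop is a simplified reading of steps S5 to S14 rather than a transcription of the flowchart.

```python
def encode_with_chained_dictionary(data, alphabet_size, n_max=4096):
    """LZW-encode data (integers 1..alphabet_size) with first/next/ext tables."""
    first = [0] * (n_max + 1)  # first[w]: address of the first candidate after prefix w
    nxt = [0] * (n_max + 1)    # nxt[a]: address of the next candidate in the same list
    ext = [0] * (n_max + 1)    # ext[a]: candidate character registered at address a
    n = alphabet_size          # addresses 1..alphabet_size are the single characters
    codes = []
    if not data:
        return codes, first, nxt
    omega = data[0]
    for k in data[1:]:
        j = 0                  # last candidate examined (0: none yet)
        i = first[omega]       # first candidate for this prefix
        while i != 0 and ext[i] != k:
            j = i
            i = nxt[i]         # follow the link; the address is known only now
        if i != 0:             # match: extend the prefix
            omega = i
            continue
        codes.append(omega)    # no match: output code(omega)
        if n < n_max:          # register the new candidate character
            n += 1
            ext[n] = k
            if j == 0:
                first[omega] = n
            else:
                nxt[j] = n
        omega = k
    codes.append(omega)
    return codes, first, nxt


def candidate_addresses(first, nxt, omega):
    """Return the addresses visited when all candidates after omega are read."""
    addresses = []
    i = first[omega]
    while i != 0:
        addresses.append(i)
        i = nxt[i]
    return addresses
```

For the assumed input `[1, 2, 1, 3, 1, 4]` over a four-character alphabet, the candidates that follow prefix 1 end up at addresses 5, 7 and 9. Each address is obtained only by reading the previous link, and the addresses are not consecutive, which is the situation the text describes as preventing page-mode reads.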

【Means for Solving the Problem】

FIG. 1 is a diagram illustrating the principle of the present invention. The present invention is directed to a data compression device that divides already-encoded data into mutually different substrings, assigns a distinct reference number to each substring, and registers it in a dictionary, and that encodes input data by specifying it with the reference number of the longest matching substring in the dictionary; an example is a data compression device that performs LZW encoding. As a dictionary search method for such a data compression device, the present invention is characterized by comprising: a dictionary memory 20 that has a first memory 100 and a next memory 200 having an extended memory 300, arranged according to the list structure of the external hash method, and in which the link addresses of the external hash addresses based on the input data are partially composed of consecutive addresses of the next memory 200; and dictionary search means 16 that continuously generates addresses of the next memory 200 based on the input data and searches for candidate data in the extended memory 300 that matches the input data. Here, the dictionary search means 16 includes pipeline control means 26 that performs, in parallel, the match check between input data and candidate data, the detection of the presence or absence of candidate data, and the reading of the next candidate data. In addition, the high-speed page mode is used as the access mode of the dictionary memory 20.

【Operation】

According to the dictionary search method of the data compression device of the present invention configured in this way, the index addresses of the dictionary memory, which has a list structure based on the external hash method, are composed of consecutive addresses in the dictionary search of LZW encoding. Once one hash address is determined, the next address can therefore be predicted, which speeds up the search accesses for candidate characters, allows encoding with high-speed reads of the dictionary memory, and shortens the encoding processing time.
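The longest-match encoding that this dictionary search serves can be sketched as textbook LZW. This is a minimal sketch, not the circuit described here: a Python `dict` stands in for the hash-linked dictionary memory, and numbering new strings sequentially from 256 is an assumption (the specification uses memory addresses as reference numbers).

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: emit the reference number of each longest match."""
    # Reference numbers 0..255 stand for the single-byte strings.
    dictionary = {bytes([value]): value for value in range(256)}
    next_code = 256
    codes = []
    current = b""
    for value in data:
        extended = current + bytes([value])
        if extended in dictionary:
            # A candidate exists: keep extending the string by one character.
            current = extended
        else:
            # No candidate: output the longest match, register the new string.
            codes.append(dictionary[current])
            dictionary[extended] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


print(lzw_encode(b"ababab"))  # [97, 98, 256, 256]
```

The `extended in dictionary` test is the step that the dictionary search means 16 performs in hardware, one candidate character at a time.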

【Example】

FIG. 2 is a configuration diagram showing one embodiment of a data compression device (encoding device) equipped with the dictionary search method of the present invention. In FIG. 2, the original data 10 to be processed is input via a DMA (Direct Memory Access) control circuit 12. The MPU 14, serving as control means, sets one character of the input original data 10, together with the reference number of the character string so far, in the multiple-character reading circuit 18 of the dictionary search circuit 16, and then starts the dictionary search circuit 16. The dictionary search circuit 16 then reads from the dictionary memory 20 the candidate characters for the string extended by one character; the match check circuit 22 checks (collates) the input character against the candidate character, and the link detection circuit 24 detects whether a candidate character exists.

In parallel with the collation of the input character and candidate character by the match check circuit 22 and the detection of the presence or absence of a candidate character by the link detection circuit 24, the pipeline control circuit 26 issues the read of the next candidate character to the dictionary memory 20. With this pipeline processing by the pipeline control circuit 26, the search and collation of successive candidate characters can be executed at the cycle time of the dictionary memory 20. The dictionary search circuit 16 is further provided with a consecutive address circuit 28, which generates consecutive addresses.
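The effect of overlapping the checks with the next read can be sketched with an idealized timing model. The function name, the time units, and the assumption that the match check and link detection together form a single "check" step are illustrative only; the specification gives no cycle counts.

```python
def search_time(candidates, t_read, t_check, pipelined):
    """Idealized time to examine `candidates` candidate characters.

    Assumption: each candidate needs one memory read (t_read) followed by a
    check (t_check) that stands for the match check and the link detection.
    When pipelined, the check of one candidate overlaps the read of the next.
    """
    if candidates <= 0:
        return 0
    if not pipelined:
        return candidates * (t_read + t_check)
    step = max(t_read, t_check)  # the slower stage sets the pace
    return t_read + (candidates - 1) * step + t_check


print(search_time(4, t_read=2, t_check=1, pipelined=False))  # 12
print(search_time(4, t_read=2, t_check=1, pipelined=True))   # 9
```

When the check is no slower than the read, the pipelined total approaches one memory cycle per candidate, which is the behavior the paragraph above describes.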
These addresses cause the multiple-character reading circuit 18 to read out the hash addresses and candidate characters registered at consecutive addresses of the dictionary memory 20. LZW encoding looks for the longest matching character string in the dictionary memory 20. The string is therefore extended one character at a time by appending input characters, and the point at which no candidate character remains identifies the longest matching string. At that point the string up to the longest match is represented by a reference number that uses the address ω, and this reference number ω is output externally from the input/output port 30 as the compressed code word code(ω).

FIG. 3 is a configuration diagram showing the detailed configuration of the dictionary search circuit 16 of FIG. 2 together with the dictionary memory 20. In FIG. 3, the address register 18-1, register 18-2, and register 18-3 correspond to the multiple-character reading circuit 18 of FIG. 2; the register 22-1 and comparator 22-2 correspond to the match check circuit 22 of FIG. 2; the NOR circuit 24-1 corresponds to the link detection circuit 24 of FIG. 2; and the counter 28-1 corresponds to the consecutive address circuit 28 of FIG. 2.

Next, the dictionary search of the embodiment of FIG. 3 is explained with reference to FIG. 4, which illustrates the search and registration procedures, and FIG. 5, which illustrates the tree structure showing the registration state of the dictionary memory 20. In the following explanation, a memory address ω is written ωij, where i is the upper address and j is the lower address. Assume that the character string "K10, K22, K32, K42" contained in the tree structure of FIG. 5 is input as the original data 10. The MPU 14 first sets the reference number ω10, corresponding to the single character K10 that is the first character of the input string, in the address register 18-1, which specifies the upper address.
The MPU 14 also sets the second input character K22 in the register 18-2, and then instructs the pipeline control circuit 26 to start the dictionary search circuit 16. The pipeline control circuit 26 first sets the counter 28-1, which generates the consecutive addresses, to 0 and then issues a read to the dictionary memory 20. The content of the counter 28-1 specifies the two least significant bits (LSBs) of the address of the dictionary memory 20. Accordingly, the content ωij = ω10 of the address register 18-1 specifies the upper address of the dictionary memory 20 and the content j = 0 of the counter 28-1 specifies the lower address; the resulting address (ω10 + 0) accesses the first memory 100 of FIG. 4, and ω21 is read out and set in the address register 18-1.

Next, with the content ω21 of the address register 18-1 as the upper address and the content of the counter 28-1 as the lower address, the address (ω21 + 0) accesses the next memory 200 and the extended memory 300 of the dictionary memory 20, and the first candidate character K21 and the link address ω22 of the second candidate character K22 are read out. The first candidate character K21 is set in the register 18-2, and the link address ω22 of the second candidate character K22 is set in the register 18-3. The comparator 22-2 then compares the input character K22 held in the register 22-1 with the first candidate character K21 held in the register 18-2 and determines whether they match.

Since the two do not match, a mismatch is determined and the next candidate character K22 is read. To do so, the counter 28-1 is incremented by one to generate the address (ω21 + 1) of the next memory 200, in which only the lower address of the dictionary memory 20 has changed, and the access to the next memory 200 reads the next candidate character K22 into the register 18-2. The content ω21 of the address register 18-1, which specifies the upper address, remains unchanged.

This operation is then repeated in the same manner. However, generating consecutive addresses indiscriminately with the counter 28-1 would enlarge the dictionary memory 20, so this embodiment generates four consecutive addresses at a time. For example, when the character code is 8 bits, addresses exceeding 9 bits are meaningless. Accordingly, once in every four searches the link address ω11 obtained by accessing the first memory 100 is used instead of a consecutive address of the next memory 200. That is, with the upper address held fixed, the counter 28-1 generates the four consecutive lower addresses "00, 01, 10, 11".
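The way a fixed upper address and a small counter produce the consecutive addresses can be sketched as follows. The function name and the example address 0x40 are assumptions; the two-bit counter width follows the embodiment.

```python
def consecutive_addresses(upper, counter_bits=2):
    """Yield the addresses formed by a fixed upper address and a counter.

    Assumption: `upper` already has its `counter_bits` low bits clear, so
    the counter value can simply be OR-ed into the least significant bits.
    """
    if upper & ((1 << counter_bits) - 1):
        raise ValueError("upper address must have its counter bits clear")
    for counter in range(1 << counter_bits):  # 00, 01, 10, 11
        yield upper | counter


print([hex(a) for a in consecutive_addresses(0x40)])
# ['0x40', '0x41', '0x42', '0x43']
```

Only the low bits change between reads, which is what lets the memory stay in one page for the whole run of four.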
When the counter then switches back to the lower address "00", the link address of the first memory 100 that was stored in the register 18-3 on the fourth access is simultaneously set in the address register 18-1. Taking the upper address ω31 of the next memory 200 in FIG. 4 as an example, incrementing the lower address with the counter 28-1 generates the consecutive addresses

ω31 + 0 (= ω31)
ω31 + 1 (= ω32)
ω31 + 2 (= ω33)
ω31 + 3 (= ω34)

and on the fifth access the link address stored in the next memory 200, which points to the next run of consecutive addresses, is read out and used as the upper address, and the generation of consecutive addresses is repeated from the beginning.

When the comparator 22-2 finds that the input character and a candidate character match in such a dictionary search, the NOR circuit 24-1 simultaneously checks whether the content of the register 18-3 (the link address from the next memory 200) is all 0s, and the dictionary search is repeated until it becomes all 0s. If the register 18-3 is all 0s, it is detected that no candidate character remains to be searched. In that case the MPU 14 and the pipeline control circuit 26 terminate the search processing of the dictionary search circuit 16 and output, as the code word code(ω), the address of the candidate character that matched last in the search so far.

In the case of FIG. 4, the content of the next memory 200 becomes all 0s at the input character K41, so the dictionary search ends at this stage and the address (ω41 + 0) of the last matched candidate character K41 is output as the code word code(ω). The MPU 14 then registers the remaining input character K42 at the address (ω41 + 1) of the extended memory 300 and registers the link address ω42 at the address (ω41 + 0) of the next memory 200, after which it moves on to a new dictionary search with the input character K42 as the prefix string i.

Thus, in the present invention, candidate characters and link addresses can be retrieved by generating addresses consecutively. The dictionary memory 20 can therefore use the high-speed page mode shown in FIG. 6, in which consecutive addresses are produced by holding the row address fixed and changing only the column address. Candidate characters and their link addresses can then be read out at high speed, so the dictionary search is executed at high speed.
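The search and registration steps of this embodiment can be modeled in software as a rough sketch. Everything below is hypothetical: the class and attribute names, the use of Python dicts for the three memories, and the choice of 256 as the first free address (so that 0..255 can serve as single-character reference numbers and a link of 0 can mean "no more candidates"). It is not the circuit of FIG. 3.

```python
BLOCK = 4          # four consecutive addresses per block (2-bit counter)
FIRST_FREE = 256   # assumption: 0..255 are single-character reference numbers


class DictionaryMemory:
    """Toy model of the first, next and extended memories (all hypothetical)."""

    def __init__(self):
        self.first = {}   # first memory: reference number -> first candidate address
        self.link = {}    # next memory: address -> link (0 = no more candidates)
        self.char = {}    # extended memory: address -> candidate character
        self.free = FIRST_FREE
        self.trace = []   # addresses read by the most recent search

    def _new_block(self):
        upper = self.free
        self.free += BLOCK
        return upper

    def _walk(self, reference):
        """Yield the candidate addresses of `reference` in read order."""
        upper = self.first.get(reference, 0)
        while upper:
            for counter in range(BLOCK):
                address = upper | counter    # predicted without reading a link
                yield address
                if self.link[address] == 0:  # all 0: no more candidates
                    return
            upper = self.link[address]       # link read on the fourth access

    def search(self, reference, character):
        """Return the address of the matching candidate, or None."""
        self.trace = []
        for address in self._walk(reference):
            self.trace.append(address)
            if self.char[address] == character:
                return address
        return None

    def register(self, reference, character):
        """Append `character` as a new candidate following `reference`."""
        last = None
        for last in self._walk(reference):
            pass
        if last is None:                     # first candidate of this string
            address = self._new_block()
            self.first[reference] = address
        elif last % BLOCK < BLOCK - 1:       # room left in the current block
            address = last + 1
            self.link[last] = address
        else:                                # block full: chain a new block
            address = self._new_block()
            self.link[last] = address
        self.char[address] = character
        self.link[address] = 0
        return address


memory = DictionaryMemory()
for character in b"abcde":
    memory.register(ord("K"), character)
print(memory.search(ord("K"), ord("e")))  # 260
print(memory.trace)                        # [256, 257, 258, 259, 260]
```

In the trace, the first four reads differ only in their two low bits; only the fifth read depends on a link fetched from memory, mirroring the one-in-four link access described above.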

【Effect of the invention】

As explained above, according to the present invention, the linked list based on the external hash method that is used in the dictionary search of LZW encoding is composed of consecutive addresses. Once one address is determined, subsequent addresses can therefore be predicted and issued ahead of time, and when, for example, DRAM is used as the dictionary memory, the high-speed page mode can be used, so that the performance of the memory devices is fully exploited and the dictionary search is sped up.
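The gain from address prediction can be put in rough numbers by counting the reads whose address is unknown until an earlier read has returned its link. The function name and the example count of nine candidates are assumptions; the run length of four is the value used in the embodiment above.

```python
def link_dependent_reads(candidates, run_length=1):
    """Number of candidate reads that must wait for a link address.

    Assumption: candidates are stored in runs of `run_length` consecutive
    addresses. Inside a run the next address is the current one plus one;
    only the jump to the next run needs a link read from memory.
    run_length=1 models an ordinary linked list.
    """
    if candidates <= 0:
        return 0
    return (candidates - 1) // run_length


print(link_dependent_reads(9, run_length=1))  # 8: every hop follows a link
print(link_dependent_reads(9, run_length=4))  # 2: only the hops between runs
```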

[Brief explanation of the drawing]

FIG. 1 is a diagram explaining the principle of the present invention;
FIG. 2 is a configuration diagram of an embodiment of the present invention;
FIG. 3 is a configuration diagram of an embodiment showing details of the dictionary search circuit of the present invention;
FIG. 4 is an explanatory diagram of the LZW code search procedure and registration procedure of the present invention;
FIG. 5 is a tree structure diagram showing the dictionary registration contents of the present invention;
FIG. 6 is a timing chart of the DRAM read mode when the high-speed page mode of the present invention is used;
FIG. 7 is a flowchart of conventional LZW encoding processing;
FIG. 8 is a flowchart of conventional LZW decoding processing;
FIG. 9 is an explanatory diagram of LZW encoding;
FIG. 10 is an explanatory diagram of a dictionary configuration example;
FIG. 11 is an explanatory diagram of LZW encoding;
FIG. 12 is an explanatory diagram of the list structure of the external hash method;
FIG. 13 is a flowchart of conventional LZW encoding processing using the external hash method;
FIG. 14 is an explanatory diagram of the LZW code search procedure and registration procedure of FIG. 13;
FIG. 15 is a tree structure diagram showing the dictionary registration contents of FIG. 14;
FIG. 16 is a timing chart of the DRAM read mode in which the high-speed page mode cannot be used.

In the figures:
10: original data
12: DMA control circuit
14: MPU
16: dictionary search means (dictionary search circuit)
18: multiple-character reading circuit
18-1: address register
18-2, 18-3: registers
20: dictionary memory
22: match check circuit
22-1: register
22-2: comparator
24: link detection circuit
24-1: NOR circuit
26: pipeline control circuit
28: consecutive address circuit
28-1: counter
30: input/output circuit
100: first memory
200: next memory
300: extended memory

Claims

[Claims]

(1) A dictionary search method for a data compression device that divides already-encoded data into mutually different substrings, assigns a distinct reference number to each substring, and registers it in a dictionary, and that encodes input data by specifying it with the reference number of the longest matching substring in the dictionary, the dictionary search method being characterized by comprising: a dictionary memory (20) that has a first memory (100) and a next memory (200) having an extended memory (300), arranged according to the list structure of the external hash method, and in which the link addresses of the external hash addresses based on the input data are partially composed of consecutive addresses of the next memory (200); and dictionary search means (16) for continuously generating addresses of the next memory (200) based on the input data and searching for candidate data in the extended memory (300) that matches the input data.

(2) The dictionary search method for a data compression device according to claim 1, characterized in that the dictionary search means (16) comprises pipeline control means (26) that performs, in parallel, the match check between the input data and the candidate data, the detection of the presence or absence of candidate data, and the reading of the next candidate data.

(3) A dictionary search method for a data compression device according to claim 1, wherein a high-speed page mode is used as an access mode for the dictionary memory (20).