JP3038234B2

JP3038234B2 - Dictionary search method for data compression equipment

Info

Publication number: JP3038234B2
Application number: JP2251499A
Authority: JP
Inventors: 佳之岡田; 広隆千葉; 茂吉田; 泰彦中野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-09-20
Filing date: 1990-09-20
Publication date: 2000-05-08
Anticipated expiration: 2015-05-08
Also published as: JPH04129429A

Description

DETAILED DESCRIPTION OF THE INVENTION

【Overview】

The invention relates to a dictionary search method for a data compression device that uses LZW coding, an improvement of the incremental parsing type of universal coding. Its object is to shorten dictionary search time by enabling high-speed reading of a dictionary memory that uses the list structure of the external hash method. The dictionary memory is given a list structure according to the external hash method, consisting of a first memory (index memory), a next memory (link memory), and an extension memory that stores candidate characters. The index addresses of the next memory are arranged as consecutive addresses, so that the first search based on the input character is followed by searches over consecutive addresses, which speeds up the search.

[Industrial applications]

The present invention relates to a dictionary search method for a data compression device that uses LZW coding, an improvement of the incremental parsing type of universal coding.

In recent years, many kinds of data, such as character codes, vector information, and images, have come to be handled by computers, and the amount of data handled is increasing rapidly. When a large amount of data is handled, removing the redundant parts of the data to compress it reduces the required storage capacity and allows faster transmission.

Universal coding has been proposed as a way to compress such varied data with a single method. The field of the present invention is not limited to the compression of character codes and applies to many kinds of data. Below, following the terminology of information theory, one word of data is called a character, and a sequence of several words is called a character string.

A representative universal code is the Ziv-Lempel code (for details see, for example, Munakata, "Ziv-Lempel Data Compression Method", Information Processing, Vol. 26, No. 1, 1985). Two algorithms have been proposed for the Ziv-Lempel code: the universal type and the incremental parsing type.

An improvement of the universal-type algorithm is the LZSS code (T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986). An improvement of the incremental parsing algorithm is the LZW (Lempel-Ziv-Welch) code (T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984).

Of these codes, the LZW code has come to be used for file compression in storage devices and the like, because it can be processed at high speed and its algorithm is simple.

[Prior art]

FIG. 7 shows the flow of conventional encoding with the LZW code, and FIG. 8 shows the flow of decoding.

LZW encoding keeps a rewritable dictionary. It divides the input character string into mutually distinct character strings (substrings) and registers them in the dictionary with reference numbers assigned in order of appearance. The string currently being input is encoded as the reference number of the longest matching string registered in the dictionary.

FIG. 9 illustrates LZW encoding, FIG. 10 illustrates LZW decoding, and FIG. 11 shows an example of the dictionary built during decoding. To keep the explanation simple, FIGS. 9, 10, and 11 use the example of compressing and restoring data made up of combinations of the three characters a, b, and c.

In the LZW encoding of FIG. 7, step S1 (the word "step" is omitted below) first registers in the dictionary, as initial values, a one-character string for every character, and then encoding begins. In S1 the dictionary is searched with the first input character K to obtain its reference number ω, which becomes the prefix string.

Next, S2 reads the next character K of the input data, and S3 checks whether character input has ended. S4 then checks whether the extended string (ωK), formed by appending the character K read in S2 to the prefix string ω obtained in S1, is in the dictionary.

If the string (ωK) is not in the dictionary in S4, the flow proceeds to S6. The reference number ω obtained in S1 is output as the codeword code(ω), the string (ωK) is registered in the dictionary with a new reference number, the input character K of S2 replaces the reference number ω as the new prefix, the dictionary address n is incremented, and the flow returns to S2 to read the next character K.

If the string (ωK) is in the dictionary in S4, S5 replaces the string (ωK) with its reference number ω and the flow returns to S2. The search for the longest match continues in this way until the string (ωK) can no longer be found in the dictionary in S4.

LZW encoding is described concretely below with reference to FIGS. 9 and 10. The input data in FIG. 9 is read from left to right.
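The encoding flow of FIG. 7 described above, together with the FIG. 8 decoding flow that this section goes on to walk through, can be sketched in a few lines of Python. This is a minimal textbook-style sketch, not the patent's implementation: the function names `lzw_encode` and `lzw_decode`, the use of a Python `dict` as the dictionary, and the numbering of single characters from 1 are assumptions chosen to match the a, b, c example.

```python
def lzw_encode(data, alphabet):
    """Encode a string as a list of LZW reference numbers (illustrative sketch)."""
    # S1: register every single character; reference numbers start at 1.
    table = {ch: k for k, ch in enumerate(alphabet, start=1)}
    if not data:
        return []
    out = []
    w = data[0]                       # prefix string (omega)
    for ch in data[1:]:               # S2: read the next character K
        if w + ch in table:           # S4/S5: (omega K) is registered, keep extending
            w += ch
        else:                         # S6: emit code(omega), register (omega K)
            out.append(table[w])
            table[w + ch] = len(table) + 1
            w = ch
    out.append(table[w])              # flush the final prefix
    return out


def lzw_decode(codes, alphabet):
    """Rebuild the string from LZW reference numbers, building the dictionary on the fly."""
    table = {k: ch for k, ch in enumerate(alphabet, start=1)}
    if not codes:
        return ""
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table) + 1:
            # Exception case: the code is not registered yet, so the string is
            # the previous string plus its own first character.
            entry = prev + prev[0]
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        table[len(table) + 1] = prev + entry[0]   # register (previous string + first character)
        prev = entry
    return "".join(out)
```

With the alphabet "abc", encoding "ababcbababaaaaaaa" yields a code sequence beginning 1, 2, 4, 3, 5, 8, and decoding that sequence reaches code 8 before it has been registered, which exercises the exception branch.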
When the first character a is input, the dictionary contains no matching string other than the character a itself, so OUTPUT CODE 1 (reference number ω) is output as the codeword, and the character a becomes the prefix string ω.

When the second character b is input, the extended string ωK = ab, formed by appending the input character to the prefix string ω, is not in the dictionary, so OUTPUT CODE 2 for the character b is output as the codeword. The extended string ωK = ab is then registered in the dictionary with reference number 4. As shown on the right side of FIG. 10, it is actually stored as the string 1b. The character b then becomes the prefix string ω.

When the third character a is input, the extended string ωK = ba = 2a is not in the dictionary, so OUTPUT CODE 1 for the character a is output as the codeword, and the extended string ωK = ba, represented as 2a, is registered in the dictionary with reference number 5. The character a becomes the new prefix string ω.

For the fourth input character b, the extended string ωK = ab is already registered in the dictionary as codeword 4 (stored as 1b), so the string ωK becomes the new prefix string ω. The fifth character c is then input to form the extended string ωK = 4c = abc. This extended string is not registered in the dictionary, so OUTPUT CODE 4 for the string ab = 1b is output as the codeword, and the extended string ωK = abc is registered in the dictionary in the form 4c as codeword 6. Processing continues in the same way.

The decoding of FIG. 8 performs the reverse of the encoding of FIG. 7. As in encoding, LZW decoding first registers in the dictionary, as initial values, a one-character string for every character, and then decoding begins.

First, S1 reads the first code (reference number) and sets the current CODE as OLDcode. Because the first code always corresponds to the reference number of one of the single characters already registered in the dictionary, the character code(K) matching the input code CODE is looked up and the character K is output. The output character K is also stored in FINchar for the exception handling described later.

Next, S2 reads the next code and sets it in CODE and INcode. S3 checks whether there is a new code, that is, whether code input has ended, and S4 checks whether the input code CODE is defined (registered) in the dictionary. Normally the input codeword has already been registered by the processing so far, so the flow proceeds to S5, which reads the string code(ωK) corresponding to CODE from the dictionary. S6 temporarily stacks the character K and sets the reference number CODE(ω) as the new CODE, and the flow returns to S5. Steps S5 and S6 are repeated recursively until the reference number ω reduces to a single character K. Finally, S7 pops the characters stacked in S6 in LIFO (Last In First Out) order and outputs them. Also in S7, the string (ωK), formed from the previously used code ω and the first character K of the string just restored, is registered in the dictionary with a new reference number.

LZW decoding is described concretely below with reference to FIG. 11. The first input codeword (INPUT CODE) in FIG. 11 is 1. The single characters a, b, and c are already registered in the dictionary with reference numbers 1, 2, and 3, as shown in FIG. 10, so the dictionary lookup replaces codeword 1 with the string a, which is output.

The next codeword 2 is likewise replaced with the character b and output. At this point the string ωK = 1b, which combines the previously processed codeword 1 with the first character b of the string just decoded, is registered in the dictionary with the new reference number 4.

The third codeword 4 is looked up in the dictionary as the string 1b, which is replaced with the string ab and output. At the same time the string ωK = 2a (= ba), which combines the previously processed codeword 2 with the first character a of the string just decoded, is registered in the dictionary with the new reference number 5. Processing continues in the same way.

The LZW decoding of FIG. 11 includes the following exception handling, which occurs when decoding the sixth input codeword, 8. Codeword 8 is not yet defined in the dictionary at the time of decoding and so cannot be decoded directly. In this case, the string 5b is formed by appending the first character b of the previously decoded string ba to the previously processed codeword 5, and the exception handling outputs it after expanding it as 5b = 2ab = bab. After the string has been output, the string 5b, formed from the previous codeword 5 and the first character b of the string just decoded, is registered in the dictionary with reference number 8.

This exception handling is carried out through S4 and S8 of the decoding flow of FIG. 8, and the output of the string and the registration of the new string with its reference number are finally performed in S7.

The LZW decoding of FIGS. 8 and 11 has been described for the case in which the decoding side builds the dictionary in real time while interpreting the codes. Alternatively, the dictionary built during encoding may be copied to the decoding side and used as is for decoding, in which case no exception handling is needed on the decoding side.

When LZW encoding is performed by the procedure shown in the flow of FIG. 7, each dictionary lookup for one string may, in the worst case, have to search the entire dictionary, so dictionary search is slow. Conventional dictionary search methods therefore use the external hash method (open hashing, or chaining) to increase processing speed.

In dictionary search by the general hash method, given a set S of character strings, the address at which a string x of S is stored can be computed directly from the string x itself, which makes fast dictionary search possible.

Suppose the storage locations for the strings, that is, the entries of the hash table, are assigned addresses from 0 to m−1. The hash method fixes a function

h: S → {0, 1, ..., m−1}

and obtains the address of a string x of S as h(x). The function h is called the hash function, and the value h(x) is called the hash address of the string x. The hash method is normally used when the size of the set S is much larger than the number of addresses m.

However the hash function h is chosen, there can be distinct strings x1 and x2 of the set S whose hash addresses coincide, h(x1) = h(x2).
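The hash function h and the coincidence of hash addresses described above can be illustrated with a small Python sketch. The particular function (sum of character codes modulo m) and the helper name `find_collisions` are assumptions chosen only for illustration; the patent does not specify a hash function.

```python
def h(x, m):
    """Toy hash function h: S -> {0, ..., m-1}: sum of character codes modulo m."""
    return sum(ord(c) for c in x) % m


def find_collisions(strings, m):
    """Group strings by hash address; return the addresses shared by two or more strings."""
    table = {}
    for x in strings:
        table.setdefault(h(x, m), []).append(x)
    return {addr: xs for addr, xs in table.items() if len(xs) > 1}
```

Because this toy function ignores character order, "ab" and "ba" always receive the same address, whatever m is. Any hash function shows the same effect once the set of strings is larger than the number of addresses.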
A coincidence of hash addresses for different strings is called a collision, and the external hash method (open hashing, or chaining) is used as one countermeasure. As shown in FIG. 12, the external hash method provides a linked list for each hash address i indicated by the index (directory), and the strings x that collide at hash address h(x) = i are stored in order from the head of that linked list. Each linked list holding strings with the same hash address h(x) is called a bucket.

FIG. 13 shows the flow of LZW encoding that uses the list structure of the external hash method for dictionary search. FIG. 14 shows the configuration of the dictionary memory according to the external hash method, and uses the tree structure of encoded strings shown in FIG. 15 as an example to show the search and registration procedures of LZW encoding concretely.

In FIG. 14, the dictionary memory consists of a first memory (First Memory) 100, a next memory (Next Memory) 200, and an extension memory (Extension Memory) 300 of the next memory 200. The first memory 100 corresponds to the index (directory) of the external hash method shown in FIG. 12, the next memory 200 corresponds to "next" in the linked list of FIG. 12, and the extension memory 300 corresponds to "name" in FIG. 12.

The tree structure of FIG. 15 shows the case in which the characters K₁₀, K₂₁, K₂₂, ..., K₄₁ are already registered and K₄₂, drawn with a broken line, is newly registered. In the processing of FIG. 13, the level in this tree structure is indicated by the i counter, and the number of characters at the same level is indicated by the j counter. The registration address of each character is therefore written ω_ij.

The following describes LZW encoding and registration by dictionary search according to the flow of FIG. 13 when the string "K₁₀, K₂₂, K₃₂, K₄₂", contained in the registered tree structure of FIG. 15, is input.

In FIG. 13, S1 first performs the following initialization.

(1) The dictionary is initialized so that it contains the first characters. For example, for the 26 letters of the alphabet, each character code is registered as is, as a hash address, in the first memory of FIG. 14. In the case of FIG. 15, this is the state in which the character K₁₀ at the top of the tree is registered at address ω₁₀.

(2) The current number n of characters registered in the dictionary is set to the number of characters registered in (1). For the 26 letters of the alphabet, n = 26.

(3) The first input character K becomes the prefix string i. In the case of FIG. 15, the first input character is K₁₀, so the prefix string is i = 1. In the flow below, the prefix string i is described as the i counter.

(4) The arrays used for dictionary search are initialized to 0. That is, the search arrays for the first, next, and extension memories are written first[1, Nmax], next[1, Nmax], and EXT[1, Nmax], and these are initialized to 0.

When the initialization of S1 is complete, the flow proceeds to S2 and reads the next character, K₂₂. S3 then checks whether any unprocessed characters remain. If all processing is finished, the flow proceeds to S16, outputs the codeword code(ω), and ends. Here an unprocessed character remains, so the flow proceeds to the dictionary search steps S5 to S9.

In the dictionary search steps, S5 first sets the current value of the prefix string, i = 1, in the address ω_ij and sets the j counter to j = 0. This generates the first memory address ω_ij = ω₁₀.

Next, S6 reads the content of address ω₁₀ of the first memory 100, which gives the address ω_ij = ω₂₁, so the i counter is set to i = 2.

The flow then proceeds to S7, which checks whether i = 0. Here i = 2, so the flow proceeds to S8, which reads the character K₂₁ from the extension memory 300 at the address ω₂₁ obtained from the first memory 100 in S6 and compares it with the input character K₂₂ obtained in S2. The two do not match, so the flow proceeds to S9, which sets the current value i = 2 of the i counter in the j counter, giving j = 2, and sets the i of the address ω_ij = ω₂₂ stored at address ω₂₁ of the next memory 200 in the i counter as i = 2. This produces the new address ω_ij = ω₂₂.

The flow then returns to S7 and checks i = 0. Since i = 2, it proceeds again to S8, reads the registered character K₂₂ from the extension memory 300 at address ω₂₂, and compares it with the input character K₂₂. The two match, so the flow returns to S2 and reads the next character, K₃₂. By repeating S5 to S9 in the same way, the dictionary is searched in the order shown by the solid arrows in FIG. 14, up to the already registered character K₄₁.

When the search of the registered character K₄₁ has finished and S8 finds a mismatch with the last input character K₄₂, S9 sets i = 2 and, because the content of the next memory 200 at address ω₄₁ is 0, sets i = 0. On returning to S7, i = 0 is detected, so the flow leaves the dictionary search steps and proceeds to S10, which outputs the address ω₃₂, representing the string "K₁₀, K₂₂, K₃₂" so far, as the codeword code(ω). The flow then proceeds to the dictionary registration steps S11 to S14.

In the dictionary registration steps, S11 first sets the current registration count n to n = i, that is, n = 4, and then increments n by one. The character K₄₂ is registered in the extension memory 300 at address ω_ij = ω₄₂.

Next, S12 checks whether j = 0. Here j = 2, so the flow proceeds to S14, which writes the address ω₄₂, at which the character K₄₂ was registered, into address ω₄₁ of the next memory 200. If instead j = 0 in S12, that is, if the registration goes to the first memory 100, the registration address in the extension memory 300 is stored in the first memory 100, as shown for addresses ω₁₁, ω₂₂, and ω₃₂ of the first memory 100 in FIG. 14.

Through the registration of the character K₄₂ in these registration steps, the next memory 200 and the extension memory 300 of FIG. 14 reach the state shown for addresses ω₄₁ and ω₄₂, set off by broken lines at the bottom, and the address ω₄₂ of the new character K₄₂ has been added to the tree structure of FIG. 15. For convenience of explanation, FIG. 14 shows address ω₄₁ twice, once for search and once for registration.

When the dictionary registration steps S11 to S14 are finished, S15 sets the registered character K₄₂ as the new prefix string i, that is, as the value of the i counter, and the flow returns to S2, where, with the character K₄₂ as the top of the tree, dictionary search continues for the string that follows.
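The search and registration steps of FIG. 13 walked through above can be approximated by the following Python sketch. It is a simplified reading under stated assumptions, not the patent's circuit or exact flow: addresses are plain integers rather than ω_ij pairs, the arrays `first`, `nxt`, and `ext` stand in for the first, next, and extension memories, the value 0 means "no link", and the function name `lzw_encode_chained` is hypothetical.

```python
def lzw_encode_chained(data, alphabet, nmax=4096):
    """LZW encoding with a first/next/ext dictionary (chained candidate lists)."""
    first = [0] * (nmax + 1)    # first[w]: address of the first candidate extending w
    nxt = [0] * (nmax + 1)      # nxt[a]: address of the next candidate in the same list
    ext = [None] * (nmax + 1)   # ext[a]: candidate character stored at address a
    code_of = {ch: k for k, ch in enumerate(alphabet, start=1)}
    n = len(alphabet)           # number of strings registered so far
    out = []
    if not data:
        return out
    w = code_of[data[0]]        # prefix string, held as its reference number
    for ch in data[1:]:
        i, j = first[w], 0      # i: candidate address, j: last address visited
        while i != 0 and ext[i] != ch:
            j, i = i, nxt[i]    # mismatch: follow the link to the next candidate
        if i != 0:              # match: the extended string becomes the new prefix
            w = i
            continue
        out.append(w)           # no candidate matched: output the prefix code
        if n < nmax:            # register the extended string at a new address
            n += 1
            ext[n] = ch
            if j == 0:
                first[w] = n    # first extension of w: link it from the first memory
            else:
                nxt[j] = n      # otherwise append it to the end of the candidate list
        w = code_of[ch]         # the unmatched character starts the next prefix
    out.append(w)
    return out
```

Each pass through the `while` loop needs the content of `nxt[i]` before the next address is known, which is the sequential dependency that the following section identifies as the bottleneck.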

[Problems to be solved by the invention]

As described above, when conventional LZW encoding is performed in software by executing the flow shown in FIG. 7, dictionary search takes a great deal of time, so the external hash method is used to speed up dictionary search through the flow of FIG. 13.

However, in dictionary search using the external hash method, reading a candidate character, comparing the candidate character with the input character, and judging match or mismatch are performed sequentially. As a result, dictionary search accounts for about 80% of the total processing time, and further speedup is needed.

In addition, because a list structure based on the external hash method is used to read candidate characters, the storage address of the current candidate character has little relation to the storage address of the next candidate character. Each address can be read only as it becomes known, addresses cannot be issued in advance, and the performance of the devices that make up the dictionary memory cannot be fully exploited.

For example, when DRAM is used as the dictionary memory, the lack of address continuity makes it difficult to use high-speed read modes such as page mode, in which the row address is held fixed and only the column address is changed.

In the case of FIG. 14, for example, addresses ω₃₂ and ω₃₃ of the next memory 200 are not consecutive, so, as shown in FIG. 16, the memory must be read in the ordinary read mode, in which the row address and column address are specified separately each time, and the speed cannot be increased.

The present invention has been made in view of these conventional problems. Its object is to provide a dictionary search method for a data compression device that enables high-speed reading of a dictionary memory using the list structure of the external hash method and thereby shortens dictionary search time.
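The effect of address continuity on page-mode reads described above can be illustrated with a toy model. The sketch assumes a hypothetical DRAM in which each row holds 256 columns and in which an access can stay in page mode only while the row address is unchanged. The function name `count_row_changes` and the geometry are illustrative and do not come from the patent.

```python
def count_row_changes(addresses, columns_per_row=256):
    """Count the accesses that must supply a new row address (toy DRAM model)."""
    changes = 0
    open_row = None                      # row currently held open, if any
    for addr in addresses:
        row = addr // columns_per_row    # upper address bits select the row
        if row != open_row:              # different row: page mode cannot continue
            changes += 1
            open_row = row
    return changes
```

For scattered link addresses such as 5, 1030, 300, 2050, every access in this model lands in a different row. Four consecutive addresses starting at 512 need only one row address followed by column-only accesses.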

[Means for Solving the Problems]

FIG. 1 is a diagram illustrating the principle of the present invention.

The present invention is directed to a data compression apparatus, for example one that performs LZW encoding, in which already encoded data is divided into mutually different substrings, each substring is given a distinct reference number and registered in a dictionary, and input data is encoded by specifying the reference number of the longest matching substring in the dictionary.

As a dictionary search method for such a data compression apparatus, the present invention is characterized by providing: a dictionary memory 20 comprising a first memory 100 and a next memory 200 having an extended memory 300, organized according to the list structure of the external hash method, in which the link addresses of the external hash addresses based on the input data are partially constituted by continuous addresses of the next memory 200; and dictionary search means 16 that successively generates addresses of the next memory 200 based on the input data and searches for candidate data in the extended memory 300 that matches the input data.

The dictionary search means 16 includes pipeline control means 26 that performs, in parallel, the match check between the input data and the candidate data, the check for the presence of further candidate data, and the reading of the next candidate data. A high-speed page mode is used as the access mode of the dictionary memory 20.
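The memory organization described above can be pictured with a small software model. The following is only a sketch under stated assumptions, not the patented circuit: the class name `DictionaryMemory`, the method `find_candidate`, the flat integer addresses, and the example contents are all hypothetical. It assumes that candidates are stored in runs of four continuous addresses and that an all-zero link address marks the end of the candidate list.

```python
BLOCK = 4   # candidates per run of continuous addresses (four in the embodiment)
NULL = 0    # an all-zero link address means "no further candidate"


class DictionaryMemory:
    """Toy model of first memory 100, next memory 200 and extended memory 300."""

    def __init__(self, size: int) -> None:
        self.first = [NULL] * size      # reference number -> address of first candidate
        self.next_link = [NULL] * size  # address -> link address of the next candidate
        self.ext_char = [None] * size   # address -> candidate character

    def find_candidate(self, prefix_ref: int, char: str):
        """Return the address of the candidate equal to char, or None."""
        addr = self.first[prefix_ref]
        while addr != NULL:
            if self.ext_char[addr] == char:
                return addr
            link = self.next_link[addr]
            if link == NULL:
                return None              # no candidates left
            if addr % BLOCK != BLOCK - 1:
                addr += 1                # predicted address inside the run
            else:
                addr = link              # end of the run: follow the stored link
        return None


def build_example() -> DictionaryMemory:
    """Hypothetical contents: prefix 1 has candidates a..e; the fifth starts a second run."""
    mem = DictionaryMemory(32)
    mem.first[1] = 8
    for addr, char, link in [(8, "a", 9), (9, "b", 10), (10, "c", 11),
                             (11, "d", 16), (16, "e", NULL)]:
        mem.ext_char[addr] = char
        mem.next_link[addr] = link
    return mem
```

In this model only the access that leaves a run (address 11 to address 16) depends on a value read from memory; the other steps use an address that is known in advance, which is the property the page mode access relies on.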

[Action]

According to the dictionary search method of the data compression apparatus of the present invention configured as above, the index addresses of the dictionary memory, which has a list structure based on the external hash method, are made continuous for the LZW dictionary search. Once one hash address is determined, the next address can therefore be predicted, so that search accesses to candidate characters become faster, encoding can proceed with high-speed reading of the dictionary memory, and the encoding time can be shortened.
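The benefit of predictable addresses can be illustrated by counting how often the upper part of the address changes during a run of accesses. This is a rough sketch, not a timing model of any particular DRAM: the function name `count_upper_address_changes` and the two address lists are made-up examples, and the split into an upper part and two low bits is an assumption taken from the embodiment's two-bit counter.

```python
def count_upper_address_changes(addresses, low_bits=2):
    """Count accesses whose upper address differs from that of the previous access."""
    changes = 0
    previous_upper = None
    for address in addresses:
        upper = address >> low_bits   # drop the bits supplied by the counter
        if upper != previous_upper:
            changes += 1              # a full address must be presented again
        previous_upper = upper
    return changes


# Hypothetical address traces for visiting four candidates of one list.
scattered = [0x40, 0x9C, 0x13, 0xE5]    # conventional linked list, arbitrary links
continuous = [0x40, 0x41, 0x42, 0x43]   # one run of continuous addresses
```

With the scattered trace every access changes the upper address, while the continuous trace changes it only once, so the remaining three accesses are the kind that a page mode can serve.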

[Embodiment]

FIG. 2 is a configuration diagram showing an embodiment of a data compression apparatus (encoding apparatus) provided with the dictionary search method of the present invention.

In FIG. 2, original data 10 to be processed is input via a DMA (Direct Memory Access) control circuit 12. The MPU 14, serving as control means, sets one character of the input original data 10 and the reference number of the character string so far in the multiple character reading circuit 18 of the dictionary search circuit 16, and then activates the dictionary search circuit 16.

The dictionary search circuit 16 thereafter reads from the dictionary memory 20 a candidate character for the string extended by one character, checks (collates) the input character against the candidate character in the match check circuit 22, and detects in the link detection circuit 24 whether a further candidate character exists.

The pipeline control circuit 26 issues a read of the next candidate character to the dictionary memory 20 in parallel with the collation of the input character and the candidate character by the match check circuit 22 and the detection of a further candidate character by the link detection circuit 24. With this pipeline processing, the search and collation of successive candidate characters can be executed at the cycle time of the dictionary memory 20.

The dictionary search circuit 16 is further provided with a continuous address circuit 28. The continuous address circuit 28 generates continuous addresses so that the multiple character reading circuit 18 reads out the hash addresses and candidate characters registered at continuous addresses of the dictionary memory 20.

In LZW encoding, the longest matching character string in the dictionary memory 20 is sought. The character string is therefore extended one input character at a time, and the point at which no candidate character remains identifies the longest matching string. The string up to that point is represented by a reference number that uses the address ω, and this reference number ω is output from the input/output port 30 as the compressed codeword code(ω).

FIG. 3 is a configuration diagram showing the detailed structure of the dictionary search circuit 16 of FIG. 2 together with the dictionary memory 20.

In FIG. 3, the address register 18-1, the register 18-2, and the register 18-3 correspond to the multiple character reading circuit 18 of FIG. 2; the register 22-1 and the comparator 22-2 correspond to the match check circuit 22 of FIG. 2; the NOR circuit 24-1 corresponds to the link detection circuit 24 of FIG. 2; and the counter 28-1 corresponds to the continuous address circuit 28 of FIG. 2.

Next, the dictionary search according to the embodiment of FIG. 3 is described with reference to FIG. 4, which illustrates the search and registration procedures, and FIG. 5, which is a tree structure diagram showing the registered state of the dictionary memory 20.
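The longest-match rule that the circuit implements is the ordinary LZW encoding rule. As a point of reference, here is a minimal software LZW encoder that uses a Python dictionary instead of the hash-list memory; the function name `lzw_encode` and the choice of reference numbers 0 to 255 for single characters are assumptions made for this sketch, not details taken from the patent.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Plain table-based LZW encoder (software model, not the patent's circuit)."""
    # Reference numbers 0..255 are pre-assigned to the single characters.
    table = {bytes([i]): i for i in range(256)}
    next_ref = 256
    codes = []
    prefix = b""
    for value in data:
        candidate = prefix + bytes([value])
        if candidate in table:
            # The string extended by one character is registered: keep extending.
            prefix = candidate
        else:
            # No candidate remains: prefix is the longest match, so emit its
            # reference number and register the extended string.
            codes.append(table[prefix])
            table[candidate] = next_ref
            next_ref += 1
            prefix = bytes([value])
    if prefix:
        codes.append(table[prefix])   # flush the final match
    return codes
```

For the input `b"ABABABA"` this emits the reference numbers of "A", "B", "AB", and "ABA" in turn. The dictionary search circuit performs the `candidate in table` step by walking a list of candidate characters in memory, which is where the access time matters.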
In the following description, a memory address ω is written ω_ij, where i is the upper address and j is the lower address.

Suppose that the character string "K₁₀, K₂₂, K₃₂, K₄₂" contained in the tree structure of FIG. 5 is input as the original data 10.

The MPU 14 first sets the one-character reference number ω₁₀ of the first character K₁₀ of the input string in the address register 18-1, which specifies the upper address, and sets the second input character K₂₂ in the register 18-2.

Next, the pipeline control circuit 26 is instructed to start the dictionary search circuit 16. The pipeline control circuit 26 first sets the counter 28-1, which generates continuous addresses, to 0 and then issues a read to the dictionary memory 20. The content of the counter 28-1 specifies the two least significant bits (LSBs) of the address of the dictionary memory 20. Accordingly, the first memory 100 of FIG. 4 is accessed at the address (ω₁₀ + 0), formed from the upper address given by the content ω_ij = ω₁₀ of the address register 18-1 and the lower address given by the content j = 0 of the counter 28-1, and ω₂₁ is read out and set in the address register 18-1.

Next, the next memory 200 and the extended memory 300 of the dictionary memory 20 are accessed at the address (ω₂₁ + 0), which takes the content ω₂₁ of the address register 18-1 as the upper address and the content of the counter 28-1 as the lower address, and the first candidate character K₂₁ and the link address ω₂₂ of the second candidate character K₂₂ are read out. The first candidate character K₂₁ is set in the register 18-2, and the link address ω₂₂ of the second candidate character K₂₂ is set in the register 18-3. The comparator 22-2 then compares the input character K₂₂ held in the register 22-1 with the first candidate character K₂₁ held in the register 18-2 and determines whether they match.

Since the two do not match, a mismatch is determined and the next candidate character K₂₂ is read. At this time the counter 28-1 is incremented by one to generate the address (ω₂₁ + 1) of the next memory 200, in which only the lower address of the dictionary memory 20 has changed, and the next candidate character K₂₂ is read into the register 18-2 by accessing the next memory 200. The content of the address register 18-1, which specifies the upper address, remains as it is. This operation is then repeated in the same way.
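The way the address register and the counter combine into one memory address can be written out explicitly. This is a sketch under the assumption that the upper address occupies all bits above the two bits supplied by the counter 28-1; the names `compose_address` and `run_addresses` are hypothetical.

```python
LOW_BITS = 2  # the counter supplies the two least significant address bits


def compose_address(upper: int, counter: int) -> int:
    """Combine the upper address (address register) with the counter value."""
    if not 0 <= counter < (1 << LOW_BITS):
        raise ValueError("counter must fit in the low address bits")
    return (upper << LOW_BITS) | counter


def run_addresses(upper: int) -> list[int]:
    """All addresses reachable by counting with the upper address held fixed."""
    return [compose_address(upper, j) for j in range(1 << LOW_BITS)]
```

For an upper address of 5 the run is 20, 21, 22, 23, corresponding to the notation (ω + 0) through (ω + 3) used in the description. Moving from one candidate to the next inside a run only increments the counter, so no link address has to be read before the next access is issued.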
However, generating continuous addresses indiscriminately with the counter 28-1 would enlarge the dictionary memory 20, so this embodiment considers generating four continuous addresses at a time. For example, when the character code is 8 bits, addresses exceeding 9 bits are meaningless.

Therefore, once in every four searches, the link address ω_ij obtained by accessing the first memory 100 is used instead of a continuous address of the next memory 200. That is, when the counter 28-1 has generated the four consecutive lower addresses "00, 01, 10, 11" with the upper address held fixed, then at the switch to the next lower address "00", the link address of the first memory 100 that was stored in the register 18-3 by the fourth access is set in the address register 18-1.

For example, taking the upper address ω₃₁ of the next memory 200 in FIG. 4, incrementing the lower address with the counter 28-1 generates the continuous addresses

ω₃₁ + 0 (= ω₃₁)
ω₃₁ + 1 (= ω₃₂)
ω₃₁ + 2 (= ω₃₃)
ω₃₁ + 3 (= ω₃₄)

and on the fifth access the link address ω₃₅ to the next run of continuous addresses, stored in the next memory 200, is read out and used as the upper address, and the generation of continuous addresses is repeated from the beginning.

When the comparator 22-2 finds in this dictionary search that the input character matches a candidate character, the NOR circuit 24-1 simultaneously checks whether the content of the register 18-3 (the link address of the next memory 200) is all zeros, and the dictionary search is repeated until it becomes all zeros. If the register 18-3 is all zeros, it is detected that no candidate character remains to be searched. In this case the MPU 14 and the pipeline control circuit 26 terminate the search processing of the dictionary search circuit 16 and output, as the codeword code(ω), the address of the candidate character that matched last in the search so far.

In the case of FIG. 4, the content of the next memory 200 becomes all zeros at the input character "K₄₁", so the dictionary search ends at this stage and the address (ω₄₁ + 0) of the last matching candidate character "K₄₁" is output as the codeword code(ω).

The MPU 14 then registers the last remaining input character "K₄₂" at the address (ω₄₂ + 1) of the extended memory 300, registers the link address ω₄₂ at the address (ω₄₁ + 0) of the next memory 200, and moves to a new dictionary search with the input character "K₄₂" as the prefix string i.

Thus, in the present invention, addresses can be generated continuously to search for candidate characters and link addresses. The dictionary memory 20 can therefore use the high-speed page mode shown in FIG. 6, in which continuous addresses change only the row address while the column address is held fixed, and candidate characters and their link addresses can be read at high speed, so that a fast dictionary search is realized.
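The search, termination, codeword output, and registration steps described above can be combined into a small software encoder. The following is a sketch under several assumptions that are not in the patent: addresses are flat integers, single characters get the reference numbers 1 to 256 (byte value plus one), each candidate list grows in aligned runs of four addresses, and registration simply stops when the memory is full. All names (`BlockListDictionary`, `search`, `register`, `encode`) are hypothetical.

```python
BLOCK = 4   # length of one run of continuous addresses
NULL = 0    # all-zero link address: no further candidate


class BlockListDictionary:
    def __init__(self, size: int = 4096, alphabet: int = 256) -> None:
        self.first = [NULL] * size      # models first memory 100
        self.next_link = [NULL] * size  # models next memory 200
        self.ext_char = [None] * size   # models extended memory 300
        # References 1..alphabet are single characters; runs start after them.
        self.free = -(-(alphabet + 1) // BLOCK) * BLOCK

    def search(self, prefix_ref: int, char: int):
        """Return (match address or None, last candidate address visited)."""
        addr, last = self.first[prefix_ref], NULL
        while addr != NULL:
            if self.ext_char[addr] == char:
                return addr, addr
            last = addr
            link = self.next_link[addr]
            if link == NULL:
                break
            # Inside a run the next address is predicted; at its end the link is used.
            addr = addr + 1 if addr % BLOCK != BLOCK - 1 else link
        return None, last

    def register(self, prefix_ref: int, char: int, last: int) -> int:
        """Append char to the candidate list of prefix_ref; return its address."""
        if last != NULL and last % BLOCK != BLOCK - 1:
            addr = last + 1                     # next slot of the same run
        else:
            if self.free + BLOCK > len(self.ext_char):
                return NULL                     # memory full: register nothing
            addr, self.free = self.free, self.free + BLOCK
        if last == NULL:
            self.first[prefix_ref] = addr
        else:
            self.next_link[last] = addr
        self.ext_char[addr] = char
        return addr


def encode(data: bytes, size: int = 4096) -> list[int]:
    """Emit the address (reference number) of each longest match."""
    dictionary, codes = BlockListDictionary(size), []
    if not data:
        return codes
    prefix = data[0] + 1
    for value in data[1:]:
        match, last = dictionary.search(prefix, value)
        if match is not None:
            prefix = match                      # string extended by one character
        else:
            codes.append(prefix)                # output code of the last match
            dictionary.register(prefix, value, last)
            prefix = value + 1                  # restart from the unmatched character
    codes.append(prefix)
    return codes
```

Reserving a whole run for every candidate list trades memory for predictable addresses, which mirrors the remark in the description that generating continuous addresses indiscriminately would enlarge the dictionary memory.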

[Effects of the Invention]

As described above, according to the present invention, the linked list that uses the external hash method in the LZW dictionary search is built from continuous addresses. Once one address is determined, the following addresses can be predicted and issued in advance. When a DRAM, for example, is used as the dictionary memory, the high-speed page mode can then be used, so that the performance of the memory device is fully exploited and the dictionary search is accelerated.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating the principle of the present invention;
FIG. 2 is a configuration diagram of an embodiment of the present invention;
FIG. 3 is a configuration diagram of an embodiment showing details of the dictionary search circuit of the present invention;
FIG. 4 is an explanatory diagram of the LZW code search procedure and registration procedure of the present invention;
FIG. 5 is a tree structure diagram showing the dictionary registration contents of the present invention;
FIG. 6 is a timing chart of the DRAM read mode when the high-speed page mode of the present invention is used;
FIG. 7 is a flowchart of conventional LZW encoding processing;
FIG. 8 is a flowchart of conventional LZW decoding processing;
FIG. 9 is an explanatory diagram of LZW encoding;
FIG. 10 is an explanatory diagram of a dictionary configuration example;
FIG. 11 is an explanatory diagram of LZW encoding;
FIG. 12 is an explanatory diagram of the list structure of the external hash method;
FIG. 13 is a flowchart of conventional LZW encoding processing using the external hash method;
FIG. 14 is an explanatory diagram of the LZW code search procedure and registration procedure of FIG. 13;
FIG. 15 is a tree structure diagram showing the dictionary registration contents of FIG. 14;
FIG. 16 is a timing chart of the DRAM read mode in which the high-speed page mode cannot be used.

In the figures:
10: original data
12: DMA control circuit
14: MPU
16: dictionary search means (dictionary search circuit)
18: multiple character reading circuit
18-1: address register
18-2, 18-3: registers
20: dictionary memory
22: match check circuit
22-1: register
22-2: comparator
24: link detection circuit
24-1: NOR circuit
26: pipeline control circuit
28: continuous address circuit
28-1: counter
30: input/output circuit
100: first memory
200: next memory
300: extended memory

Continuation of the front page
(72) Inventor: Yasuhiko Nakano, c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(56) References: JP-A-59-231683 (JP, A); JP-A-60-116228 (JP, A); JP-A-64-7230 (JP, A); JP-A-61-13340 (JP, A); JP-A-58-155589 (JP, A)
(58) Field searched (Int. Cl.⁷, DB name): H03M 7/42

Claims

(57) [Claims]

1. A dictionary search method for a data compression apparatus in which encoded data is divided into mutually different substrings, each substring is given a distinct reference number and registered in a dictionary, and input data is encoded by specifying the reference number of the longest matching substring in the dictionary, the method being characterized by comprising: a dictionary memory (20) comprising a first memory (100) and a next memory (200) having an extended memory (300), organized according to the list structure of the external hash method, in which the link addresses of the external hash addresses based on the input data are partially constituted by continuous addresses of the next memory (200); and dictionary search means (16) that successively generates addresses of the next memory (200) based on the input data and searches for candidate data in the extended memory (300) that matches the input data.

2. The dictionary search method for a data compression apparatus according to claim 1, wherein said dictionary search means (16) comprises pipeline control means (26) that performs, in parallel, the match check between the input data and the candidate data, the check for the presence of candidate data, and the reading of the next candidate data.

3. The dictionary search method for a data compression apparatus according to claim 1, wherein a high-speed page mode is used as the access mode of said dictionary memory (20).