JP3835489B2

JP3835489B2 - Data compression apparatus and decompression apparatus dictionary search registration method

Info

Publication number: JP3835489B2
Application number: JP02506696A
Authority: JP
Inventors: 佳之岡田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1996-02-13
Filing date: 1996-02-13
Publication date: 2006-10-18
Anticipated expiration: 2016-02-13
Also published as: JPH09218877A

Description

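[0001]
[Technical Field of the Invention]
The present invention relates to a dictionary search and registration method for a data compression apparatus, which encodes data by a longest-match search between an input character string and character substrings already registered in a dictionary, and for the corresponding decompression apparatus. More particularly, it relates to a dictionary search and registration method for data compression and decompression apparatuses that generate an index from a plurality of input characters and a registered character string by internal hashing, and use it to search and register the dictionary.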
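[0002]
[Prior Art]
In recent years, computers have come to handle many kinds of data, such as character codes, vector information, and images, and the volume of data handled is growing rapidly. When large amounts of data are handled, removing the redundant parts of the data and compressing its volume reduces the required storage capacity and allows faster transmission.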
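[0003]
Universal coding has been proposed as a method that can compress many kinds of data with a single procedure. Although the invention is not limited to compressing character codes and applies to many kinds of data, the following description adopts the terminology of information theory: one word unit of data is called a character, and a sequence of any number of words is called a character string.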
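[0004]
A representative universal code is the Ziv-Lempel code (for details see, e.g., Munakata, "Ziv-Lempel Data Compression Methods," Information Processing, Vol. 26, No. 1, 1985).
Two algorithms have been proposed for Ziv-Lempel coding: the sliding-dictionary method and the dynamic-dictionary method (incremental parsing). An improvement of the sliding-dictionary algorithm is the LZSS code (see T.C. Bell, "Better OPM/L Text Compression," IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986).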
[0005]
An improvement of the dynamic-dictionary algorithm is the LZW (Lempel-Ziv-Welch) code (see T.A. Welch, "A Technique for High-Performance Data Compression," Computer, June 1984). Of these codes, LZW has come into use for file compression in storage devices because it is fast and algorithmically simple.
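[0006]
FIG. 23 shows the tree structure of the dictionary in LZW coding, and FIG. 24 shows how a character string is encoded. LZW coding uses a rewritable dictionary: it splits the input character string (source data) into mutually distinct strings, numbers these strings in their order of appearance and registers them in the dictionary, and encodes the current input string as the number of the longest matching string registered in the dictionary.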
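[0007]
FIG. 25 is a concrete example of LZW encoding; for simplicity it uses only the three characters a, b, and c. Accordingly, the dictionary of FIG. 26 used for encoding is initialized with each of the characters a, b, and c.
In FIG. 25, input data is read from left to right. When the first character a is input, the dictionary holds no matching string longer than the character a itself, so its reference number (index) is output as the code word. The extended string ab is then registered in the dictionary under reference number 4. In practice it is registered in the form (1b).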
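[0008]
Next, the second character b becomes the head of a string. The dictionary holds no matching string longer than b, so reference number 2 is output as the code word, and the extended string ba is registered under reference number 5, in practice in the form 2a. The third character a becomes the head of the next string, and processing continues in the same way.
The flowchart of FIG. 27 is the LZW encoding algorithm. First, in step S1, every single-character string is registered in the dictionary as an initial value before encoding begins. In step S2, the first input character K is taken as the dictionary reference number (index) ω, which serves as the prefix string. Next, in step S3, the next character K of the input data is read, and in step S4 the current dictionary is searched for the string (ωK) formed by appending the character K read in step S3 to the prefix string ω obtained in step S2.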
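[0009]
If the string (ωK) is found in the dictionary in step S4, step S5 replaces the string (ωK) with its reference number ω, step S6 checks whether the input data has ended, and the process returns to step S3, continuing the longest-match search until the string (ωK) can no longer be found in the dictionary.
If the string (ωK) is not in the dictionary in step S4, the process proceeds to step S7, which outputs the current reference number ω as the code word code(ω), registers the string (ωK) in the dictionary under a new reference number, replaces ω with the reference number of the input character K (as in step S2), and increments the dictionary address N. After the end-of-input check in step S6, the process returns to step S2 and reads the next character K.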
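The encoding loop of FIG. 27 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the three-character alphabet a, b, c with reference numbers 1 to 3 as in FIG. 26, keys the dictionary on (ω, K) pairs to mirror the "(1b)" form above, and uses a plain Python dict instead of a hashed dictionary memory. The function name lzw_encode and the test string are illustrative choices, not taken from FIG. 25.

```python
def lzw_encode(data: str, alphabet: str = "abc") -> list:
    """Encode data with LZW; single characters get codes 1..len(alphabet)."""
    if not data:
        return []
    singles = {ch: i + 1 for i, ch in enumerate(alphabet)}  # step S1: initial entries
    pairs = {}                      # (omega, K) -> reference number, the "(1b)" form
    next_code = len(alphabet) + 1   # dictionary address N
    out = []
    omega = singles[data[0]]        # step S2: first character becomes the prefix omega
    for k in data[1:]:              # step S3: read the next character K
        code = pairs.get((omega, k))  # step S4: is (omega K) registered?
        if code is not None:
            omega = code            # step S5: extend the match
        else:
            out.append(omega)       # step S7: output code(omega)
            pairs[(omega, k)] = next_code  # register (omega K) under a new number
            next_code += 1
            omega = singles[k]      # restart from the unmatched character
    out.append(omega)               # flush the final match at end of input
    return out


print(lzw_encode("ababcbababaaaaaaa"))  # [1, 2, 4, 3, 5, 8, 1, 10, 11, 1]
```

With this input the sixth code word is 8, which refers to the entry 5b (= bab) registered in the very same step; a decoder receiving it has not yet defined code 8, which is the exception case described in [0012].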
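[0010]
FIG. 28 is a concrete example of LZW decoding; for simplicity it again uses combinations of the three characters a, b, and c. The first input code is 1. Since the single characters a, b, and c are already registered in the dictionary under reference numbers 1, 2, and 3 as shown in FIG. 26, code 1 is looked up in the dictionary, replaced with the string a having that reference number, and output.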
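[0011]
The next code 2 is likewise replaced with the character b and output. At this point, (1b), which combines the previously processed code 1 with the first character b of the string just decoded, is registered in the dictionary under the new reference number 4. The third code 4 is looked up in the dictionary, replaced via 1b by ab, and the string ab is output. At the same time, the string 2a (= ba), which combines the previously processed code 2 with the first character a of the string just decoded, is registered under the new reference number 5. The process repeats in the same way.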
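[0012]
The decoding in FIG. 28 involves the following exception case, which arises when the sixth input code, 8, is decoded. At decoding time code 8 is not yet defined in the dictionary and cannot be decoded directly. In this case the string 5b is formed by appending the first character b of the previously decoded string ba to the previously processed code 5; it is then expanded to 2ab and finally to bab, which is output. The string 5b, consisting of the previous code word 5 followed by the character b of the string just decoded, is registered in the dictionary under reference number 8.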
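[0013]
This exception case is handled through steps S4 and S9 of the decoding flow in FIG. 29; finally, in step S7, the string is output and the new string is registered in the dictionary under a reference number.
The flowchart of FIG. 29 is the LZW decoding algorithm, which performs the inverse of the encoding of FIG. 27. First, in step S1, as in encoding, every single-character string is registered in the dictionary as an initial value before decoding begins. In step S2, the first code (reference number) is read and the current CODE is saved as OLDcode. Because the first code always corresponds to one of the single-character reference numbers already registered in the dictionary, the character code(K) matching the input CODE is looked up and the character K is output. The output character K is also stored in char for use by the exception handling.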
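[0014]
Next, in step S3, the next code is read and set into CODE as NEWcode. In step S4, it is checked whether the code CODE read in step S3 is defined (registered) in the dictionary. Normally the input code word has already been registered by the preceding processing, so the process proceeds to step S5, where the string code(ωK) corresponding to CODE is read from the dictionary. In step S6 the character K is temporarily pushed onto a stack and the reference number code(ω) becomes the new CODE, returning to step S5. Steps S5 and S6 repeat recursively until ω reduces to a single character.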
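[0015]
Finally, in step S7, the characters pushed in step S6 are popped and output in LIFO (last in, first out) order. Also in step S7, the string represented by the pair (ω, K), which combines the previously used code ω with the first character K of the string just decoded, is registered in the dictionary under a new reference number.
If in step S4 the code is not yet registered (which happens when the encoder referred to the reference number it had just created), step S9 sets CODE back to OLDcode and NEWcode to code(OLDcode, char), and the process then proceeds to step S5. The exception case caused by decoding the sixth input code 8 in FIG. 28 is handled in this way through steps S4 and S9, and finally, in step S7, the string is output and the new string is registered in the dictionary under a reference number.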
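The decoding flow of FIG. 29, including the undefined-code exception handled by steps S4 and S9, can be sketched as below. It is an illustrative Python sketch under the same assumptions as the encoder example (alphabet a, b, c numbered 1 to 3; a plain dict instead of a hashed dictionary memory); lzw_decode and expand are hypothetical names. Entries are stored as (ω, K) pairs, so expanding a code pushes characters while following ω back to a single character and then pops them in LIFO order, as in steps S5 to S7.

```python
def lzw_decode(codes: list, alphabet: str = "abc") -> str:
    """Decode LZW codes in which single characters have codes 1..len(alphabet)."""
    if not codes:
        return ""
    singles = {i + 1: ch for i, ch in enumerate(alphabet)}  # step S1
    pairs = {}                      # reference number -> (prefix omega, last character K)

    def expand(code: int) -> str:
        stack = []
        while code in pairs:        # steps S5/S6: push K, then continue with omega
            code, k = pairs[code]
            stack.append(k)
        stack.append(singles[code])
        return "".join(reversed(stack))  # step S7: pop in LIFO order

    next_code = len(alphabet) + 1
    old = codes[0]                  # step S2: the first code is always a single character
    out = [expand(old)]
    for code in codes[1:]:          # step S3: read the next code
        if code in singles or code in pairs:  # step S4: code already defined
            entry = expand(code)
        else:                       # step S9: undefined code, so OLDcode + its first character
            prev = expand(old)
            entry = prev + prev[0]
        pairs[next_code] = (old, entry[0])  # step S7: register (omega, K)
        next_code += 1
        out.append(entry)
        old = code
    return "".join(out)


print(lzw_decode([1, 2, 4, 3, 5, 8, 1, 10, 11, 1]))  # ababcbababaaaaaaa
```

When code 8 arrives, the sketch builds 5b = bab from the previous string ba and its first character b, matching the exception handling described in [0012].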
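[0016]
[Problems to Be Solved by the Invention]
However, in data compression and decompression using conventional codes such as LZW, characters are registered in the dictionary one byte at a time. Dictionary searches during encoding or decoding can therefore proceed only one byte at a time, so dictionary search and registration take a long time.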
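[0017]
This problem can be explained concretely as follows. FIG. 30 is a block diagram of dictionary search and registration in conventional data compression, consisting of an input unit 100, a single-input match detection unit 102, a single-input registration unit 104, and a dictionary memory 106.
FIG. 31 shows the dictionary search processing of FIG. 30. When the first character K1 is input, step S1 combines it with the index ω indicating the node address in the tree structure realized by the entries of the dictionary memory 106; step S2 computes the hash value H(ω, K1) by internal hashing; and step S3 searches the dictionary memory 106 using H(ω, K1) as the index.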
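[0018]
Step S5 checks whether the index ω' and character K1' read from the dictionary memory 106 by this search match the index ω and character K1 used to generate the hash value. If both match, the process returns to step S1, forms a new hash value from the new index and the next character, and repeats the dictionary search.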
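[0019]
If they do not match, step S6 computes a rehash value according to the definition of the rehash function, and the dictionary search is repeated until both the index ω and the character K1 match. If no dictionary entry is found by the search with the first hash value or with a rehash value, step S4 performs encoding and dictionary registration.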
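[0020]
Because such a conventional dictionary search must perform search and registration, including hashing and rehashing, for every single byte of input, dictionary search and registration take a long time.
The present invention was made in view of these conventional problems, and its object is to provide a dictionary search and registration method for data compression and decompression apparatuses that speeds up dictionary search and registration and thereby shortens processing time.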
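[0021]
[Means for Solving the Problems]
FIG. 1 illustrates the principle of the present invention. Taking a data compression apparatus as an example, as shown in FIG. 1(A), the invention provides a dictionary search and registration method for data compression and decompression that encodes by a longest-match search between an input character string and character substrings already registered in a dictionary 16. A multiple-input search means 10 generates, by internal hashing, a hash value H(ω, K1, K2, ..., Kn) from a predetermined number of input characters (K1, K2, ..., Kn) and an index (ω) indicating a string already found and registered in the dictionary. A multiple-input match detection means 12 then searches the dictionary 16 for the input characters (K1, K2, ..., Kn) using the hash value as the index. If this search finds that the string is not registered, a multiple-input registration means 14 registers the string (ω, K1, K2, ..., Kn) in the dictionary 16.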
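[0022]
Here, when the number of input characters is n, n dictionaries are provided, one for each number of characters following the already-searched string ω represented by the index, namely for (ω, K1), (ω, K1, K2), ..., (ω, K1, K2, ..., Kn). These n dictionaries are searched simultaneously using the per-length hash values H(ω, K1), H(ω, K1, K2), ..., H(ω, K1, K2, ..., Kn) generated by internal hashing.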
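[0023]
The internal hash function takes the exclusive OR of the input characters and the index of the already-searched registered string. The exclusive OR is formed so that every bit of the hash value is uniform and symmetric with respect to the inputs. If the input characters and the dictionary index differ in bit width, so that the bits entering the exclusive OR cannot be made uniform and symmetric, the bits are arranged so that the exclusive OR is taken on the high-order bit side.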
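[0024]
The invention further relates to a dictionary search and registration method for a data compression apparatus that encodes by a longest-match search between an input character string K and character substrings already registered in the dictionary. As shown in FIG. 1(B), a hash value generation means 20 generates, by internal hashing, a hash value H(ω, K1, K2, ..., Kn) from a predetermined number of input characters (K1, K2, ..., Kn) and an index (ω) indicating a string already found and registered in the dictionary. A dictionary search means 22 then searches the dictionary 16 for the input characters (K1, K2, ..., Kn) using the hash value as the index, as with the multiple-input match detection means 12. When a match/mismatch detection means 24 detects a mismatch in this search, a rehash means 26 generates, as rehash values, a random-number sequence corresponding to the first hash value, one value at a time, and the dictionary is searched again.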
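[0025]
For the rehash values, the first hash value is treated as an element $\alpha^i$ of the extension Galois field $GF(2^m)$, and a random-number sequence is generated by successively adding to $\alpha^i$, by exclusive OR on the extension field, each of the elements $\alpha^1, \alpha^2, \ldots, \alpha^{2^m-1}$ of the same field.
Specifically, the elements $\alpha^1, \alpha^2, \ldots, \alpha^{2^m-1}$ of $GF(2^m)$ are generated in order by cumulative multiplication on the extension field, multiplying the previously generated element $\alpha^{i-1}$ by the primitive element $\alpha^1$.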
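[0026]
Further, a sequence of $2^m$ mutually distinct random numbers corresponding to $2^m$ states is generated by regarding the $2^m$ states as elements α of $GF(2^m)$ and successively multiplying the value of the element α by the primitive element $\alpha^1$ on the extension field.
The invention further provides a dictionary search and registration method for a data decompression apparatus. The data decompression apparatus restores character strings from compressed data obtained as follows: each time a predetermined number of characters is input, a hash value is generated by internal hashing from the input characters that follow the already-searched string represented by the index, and the dictionary is searched and registered for those input characters with that hash value.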
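[0027]
For such a data decompression apparatus, the dictionary search and registration method of the invention, each time a character string is restored from an input code by searching the dictionary, generates a hash value by internal hashing from the already-restored characters that follow the input code, and uses that hash value to register the restored characters in the dictionary.
For re-registration by rehashing during decompression, the method of the invention, each time a string is restored from an input code by searching the dictionary, generates a hash value by internal hashing from the already-restored characters that follow the input code and registers the restored characters in the dictionary with that hash value; if it detects that an entry using the same hash value is already registered, it generates a random-number sequence corresponding to the first hash value, one value at a time, and registers the entry again.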
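[0028]
Because the invention searches and registers the dictionary for several characters (several bytes' worth) at once, dictionary search and registration in compression and decompression take less time, which shortens overall processing time. In particular, the internal hashing of the invention shortens processing time by searching multiple characters (K1 to Kn) simultaneously. Specifically, a search means 10 and a match detection means 12 are provided for each length from 1 byte to n bytes and operated in parallel to achieve the speedup.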
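[0029]
Furthermore, the internal-hash data structure of the invention enables fast search of and registration in the dictionary memory. In addition, using elements of an extension Galois field as rehash values produces rehash values that differ completely from the first hash value, which helps avoid internal-hash collisions as far as possible.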
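[0030]
[Embodiments of the Invention]
FIG. 2 is a block diagram of the basic concept of the dictionary search and registration method of the invention, again taking a data compression apparatus as the example. A data compression apparatus using the method consists of a multiple-input search unit 10 that searches for a predetermined number of input characters, a multiple-input match detection unit 12 that detects whether the input characters match the characters found, a multiple-input registration unit 14 that registers several input characters at once, and a dictionary memory 16 that supports searching and registering several characters. Searching and registering several characters at once in this way shortens processing time.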
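[0031]
FIG. 3 shows the processing by which the data compression apparatus of FIG. 2 searches and registers the dictionary for several characters at once. First, as shown in step S1, the multiple-input search unit 10 takes as its multiple-character input the combination (ω, K1, K2, ..., Kn) of a predetermined n characters K1, K2, ..., Kn and the node address (index) ω in the tree structure of an already-registered string, and in step S2 generates a hash value H(ω, K1, K2, ..., Kn) using a predetermined hash function of the internal hash structure.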
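[0032]
The dictionary memory 16 is searched with the hash value thus created, and step S3 checks whether each element of the (ω', K1', K2', ..., Kn') read from the dictionary memory 16 matches the corresponding element of the multiple-character input (ω, K1, K2, ..., Kn) of step S1. This match detection is performed by the multiple-input match detection unit 12.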
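[0033]
If the dictionary entry and the multiple-character input match completely, step S4 determines that a match was detected, and a similar dictionary search is performed with the newly obtained node address (index) and the next n input characters as the multiple-character input.
If the match detection of step S3 finds a mismatch, the process proceeds from step S4 to step S5, where a rehash value is created according to the predetermined hash function definition, and step S2 searches the dictionary with the rehash value.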
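[0034]
If the dictionary search with the hash value of the first multiple-character input, or with a rehash value, finds in step S3 no entry for that hash value, that is, no corresponding registration exists in the dictionary memory 16, nothing is detected and the process proceeds to step S6, where the characters are registered in the dictionary memory 16 all at once, using the current hash value or rehash value as the index.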
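A software sketch of the FIG. 3 flow might look like the following. It is an assumption-laden illustration: the class name MultiCharDict, the rotate-and-XOR hash, and the linear-probe rehash (h + 1) are placeholders chosen here, standing in for the XOR hash layouts and Galois-field rehash the patent describes later. What it does take from the text is that the n characters are compared and registered as one unit, and that the hash or rehash address at which an entry is registered serves as its index.

```python
from typing import List, Optional, Tuple


class MultiCharDict:
    """Open-addressing dictionary whose slot address doubles as the index (FIG. 3 style)."""

    def __init__(self, bits: int = 12) -> None:
        self.bits = bits
        self.size = 1 << bits
        # Each slot holds (omega, K1..Kn), or None while unregistered.
        self.slots: List[Optional[Tuple[int, bytes]]] = [None] * self.size

    def _hash(self, omega: int, chars: bytes) -> int:
        # Placeholder rotate-and-XOR hash; NOT the bit layouts of FIGS. 7 to 12.
        mask = self.size - 1
        h = omega & mask
        for c in chars:
            h = ((h << 3) | (h >> (self.bits - 3))) & mask  # rotate left by 3 bits
            h = (h ^ c) & mask
        return h

    def search_or_register(self, omega: int, chars: bytes) -> Tuple[bool, int]:
        """Return (True, index) if (omega, chars) is found, else register it and return (False, index)."""
        h = self._hash(omega, chars)             # step S2: H(omega, K1..Kn)
        for _ in range(self.size):
            slot = self.slots[h]
            if slot is None:                     # nothing detected: step S6, register all n characters
                self.slots[h] = (omega, chars)
                return False, h                  # the (re)hash address becomes the index
            if slot == (omega, chars):           # steps S3/S4: every element matches
                return True, h
            h = (h + 1) & (self.size - 1)        # step S5: rehash (linear-probe stand-in)
        raise RuntimeError("dictionary memory is full")
```

A second search for the same (ω, K1 ... Kn) returns the address assigned at registration, which is what lets the index found in one step feed the hash of the next step.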
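[0035]
FIG. 4 is a block diagram of the parallel configuration of the multiple-input search unit 10 and the multiple-input match detection unit 12 of FIG. 2 when n one-byte characters are input. The multiple-input search unit 10 consists of a 1-byte search unit 10-1, a 2-byte search unit 10-2, ..., and an n-byte search unit 10-n operating in parallel, and the multiple-input match detection unit 12 consists of match/mismatch detection units 12-1, 12-2, ..., 12-n operating in parallel. An overall match/mismatch determination unit 30 combines the results of the per-length match/mismatch detection units 12-1 to 12-n into an overall match or mismatch decision.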
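[0036]
FIG. 5 shows, in terms of the LZW tree structure, the dictionary ranges searched and registered when the invention searches and registers several characters at once. This tree corresponds to searching and registering three bytes at a time. If the first character is taken as the node address (index) ω in the tree, the second character of the tree is registered in dictionary 16-1, strings up to the third character in dictionary 16-2, and strings up to the fourth character in dictionary 16-3; the characters following the node address (index) ω of the first character are searched for in the respective dictionaries 16-1, 16-2, and 16-3.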
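[0037]
The pattern then repeats: the fifth character of the tree is registered in dictionary 16-1, strings up to the sixth character (not shown) in dictionary 16-2, and strings up to the seventh character in dictionary 16-3.
For a dictionary structure such as that of FIG. 5 for searching and registering up to three bytes, the parallel hardware of FIG. 4 allows the dictionaries 16-1, 16-2, and 16-3 to be searched simultaneously and efficiently. In software the searches are sequential; in that case, searching in order of decreasing string length, dictionary 16-3, then 16-2, then 16-1, gives a more efficient dictionary search.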
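[0038]
FIG. 6 is a block diagram for searching and registering the three-byte strings of FIG. 5 by internal hashing. The 1-byte input search unit 10-1 consists of a hash value generation unit 11-1 and the dictionary memory 16-1, which holds strings up to the first byte. In a 1-byte search, unit 10-1 creates a hash value from the index ω1 to be searched and the input character K1 according to the hash function H(ω1, K1), and searches the dictionary memory 16-1 at the address given by that hash value.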
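[0039]
The dictionary memory 16-1 has a flag F indicating whether an entry exists; when F is set, an index ω and a character K are registered. Searching the dictionary memory 16-1 at the address given by the hash value H(ω1, K1) therefore detects an existing entry by its set flag F and simultaneously reads out the index ω1' and the character K1'.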
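[0040]
The 1-byte input match detection unit 12-1 has a comparison unit 13-11 that detects whether the flag F read from the dictionary memory 16-1 is reset (F = 0), in which case the no-detection determination unit 15-11 produces the no-detection output (1). A comparison unit 13-12 compares the input index ω1 with the index ω1' read from the dictionary, while a comparison unit 13-13 compares the input character K1 with the character K1' read from the dictionary.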
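[0041]
These comparison results go to the match/mismatch determination unit 15-12. On a mismatch, a rehash value is generated and the dictionary memory 16-1 is searched again; on a match, the match output (4) is sent to the overall match/mismatch determination unit 17-2 shown below.
In the next, 2-byte input search unit 10-2, the hash value generation unit 11-2 creates a hash value H(ω1, K1, K2) from the index ω1 to be searched and the two characters K1 and K2 up to the second byte, and searches the dictionary memory 16-2 at that address. The flag F, index ω1', and characters K1' and K2' obtained by this search are compared by comparison units 13-21, 13-22, 13-23, and 13-24 of the 2-byte match detection unit 12-2.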
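[0042]
If the flag F is 0, the no-detection determination unit 15-21 sends the no-detection output (2) to the overall no-detection determination unit 17-1 below. Comparison units 13-22 to 13-24 compare the read values (ω1', K1', K2') with the input values (ω1, K1, K2), and the match/mismatch determination unit 15-22 evaluates the result: on a mismatch, a rehash value is generated and the dictionary memory 16-2 is searched again; on a match, the match output (5) is sent to the overall match/mismatch determination unit 17-2 below.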
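[0043]
In the 3-byte input search unit 10-3, the hash value generation unit 11-3 creates a hash value H(ω1, K1, K2, K3) from the index ω1 to be searched and the three characters (K1, K2, K3) up to the third byte, and searches the dictionary memory 16-3 at that address. As is clear from the tree structure of FIG. 5, the third-byte dictionary stores the following index ω4, which also indicates whether an entry exists.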
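[0044]
This value is input to the comparison unit 13-31 of the 3-byte match detection unit 12-3. If ω4 = 0, the no-detection determination unit 15-31 determines that no entry exists for the address and sends the no-detection output (3) to the overall no-detection determination unit 17-1 below. If ω4 has a valid nonzero value, that is, the value of a following index, the index ω1' and characters K1', K2', and K3' are read from the dictionary memory 16-3 and compared with the index ω1 and input characters K1, K2, and K3 used to create the hash value; based on the comparison, the match/mismatch determination unit 15-32 sends output (6) to the overall match/mismatch determination unit 17-2 below.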
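[0045]
The overall no-detection determination unit 17-1 and the overall match/mismatch determination unit 17-2 decide, from the detection and match/mismatch results of the three dictionary memories 16-1 to 16-3, whether to stop or continue the dictionary search. If all of the dictionary memories 16-1 to 16-3 report a detected entry and the search results are judged to match, the index ω1 is replaced with ω4 to continue the search and fed back to the per-length hash value generation units 11-1, 11-2, and 11-3, which repeat the same operation.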
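[0046]
If this continuation condition is not met, entries are registered in the dictionary memories 16-1 to 16-3. If none of the dictionary memories 16-1 to 16-3 detects an entry, one character is registered in the dictionary memory 16-1. If dictionary memory 16-1 detects an entry but 16-2 and 16-3 do not, the second character is registered in dictionary memories 16-2 and 16-3. If dictionary memories 16-1 and 16-2 detect entries but 16-3 does not, the third character is registered in dictionary memory 16-3.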
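The continue/register decision of [0045] and [0046] can be modeled in a few lines. In this sketch, Python sets and a dict stand in for the hashed dictionary memories 16-1 to 16-3 (flags, comparisons, and rehashing are abstracted away), and the function only reports which case applies; what exactly is written at each level follows the patent's description and is not modeled. The function name three_level_lookup is hypothetical.

```python
def three_level_lookup(d1: set, d2: set, d3: dict, omega: int, chars: str) -> tuple:
    """Classify one 3-character step as in FIG. 6 (hypothetical helper, no hashing)."""
    k1, k2, k3 = chars[0], chars[1], chars[2]
    hit1 = (omega, k1) in d1                # dictionary memory 16-1
    hit2 = (omega, k1, k2) in d2            # dictionary memory 16-2
    omega4 = d3.get((omega, k1, k2, k3))    # dictionary memory 16-3 stores the next index
    if hit1 and hit2 and omega4 is not None:
        return ("continue", omega4)         # replace omega1 with omega4 and search again
    if not hit1:
        return ("register", 1)              # nothing detected: register one character in 16-1
    if not hit2:
        return ("register", 2)              # only 16-1 detected: register the second character
    return ("register", 3)                  # 16-1 and 16-2 detected: register the third character


d1 = {(0, "a")}
d2 = {(0, "a", "b")}
d3 = {(0, "a", "b", "c"): 7}
print(three_level_lookup(d1, d2, d3, 0, "abc"))  # ('continue', 7)
print(three_level_lookup(d1, d2, d3, 0, "abd"))  # ('register', 3)
```

Because every registered string has its prefixes registered too, a hit in 16-3 without hits in 16-1 and 16-2 cannot occur, so the three register cases above cover every non-continuing outcome.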
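[0047]
FIGS. 7, 8, and 9 show embodiments of hash value generation for multi-byte search in the hash value generation units 11-1 to 11-3 of FIG. 6 for 1 to 3 bytes. In these embodiments the index ω is 16 bits, each of the characters K1, K2, and K3 is 8 bits, and the hash value H is 16 bits.
FIG. 7 is the 1-byte search case: the 8 bits of character K in register 32 are exclusive-ORed with the upper 8 of the 16 bits of index ω1 in register 34 to form the upper 8 bits of the 16-bit hash value H, and the lower 8 bits of index ω1 in register 34 are used unchanged as its lower 8 bits.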
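[0048]
For the 2-byte search of FIG. 8, the 8 bits of each of the two characters K1 and K2 in registers 32-1 and 32-2 are exclusive-ORed bit by bit with the 16 bits of index ω1 in register 34 to produce the 16-bit hash value H(ω1, K1, K2).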
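The 1-byte and 2-byte hash layouts of FIGS. 7 and 8 are simple enough to write out directly. The sketch assumes the 16-bit index and 8-bit characters stated in [0047]; the text does not say which character occupies the upper byte in FIG. 8, so placing K1 high is an assumption. Both function names are hypothetical.

```python
def hash_1byte(omega: int, k1: int) -> int:
    """FIG. 7 style 16-bit hash: upper byte = K1 XOR upper byte of omega, lower byte = lower byte of omega."""
    upper = ((omega >> 8) ^ k1) & 0xFF
    return (upper << 8) | (omega & 0xFF)


def hash_2byte(omega: int, k1: int, k2: int) -> int:
    """FIG. 8 style 16-bit hash: omega XOR (K1:K2), with K1 in the upper byte (assumed)."""
    return (omega ^ ((k1 << 8) | k2)) & 0xFFFF


print(hex(hash_1byte(0x1234, 0xFF)))        # 0xed34
print(hex(hash_2byte(0x1234, 0x12, 0x35)))  # 0x1
```

For a fixed index ω, both functions are one-to-one in the characters, so two different continuations of the same string never produce the same initial hash address.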
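For the hash value used in the 3-byte search of FIG. 9, the 8 bits of each of the two characters K1 and K2 in registers 32 and 36 are first exclusive-ORed with the 16 bits of index ω1 in register 40. In register 40, the 16 bits of index ω1 are arranged with the six bits 10 to 15 in the middle and the remaining bits 0 to 9 split between the two sides.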
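[0049]
The 8 bits of the third character K3 are then set in register 38 and further exclusive-ORed with the middle 8 bits of register 40 to produce the 16-bit hash value H(ω1, K1, K2, K3). In creating the hash value for this 3-byte search, the bits of index ω1 and characters K1, K2, and K3 are arranged so that every bit of the resulting hash value is uniform and symmetric.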
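[0050]
In the 3-byte search of FIG. 9, however, the third character K3 is 8 bits while index ω1 is 16 bits, so the widths differ; in this case the bits are arranged so that the bits subject to the exclusive OR are reflected in the upper bits of the 16-bit hash value. This bit arrangement minimizes collisions among the generated hash values.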
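[0051]
FIGS. 10, 11, and 12, like FIGS. 7, 8, and 9, illustrate creating a 12-bit hash value for the 1-byte, 2-byte, and 3-byte searches, respectively. In this case the index ω is 12 bits, and the 12-bit hash value is created from 8-bit characters.
With the 12-bit hash value, the character and index widths differ for the 1-byte and 2-byte searches of FIGS. 10 and 11. For the 3-byte search of FIG. 12, however, combining the three characters K1, K2, and K3 with the 12-bit index makes every hash bit a three-input exclusive OR, so the uniformity and symmetry of every bit of the 12-bit hash value is fully preserved.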
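[0052]
Specifically, characters K1 and K2 are stored in registers 40 and 46, and character K3 is split into its upper and lower 4 bits, which are stored in registers 44 and 46. Registers 44 and 46 are each 12 bits wide, matching the width of the 12-bit index ω1 in register 48. As a result, a 12-bit hash value H(ω, K1, K2, K3) is obtained in which every bit is the exclusive OR of three bits drawn from the index and characters K1, K2, and K3, arranged uniformly and symmetrically.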
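One register arrangement consistent with [0051] and [0052] is sketched below: each 12-bit register holds one 8-bit character plus one 4-bit half of K3, so that every bit of the 12-bit hash is the XOR of exactly three input bits. The exact nibble placement in FIG. 12 is not given in the text, so this layout, like the function name, is an assumption.

```python
def hash_3byte_12bit(omega: int, k1: int, k2: int, k3: int) -> int:
    """12-bit hash of a 12-bit index and three 8-bit characters (one possible FIG. 12 layout)."""
    reg_a = (k1 << 4) | (k3 >> 4)      # 12 bits: K1 followed by the upper nibble of K3
    reg_b = (k2 << 4) | (k3 & 0x0F)    # 12 bits: K2 followed by the lower nibble of K3
    return (omega ^ reg_a ^ reg_b) & 0xFFF


# Check the "three inputs per output bit" property by setting one input bit at a time.
fan_in = [0] * 12
for arg, width in enumerate((12, 8, 8, 8)):
    for bit in range(width):
        args = [0, 0, 0, 0]
        args[arg] = 1 << bit
        flipped = hash_3byte_12bit(*args)   # exactly one output bit is set
        fan_in[flipped.bit_length() - 1] += 1
print(fan_in)  # [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
```

The 36 input bits (12 from ω, 24 from K1 to K3) spread evenly over the 12 output bits, which is the full symmetry the text claims for the 12-bit 3-byte case and which the 16-bit layouts cannot reach.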
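[0053]
FIG. 13 is a block diagram implementing the rehash value generation of the dictionary search and registration method of the invention. As shown in FIGS. 4 and 5, in the internal-hash dictionary search and registration of the invention, when match/mismatch detection finds a mismatch, a rehash value is generated and the search is repeated. This rehash function is realized by an internal hash mechanism consisting of the hash value generation unit 20, the dictionary search unit 22, the match/mismatch detection unit 24, and the rehash unit 26 shown in FIG. 13.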
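[0054]
FIG. 14 shows the concept of rehash value generation by the internal hash method of FIG. 13, taking the 1-byte search as an example. The hash value generation unit 20 computes a hash value from index ω1 and character K1 by the hash function H(ω1, K1), and the dictionary search unit 22 searches the dictionary memory 16-1 and reads the presence flag F, the index ω1', and the character K1'.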
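[0055]
In the match/mismatch detection unit 24, if comparison unit 13-11 finds that the flag F is not 0 and the match/mismatch determination unit 15-12 determines from the index and character comparisons of comparison units 13-12 and 13-13 that they do not match, the rehash unit 26 is activated to generate a rehash value RH. In general form, rehash addresses are taken in order from the random-number sequence defined by the rehash function RH_m(H(ω1, K1)), m = 1, 2, ..., NMAX.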
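[0056]
FIG. 15 is a concrete example of the rehash unit 26 of FIG. 14, consisting of a hash value input unit 50, an arithmetic unit 52, and an exclusive OR circuit 54. In this configuration, the hash value input unit 50 first regards the initial hash value as an element α of the extension Galois field $GF(2^m)$. The arithmetic unit 52 generates the elements $\alpha^1, \alpha^2, \alpha^3, \ldots, \alpha^{max}$ of $GF(2^m)$, where $max = 2^m - 1$.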
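[0057]
The exclusive OR circuit 54 then takes the exclusive OR, that is, the sum on the extension field, of the element α output by the hash value input unit 50 and the element $\alpha^i$ generated in turn by the arithmetic unit 52, and uses the result as the rehash value. The arithmetic unit 52 generates the next element whenever the dictionary search with the rehash value produced by the exclusive OR circuit 54 results in a mismatch.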
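The rehash scheme of FIGS. 15 and 16 (rehash value = first hash value XOR $\alpha^i$, with $\alpha^i$ produced by repeated multiplication by α) can be sketched in software as follows. The sketch assumes m = 4 and $f(x) = x^4 + x + 1$ from [0060], represents field elements as m-bit integers in vector form, and uses hypothetical function names. Because the $\alpha^i$ run through all $2^m - 1$ nonzero elements, the probe sequence visits every address except the original hash value exactly once.

```python
def times_alpha(a: int, m: int, poly: int) -> int:
    """Multiply a field element by the primitive element alpha (shift, then reduce by f(x))."""
    a <<= 1
    if a >> m:          # an x^m term appeared: subtract (XOR) the primitive polynomial
        a ^= poly
    return a


def rehash_sequence(h: int, m: int = 4, poly: int = 0b10011):
    """Yield H xor alpha^1, H xor alpha^2, ..., H xor alpha^(2^m - 1)."""
    a = 1                            # unit element alpha^0 (= alpha^(2^m - 1)) as initial value
    for _ in range((1 << m) - 1):
        a = times_alpha(a, m, poly)  # alpha^i = alpha^(i-1) * alpha
        yield h ^ a                  # addition in GF(2^m) is bitwise XOR


print(list(rehash_sequence(0b1010)))
# [8, 14, 2, 9, 12, 6, 1, 15, 0, 13, 4, 5, 7, 3, 11]
```

Unlike linear probing, consecutive probes land far apart, which matches the stated aim of producing rehash values unrelated to the first hash value.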
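[0058]
FIG. 16 is a concrete example of the arithmetic unit 52 of FIG. 15, which generates elements of the extension field in order. It consists of a register 56 holding the primitive element α of $GF(2^m)$, an extension-field multiplication circuit 60, and a selection circuit 58 that first selects the unit element $\alpha^{2^m-1}$ of the field as the initial value and thereafter selects the output of the multiplication circuit 60.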
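[0059]
As an example, consider an extension field with m = 4, whose elements are 4-dimensional binary vectors; this field is written $GF(2^4)$. Its elements are expressed as powers of α. The field has $2^4 = 16$ elements: $0, \alpha^0 = 1, \alpha^1, \alpha^2, \ldots, \alpha^{14}$, with $\alpha^{15} = \alpha^0$. The element α is obtained as a root of a so-called primitive polynomial and is called a primitive element.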
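[0060]
The primitive polynomial of $GF(2^4)$ can be written as
$$f(x) = x^4 + x + 1.$$
FIG. 17 shows, for the elements i = 0 to 15 of $GF(2^4)$, the relationship between the 4-dimensional vector representation $(X^3, X^2, X^1, X^0)$ and the power representation $0, \alpha^1, \ldots, \alpha^{15}$. The element i = 0 is called the zero element, and i = 1 to 14 correspond to the powers $\alpha^1$ to $\alpha^{14}$. The element i = 15 is called the unit element; in power form it is $\alpha^{15}$, which equals $\alpha^0 = 1$.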
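The vector/power correspondence tabulated in FIG. 17 can be regenerated from $f(x) = x^4 + x + 1$ alone, since $\alpha^4 = \alpha + 1$ in this field. The sketch below (hypothetical function name; elements encoded as integers with bit 3 = $X^3$) lists $\alpha^0$ through $\alpha^{14}$, after which $\alpha^{15}$ wraps back to $\alpha^0 = 1$. It reproduces only the field arithmetic, not the figure's layout or its rehash-value column.

```python
def power_table(m: int = 4, poly: int = 0b10011) -> list:
    """Return [alpha^0, alpha^1, ..., alpha^(2^m - 2)] as m-bit vectors (X^(m-1) ... X^0)."""
    table = []
    a = 1
    for _ in range((1 << m) - 1):
        table.append(a)
        a <<= 1
        if a >> m:
            a ^= poly      # reduce with f(x), i.e. x^4 = x + 1 for the default polynomial
    return table


for i, v in enumerate(power_table()):
    print(f"alpha^{i:<2} = {v:04b}")
```

Among other entries, the table gives $\alpha^4$ = 0011 and $\alpha^{14}$ = 1001, and all fifteen nonzero 4-bit vectors appear exactly once.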
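[0061]
In the hash value input unit 50 of FIG. 15, if, for example, the hash value input is one byte (8 bits), it is divided into 4-bit units X³ to X⁰, and the element value represented by i is output so that the hash value is treated as an element of the extension field. In register 56 of FIG. 16, the primitive element $\alpha^1$ of $GF(2^4)$ is set as input P of the extension-field multiplication circuit 60. The selection circuit 58 initially selects the unit element $\alpha^{15}$ of $GF(2^4)$ and outputs it as input Q to the multiplication circuit 60.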
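[0062]
The extension-field multiplication circuit 60 is the logic circuit shown in FIG. 18. It performs the illustrated logic operations on the 4 bits each of input P, the primitive element $\alpha^1$ from register 56, and input Q selected by the selection circuit 58; the final EXOR stage computes the remainder modulo the primitive polynomial $f(x) = x^4 + x + 1$ and outputs it in turn as the value R representing an element of the extension field.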
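In software, the function of multiplication circuit 60 (multiply P by Q, then take the remainder modulo f(x)) reduces to a shift-and-XOR loop. This is a generic $GF(2^m)$ multiplier written for illustration, not a transcription of the FIG. 18 gate network; the name gf_mul and its defaults (m = 4, $f(x) = x^4 + x + 1$) are assumptions. Feeding it P = α and starting from Q = 1 steps through the same element sequence as register 56 and selection circuit 58.

```python
def gf_mul(p: int, q: int, m: int = 4, poly: int = 0b10011) -> int:
    """Multiply two GF(2^m) elements given as m-bit vectors, reducing modulo poly."""
    r = 0
    while q:
        if q & 1:
            r ^= p          # add (XOR) the current multiple of p
        q >>= 1
        p <<= 1
        if p >> m:
            p ^= poly       # keep p below degree m
    return r


alpha = 0b0010
q = 1                       # initial value: the unit element
for i in range(1, 16):
    q = gf_mul(alpha, q)    # Q <- P * Q gives alpha^1, alpha^2, ...
print(q)                    # 1, since alpha^15 is the unit element
```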
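[0063]
For m = 4, the maximum rehash address that can be generated with $GF(2^4)$ follows from the primitive polynomial with all bits set to 1:
$$f(2) = 2^4 + 2 + 1 = 19$$
Addition on the extension field, as performed in the multiplication circuit 60 of FIG. 18 and by the exclusive OR shown in FIG. 15, follows the addition table of FIG. 19, and multiplication by the AND circuits in the multiplication circuit 60 of FIG. 18 follows the multiplication table of FIG. 20.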
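[0064]
The extension field used to generate rehash values may also use values of m larger than 4. FIG. 21 shows part of the relationship between elements and rehash values in $GF(2^{12})$ for m = 12. The primitive polynomial of $GF(2^{12})$ is given by
$$f(x) = x^{12} + x^6 + x^4 + x + 1.$$
In this case the maximum rehash address, with all bits 1, is 4179. The rehash values in FIG. 21 are written in hexadecimal notation (0, 1, 2, ..., 8, 9, a, b, ..., e, f).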
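[0065]
FIG. 22 shows part of the relationship between elements and rehash values in $GF(2^{16})$ for m = 16. The primitive polynomial of $GF(2^{16})$ is given by
$$f(x) = x^{16} + x^6 + x^4 + x + 1.$$
The maximum rehash address, with all bits 1, is then 65619.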
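Two quick checks relate to [0063] to [0065]. First, each quoted maximum address (19, 4179, 65619) is the coefficient vector of the stated polynomial read as a binary number, that is f(2). Second, whether a degree-m polynomial really generates all $2^m - 1$ nonzero elements can be tested by computing the multiplicative order of x modulo f(x). The sketch below does both; the function and constant names are hypothetical, and this note does not assert the order results for the m = 12 and m = 16 polynomials.

```python
def poly_from_exponents(*exps: int) -> int:
    """Encode a GF(2) polynomial as an integer, with bit e set for each term x^e."""
    value = 0
    for e in exps:
        value |= 1 << e
    return value


def order_of_x(m: int, poly: int) -> int:
    """Multiplicative order of x modulo poly; 2^m - 1 means the polynomial is primitive."""
    a, n = 1, 0
    while n <= (1 << m):
        a <<= 1
        if a >> m:
            a ^= poly
        n += 1
        if a == 1:
            return n
    return 0                 # x never returns to 1 (poly is unusable as a field modulus)


POLY_4 = poly_from_exponents(4, 1, 0)            # x^4 + x + 1
POLY_12 = poly_from_exponents(12, 6, 4, 1, 0)    # x^12 + x^6 + x^4 + x + 1
POLY_16 = poly_from_exponents(16, 6, 4, 1, 0)    # x^16 + x^6 + x^4 + x + 1
print(POLY_4, POLY_12, POLY_16)                  # 19 4179 65619
print(order_of_x(4, POLY_4))                     # 15
```

Running order_of_x on POLY_12 and POLY_16 indicates whether those polynomials yield full-length rehash sequences of 4095 and 65535 values.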
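[0066]
The embodiments above take the dictionary search and registration method of a data compression apparatus as the example, but the same dictionary registration method also applies to a data decompression apparatus that restores the original strings from the compressed data produced by that compression apparatus.
That is, a data decompression apparatus using the dictionary search and registration method of the invention restores character strings from compressed data obtained as follows: each time a predetermined number of characters is input, a hash value is generated by internal hashing from the input characters that follow the already-searched string represented by the index, and the dictionary is searched and registered for those input characters with that hash value.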
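[0067]
For such a data decompression apparatus, the method of the invention, each time a string is restored from an input code ω by searching the dictionary memory, generates a hash value H(ω, K1, K2, K3, ..., Kn) by internal hashing from the already-restored characters (K1, K2, K3, ..., Kn) that follow the input code ω, and uses that hash value to register the restored characters (ω, K1, K2, K3, ..., Kn) in the dictionary.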
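[0068]
For this multi-character restoration, the same dictionary registration method as in compression (FIGS. 2 to 12) applies unchanged.
For re-registration by rehashing during decompression, the method of the invention, each time a string is restored from an input code ω by searching the dictionary, generates a hash value H(ω, K1, K2, K3, ..., Kn) by internal hashing from the already-restored characters (K1, K2, K3, ..., Kn) that follow the input code and registers the restored characters in the dictionary with that hash value; if it detects that an entry using the same hash value is already registered, it generates a random-number sequence corresponding to the first hash value, one value at a time, and registers the entry again.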
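[0069]
The same rehash processing as in compression (FIGS. 13 to 22) also applies unchanged.
The embodiments above use an internal-hash data structure as the concrete dictionary search method, but the invention is not limited to this and can also be applied to a list structure based on external hashing.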
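[0070]
[Effects of the Invention]
As explained above, the invention allows several characters to be searched and registered at once in the dictionary of data compression and decompression apparatuses such as LZW, further speeding up compression and decompression.
In addition, using elements of an extension Galois field to generate rehash values when a dictionary search result does not match produces rehash values completely different from the first hash value, which avoids internal-hash collisions as far as possible.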
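[Brief Description of the Drawings]
[FIG. 1] Diagram illustrating the principle of the present invention
[FIG. 2] Block diagram of a data compression apparatus implementing the dictionary search and registration method of the invention
[FIG. 3] Explanatory diagram of dictionary search and registration processing by the internal hash method of FIG. 2
[FIG. 4] Block diagram of parallel dictionary search and registration processing by the internal hash method for n characters
[FIG. 5] Explanatory diagram of the tree structure showing the dictionary ranges for LZW code search and registration according to the invention
[FIG. 6] Block diagram of dictionary search and registration according to the invention, performing a 3-byte batch search by the internal hash method
[FIG. 7] Block diagram of a hash value generation unit producing a 16-bit hash value for a 1-byte search
[FIG. 8] Block diagram of a hash value generation unit producing a 16-bit hash value for a 2-byte search
[FIG. 9] Block diagram of a hash value generation unit producing a 16-bit hash value for a 3-byte search
[FIG. 10] Block diagram of a hash value generation unit producing a 12-bit hash value for a 1-byte search
[FIG. 11] Block diagram of a hash value generation unit producing a 12-bit hash value for a 2-byte search
[FIG. 12] Block diagram of a hash value generation unit producing a 12-bit hash value for a 3-byte search
[FIG. 13] Block diagram of a data compression apparatus with a rehash unit implementing the dictionary search and registration method of the invention
[FIG. 14] Explanatory diagram of the rehash value generation processing of FIG. 13
[FIG. 15] Detailed block diagram of the rehash unit of FIG. 13
[FIG. 16] Block diagram of the arithmetic unit of FIG. 15, which generates elements of an extension Galois field in turn
[FIG. 17] Explanatory diagram of the correspondence between elements of $GF(2^4)$ and rehash values
[FIG. 18] Logic circuit diagram of the extension Galois field multiplication circuit of FIG. 16
[FIG. 19] Explanatory diagram of the addition table used for addition in $GF(2^4)$
[FIG. 20] Explanatory diagram of the multiplication table used for multiplication in $GF(2^4)$
[FIG. 21] Explanatory diagram of the correspondence between elements of $GF(2^{12})$ and rehash values
[FIG. 22] Explanatory diagram of the correspondence between elements of $GF(2^{16})$ and rehash values
[FIG. 23] Explanatory diagram of the dictionary tree structure in conventional LZW coding
[FIG. 24] Explanatory diagram of character string encoding in LZW coding
[FIG. 25] Explanatory diagram of a concrete example of LZW encoding
[FIG. 26] Explanatory diagram of the dictionary used in the encoding of FIG. 25
[FIG. 27] Flowchart of the LZW encoding algorithm
[FIG. 28] Explanatory diagram of a concrete example of LZW decoding
[FIG. 29] Flowchart of the LZW decoding algorithm
[FIG. 30] Block diagram of conventional dictionary search and registration by the internal hash method
[FIG. 31] Explanatory diagram of conventional dictionary search and registration processing by the internal hash method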
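[Description of Reference Numerals]
10: Multiple-input search unit
10-1 to 10-n: 1- to n-byte search units
11-1 to 11-3: Hash value generation units
13-11 to 13-34: Comparison units
12-1 to 12-n: Match/mismatch detection units
12: Multiple-input match detection unit
14: Multiple-input registration unit
15-11, 15-21, 15-31: No-detection determination units
16, 16-1 to 16-3: Dictionary memories (dictionaries)
17-1: Overall no-detection determination unit
17-2, 30: Overall match/mismatch determination units
20: Hash value generation unit
22: Dictionary search unit
24: Match/mismatch detection unit
26: Rehash unit
30: Overall match/mismatch determination unit
32, 32-1, 32-2, 34, 36, 38, 40, 42, 44, 46, 48: Registers
50: Hash value input unit
52: Random-number sequence generation unit
54: Exclusive OR circuit
56: Register
58: Selection circuit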
60: Extension Galois field multiplication circuit
BACKGROUND OF THE INVENTION
The present invention relates to a data compression apparatus that performs encoding by a longest match search between an input character string and a character substring already registered in a dictionary, and a dictionary search registration method for a decompression apparatus, and more particularly, from a plurality of input characters and registered character strings. The present invention relates to a data compression device that generates an index using an internal hash and searches and registers a dictionary, and a dictionary search and registration method for a decompression device.
[0002]
[Prior art]
In recent years, various types of data such as character codes, vector information, and images have been handled by computers, and the amount of data handled has been rapidly increasing. When handling a large amount of data, it is possible to reduce the storage capacity or to transmit the data faster by omitting redundant portions of the data and compressing the data amount.
[0003]
Universal coding has been proposed as a method for compressing various data in one procedure. Here, the field of the present invention is not limited to compression of character codes, but can be applied to various data. In the following, the term used in information theory is followed, and one word unit of data is called a character, A string in which data is connected to a plurality of arbitrary words is called a character string.
[0004]
As a typical method of universal codes, there is a Ziv-Lempel code (for example, Munakata “Ziv-Lempel data compression method”, Information processing, Vol. 26, No. 1, 1985). checking).
For the Jib-Lempel code, the slide dictionary method and the dynamic dictionary method (Incremental
Two algorithms have been proposed. Furthermore, as an improvement of the slide dictionary type algorithm, there is an LZSS code (see TC Bell, “Better OPM / L Text Compression”, IEEE Trans. On Commun., Vol. COM-34, No. 12, Dec. 1986).
[0005]
As an improvement of the dynamic dictionary type algorithm, there is an LZW (Lempel-Ziv-Welch) code (see TA Welch, “A Technique for High-Performance Data Compression”, Computer, June 1984). Among these codes, LZW codes are used for file compression of storage devices because of high-speed processing and simplicity of algorithms.
[0006]
FIG. 23 shows a tree structure of a dictionary in the LZW code, and FIG. 24 shows character string encoding in the LZW code. LZW encoding has a rewritable dictionary, divides the input character string (source data) into different character strings, registers the numbers in the order in which these character strings appear, registers them in the dictionary, and inputs them now Is represented by the number of the longest matching character string registered in the dictionary and encoded.
[0007]
FIG. 25 shows a specific example of the LZW encoding process. In order to simplify the description, the case of three characters a, b, and c is taken as an example. For this reason, each of the characters a, b, and c is initially registered in the dictionary of FIG. 26 used for encoding.
In FIG. 25, input data is read from left to right. When the first character a is input, since there is no matching character string in the dictionary other than the character a, a reference number (index) is output as a code word. Then, the expanded character string ab is assigned a reference number 4 and registered in the dictionary. Actual registration takes the form of a character string (1b).
[0008]
Subsequently, the second character b becomes the head of the character string. Since there is no matching character string other than the character b in the dictionary, the reference number 2 is output as a code word, and the expanded character string ba is actually registered in the dictionary with the reference number 5 in the form of 2a. The third character a is the head of the next character string. Thereafter, this process is similarly continued.
The flowchart in FIG. 27 is an LZW encoding algorithm. First, in step S1, a character string consisting of one character for all characters is registered in the dictionary as an initial value, and then encoding is started. In step S2, the input first character K is set as a dictionary search reference number (index) ω, which is used as a prefix string. Next, in step S3, the next character K of the input data is read. In step S4, the current dictionary has a character string (ωK) obtained by adding the character K read in step S3 to the initial character string ω obtained in step S2. Search whether or not.
[0009]
If the character string (ωK) is found in the dictionary in step S4, the character string (ωK) is replaced with the reference number ω in step S5, and it is determined whether or not the input data is completed in step S5. The search for the maximum matching length is continued until the column (ωK) cannot be searched from the dictionary.
Next, if the character string (ωK) is not in the dictionary in step S4, the process proceeds to step S7, where the reference number ω of the character K obtained in step S2 is output as the code word code (ω), and the character string (ωK) A new reference number is added to and registered in the dictionary, and the input character K in step S2 is replaced with the reference number ω, the dictionary address N is incremented, and after checking in step S5, the process returns to step S2. The next character K is read.
[0010]
FIG. 28 is a specific example of the LZW decoding process, and a combination of three characters a, b, and c is taken as an example to simplify the description. The first input code is 1, and the characters a, b, and c are already registered in the dictionary as reference numbers 1, 2, and 3 as shown in FIG. Is replaced with the character string a of the reference number to be output.
[0011]
Similarly, the next code 2 is replaced with the character b and output. At this time, a new reference number 4 is added to (1b), which is a combination of the previously processed code 1 and the first character b decoded this time, and is registered in the dictionary. The third code 4 replaces 1b with ab by searching the dictionary and outputs a character string ab. At the same time, the combination character string 2a (= ba) of the previously processed code 2 and the first character a of the character string decoded this time is added to the dictionary with a new reference number 5 added thereto. Similarly, this process is repeated.
[0012]
Here, the decoding of FIG. 28 includes the following exception processing. This exception processing occurs when the sixth input code 8 is decoded. Code 8 is not defined in the dictionary at the time of decoding and cannot be decoded. In this case, a character string 5b obtained by adding the first character b of the previously decoded character string ba to the previously processed code 5 is obtained, and further replaced with 2ab and bab and output. Then, the reference number 8 is added to the character string 5b obtained by adding the character b of the character string decoded this time to the previous code word 5 to the output word of the character string, and the result is registered in the dictionary.
[0013]
This exception process is performed through steps S4 and S9 in the decoding process flow of FIG. 29. Finally, in step S7, a character string is output and registered in a dictionary in which a reference number is added to the new character string. Is called.
The flowchart of FIG. 29 is an LZW decoding algorithm, and performs the reverse operation of the encoding of FIG. First, in step S1, similarly to encoding, a character string consisting of one character for all characters is registered as an initial value in the dictionary in advance, and then decoding is started. In step S2, the first code (reference number) is read, the current CODE is set to OLDcode, and the first code corresponds to one of the reference numbers of one character already registered in the dictionary. Find (K) and output the letter K. The output character (K) is set to char for later exception processing.
[0014]
In step S3, the next code is read and set as NEWcode in CODE. In step S4, it is checked whether the code CODE input in step S3 is defined (registered) in the dictionary. Usually, since the input code word is registered in the dictionary in the process up to the previous time, the process proceeds to step S5 to read the character string code (ωK) corresponding to the code CODE from the dictionary, and temporarily stores the character string K in step S6. The reference number code (ω) is set as a new CODE, and the process returns to step S5 again. The procedure of steps S5 and S6 is recursively repeated until the reference number ω reaches one character.
[0015]
Finally, proceeding to step S7, the characters stacked in step S6 are popped up and output in the LILO (Last In Fast Out) format. At the same time, in step S7, a new reference number is added to the character string representing the pair (ω, K) of the previously used code ω and the first character K of the character string restored this time, and registered in the dictionary.
If the code is not registered in step S4 (this occurs when the previous reference number is referred to in encoding), in step S9, OLDcode is returned to CODE and code (OLDcode, char) is returned to NEWcode. After that, the process proceeds to step S5. Also, the exceptional processing that causes the decoding of the sixth input code 8 in FIG. 28 is performed through the processing of step S4 and step S9, and finally the output of the character string and the reference number are added to the new character string in step S7. Registration in the dictionary is performed.
[0016]
[Problems to be solved by the invention]
However, in such data compression and decompression processing using the conventional LZW code or the like, character registration with respect to the dictionary is performed in units of 1 byte, so that dictionary search in encoding or decoding is 1 There is a problem that it can only be performed byte by byte, and it takes time to search and register the dictionary.
[0017]
This problem will be specifically described as follows. FIG. 30 is a dictionary search / registration block in a conventional data compression process, and includes a plurality of input units 100, a single input match detection unit 102, a single input registration unit 104, and a dictionary memory 106.
FIG. 31 shows the dictionary search process in FIG. Now, when the first character K1 is input, the index ω indicating the contact address of the tree structure realized by registration in the dictionary memory 106 and the input character K1 are combined in step S1, and in step S2, the hash value H ( ω, K1) is obtained, and the dictionary memory 106 is searched using the hash value H (ω, K1) as an index in step S3.
[0018]
In step S5, it is compared in step S5 whether or not the index ω ′ and the character K1 ′ read from the dictionary memory 106 by the dictionary search match the index ω and the character K1 used for generating the hash value. If a match between the two is detected, the process returns to step S1 to create a new hash value with a new index and the next character, and the dictionary search is repeated.
[0019]
If they do not match, a rehash value is obtained in accordance with the definition of the rehash function in step S6, and the dictionary search is repeated until the index ω and the character K1 respectively match. If the dictionary is not registered in the search using the first hash value, or the dictionary is not registered in the search using the rehash value, encoding and dictionary registration are performed in step S4.
[0020]
Such a conventional dictionary search has a problem that it takes time to search and register a dictionary because a dictionary search process and a registration process involving a hash process and a rehash process must be performed for each byte of input characters. .
The present invention has been made in view of the above-described conventional problems, and provides a data compression apparatus and a restoration / restoration dictionary search / registration method capable of reducing the processing time by speeding up dictionary search and registration. For the purpose.
[0021]
[Means for Solving the Problems]
FIG. 1 is a diagram illustrating the principle of the present invention. Taking a data compression apparatus as an example, as shown in FIG. 1A, the present invention is a data compression / decompression that performs encoding by a longest match search between an input character string and a character substring already registered in the dictionary 16. As a dictionary search / registration method used for the above, an index (ω) indicating a predetermined number of input characters (K1, K2,... Kn) and a registered character string in a dictionary already searched by the multiple input search means 10. ) To generate a hash value H (ω, K1, K2,... Kn) using an internal hash, and then a plurality of input characters (K1, K2,... The dictionary 16 is searched for Kn). If there is no registration of the character string by this search, the multi-input registration means 14 registers the character string (ω, K1, K2,... Kn) in the dictionary 16.
[0022]
Here, when the number of input characters is n, n dictionaries provided for each number of characters following the already searched character string ω represented by the index, that is, (ω, K1), (ω, k1, K2),... (ω, K1, K2,... Kn) n dictionaries provided by hashing H (ω, K1), H (ω, k1, K2),... H (ω, K1, K2,.
[0023]
Also, as the hash function of the internal hash, an exclusive OR of a plurality of input characters and the index of the character string registered in the already searched dictionary is taken. In this case, an exclusive OR of a plurality of input characters and a dictionary index is obtained so that each bit of the hash value is equal and symmetric. Here, when the number of bits of a plurality of input characters and the index of the dictionary are different and each bit for obtaining the exclusive OR is not equal and symmetric, the bit is used to obtain the exclusive OR on the upper bit side. Deploy.
[0024]
Further, as shown in FIG. 1, the present invention provides a dictionary search and registration method for a data compression apparatus that performs encoding by a longest match search between an input character string K and character substrings already registered in the dictionary. The hash value generation means 20 generates, by an internal hash, a hash value H(ω, K1, K2, ..., Kn) from a predetermined number of input characters (K1, K2, ..., Kn) and an index (ω) indicating a character string already registered in the dictionary and already searched. The dictionary search means 22 then searches the dictionary 16 for the plurality of input characters (K1, K2, ..., Kn) using the hash value as an index. When the match/mismatch detection means 24 detects a mismatch in this search, the rehashing means 26 sequentially generates, as rehash values, a random number sequence corresponding to the first hash value, and the dictionary is searched again.
[0025]
As this rehash value, the first hash value is regarded as one element $\alpha^i$ of the expanded Galois field $GF(2^m)$, and a random number sequence is generated in order by taking, on the expanded Galois field, the exclusive OR of this element $\alpha^i$ with each of the elements $\alpha^1, \alpha^2, \ldots, \alpha^{2^m-1}$ of the same field.
Specifically, the elements $\alpha^1, \alpha^2, \ldots, \alpha^{2^m-1}$ of $GF(2^m)$ are generated in order by cumulative multiplication on the expanded Galois field of the preceding element $\alpha^{i-1}$ and the primitive element $\alpha^1$, that is, $\alpha^i = \alpha^{i-1} \cdot \alpha^1$.
[0026]
In addition, $2^m$ different random number sequences, one for each of the $2^m$ states, are generated in order by treating each of the $2^m$ states as an element $\alpha$ of the expanded Galois field $GF(2^m)$ and cumulatively multiplying the value of this element $\alpha$ by the primitive element $\alpha^1$ on the expanded Galois field.
Furthermore, the present invention provides a dictionary search and registration method for a data restoration apparatus. Each time a plurality of predetermined characters are input, the data restoration device generates a hash value by an internal hash from a plurality of input characters following the already searched character string represented by the index, The character string is restored from the compressed data obtained by searching and registering the dictionary for the input character.
[0027]
In the dictionary search and registration method of the present invention for such a data restoration apparatus, each time a character string is restored by searching the dictionary with an input code, a hash value is generated by the internal hash from the plurality of already restored characters following the input code, and the dictionary is registered for the plurality of restored characters using that hash value.
In addition, for re-registration by rehashing during data restoration, the method likewise generates, each time a character string is restored by searching the dictionary with an input code, a hash value by the internal hash from the plurality of restored characters following the input code. If, while registering the restored characters with this hash value, an existing registration that already uses the same hash value is detected, a random number sequence corresponding to the first hash value is generated in order and the dictionary registration is retried.
[0028]
As described above, the present invention searches and registers the dictionary for a batch of several characters (a multi-byte string) at a time, which shortens dictionary search and registration in data compression and decompression and therefore the overall processing time. In particular, with the internal hash of the present invention, processing time is reduced by searching a plurality of characters (K1 to Kn) simultaneously. Specifically, the multiple input search means 10 and the multiple input match detection means 12 are provided for each unit from 1 byte to n bytes and operate simultaneously in parallel to increase speed.
[0029]
Further, the present invention can search and register the dictionary memory at high speed by using the data structure of the internal hash. Furthermore, by using an element of an expanded Galois field as the rehash value, a rehash value completely different from the first hash value can be generated, which helps avoid internal hash collisions as far as possible.
[0030]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 2 is a block diagram showing the basic concept of the dictionary search and registration method of the present invention, taking a data compression apparatus as an example. A data compression apparatus employing this method includes a multiple input search unit 10 that searches for a predetermined plurality of input characters, a multiple input match detection unit 12 that detects matches between the plurality of input characters and the plurality of searched characters, a multiple input registration unit 14 that registers a plurality of input characters at once, and a dictionary memory 16 in which a plurality of characters can be searched and registered. Searching and registering several characters collectively in this way shortens the processing time.
[0031]
FIG. 3 shows the dictionary search and registration process that handles a plurality of characters collectively in the data compression apparatus described above. First, as shown in step S1, the multiple input search unit 10 treats the combination (ω, K1, K2, ..., Kn) of the predetermined n characters K1, K2, ..., Kn and a tree-structure node address (index) ω indicating an already registered character string as a plurality of input characters, and, as in step S2, creates a hash value H(ω, K1, K2, ..., Kn) according to a predetermined hash function using the internal hash structure.
[0032]
The dictionary memory 16 is searched using the hash value created in this way, and in step S3 the entry (ω′, K1′, K2′, ..., Kn′) obtained from the dictionary memory 16 is compared with the plurality of input characters (ω, K1, K2, ..., Kn) of step S1 to check whether each element matches. That is, the multiple input match detection unit 12 performs match detection.
[0033]
If the dictionary search result and the plurality of input characters all match, a match is determined in step S4, and the same dictionary search is performed on the next plurality of input characters, formed from the newly obtained node address (index) and the next n input characters.
On the other hand, if no match is found in step S3, the process proceeds from step S4 to step S5, a rehash value is created according to the predetermined hash function definition, and the dictionary is searched again with the rehash value in step S2.
[0034]
If the dictionary search with the hash value of the first plurality of input characters, or with a rehash value, finds in step S3 no registered content for that value, that is, the dictionary memory 16 has no registration corresponding to the hash value, the process proceeds to step S6 as "no detection", and the plurality of characters is registered in the dictionary memory 16 as a batch, using the current hash value or rehash value as the index.
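The flow of FIG. 3 (steps S1 to S6) can be sketched in Python as below. This is a minimal, hypothetical sketch, not the patented circuit: the dictionary is a list of slots in which `None` plays the role of a reset flag F, the names `toy_hash` and `search_or_register` are invented, the hash is a plain XOR of ω and the characters rather than the bit layouts of FIGS. 7 to 12, and the rehash simply XORs the first hash with a counter j instead of the Galois-field elements of FIGS. 13 to 22. XOR with j = 1, 2, ... still visits every other slot exactly once, so the probe sequence is exhaustive.

```python
from typing import List, Optional, Tuple

# A key is (omega, (K1, ..., Kn)); a slot holds (key, assigned_index) or None.
Key = Tuple[int, Tuple[int, ...]]
Slot = Optional[Tuple[Key, int]]


def toy_hash(omega: int, chars: Tuple[int, ...], bits: int) -> int:
    """Hypothetical stand-in for H(omega, K1, ..., Kn): XOR of all inputs."""
    h = omega
    for k in chars:
        h ^= k
    return h & ((1 << bits) - 1)


def search_or_register(table: List[Slot], key: Key, new_index: int,
                       bits: int) -> Tuple[bool, int]:
    """Return (found, index), following steps S2 to S6 of FIG. 3."""
    h = toy_hash(key[0], key[1], bits)      # S2: first hash value
    for j in range(1 << bits):              # j = 0: first hash, j >= 1: rehash
        slot = h ^ j                        # simplified rehash (assumption)
        entry = table[slot]
        if entry is None:                   # nothing registered: S6, batch register
            table[slot] = (key, new_index)
            return False, new_index
        if entry[0] == key:                 # S3/S4: (omega', K1', ...) all match
            return True, entry[1]
        # mismatch: S5, try the next rehash value
    raise RuntimeError("dictionary is full")


table: List[Slot] = [None] * 256
print(search_or_register(table, (1, (97, 98)), 300, 8))  # (False, 300): registered
print(search_or_register(table, (1, (97, 98)), 301, 8))  # (True, 300): found
print(search_or_register(table, (1, (98, 97)), 302, 8))  # (False, 302) after one rehash
```

The third call collides on purpose: the toy XOR hash gives "ab" and "ba" the same value, so the entry lands in the first rehash slot.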
[0035]
FIG. 4 is a block diagram of a parallel processing configuration of the multiple input search unit 10 and the multiple input match detection unit 12 of FIG. 2 for an input of n 1-byte characters. The multiple input search unit 10 consists of a 1-byte search unit 10-1, a 2-byte search unit 10-2, ..., and an n-byte search unit 10-n arranged in parallel, and the multiple input match detection unit 12 consists of match/mismatch detection units 12-1, 12-2, ..., 12-n arranged in parallel. The overall match/mismatch determination unit 30 then determines the overall match or mismatch from the per-byte detection results of the match/mismatch detection units 12-1 to 12-n.
[0036]
FIG. 5 shows, in terms of the tree structure of the LZW code, the dictionary range covered when a plurality of characters are searched and registered according to the present invention. In this example, three bytes are searched and registered at a time. With the first character as the node address (index) ω of the tree structure, the second character of the tree is registered in the dictionary 16-1, the third character in the dictionary 16-2, and the fourth character in the dictionary 16-3. The dictionaries 16-1, 16-2, and 16-3 are each searched for the characters that follow the node address (index) ω.
[0037]
Further, the fifth character of the tree is registered in the dictionary 16-1 and, although not shown, the sixth character in the dictionary 16-2 and the seventh in the dictionary 16-3, and this pattern repeats.
For a dictionary structure that searches and registers up to, for example, 3 bytes as shown in FIG. 5, the parallel processing hardware of FIG. 4 can search the dictionaries 16-1, 16-2, and 16-3 efficiently at the same time. In software, the searches are performed sequentially; in that case, searching in the order 16-3, 16-2, 16-1, that is, starting from the dictionary holding the most characters, makes the dictionary search more efficient.
[0038]
FIG. 6 is a block diagram for searching and registering the 3-byte character strings of FIG. 5 using an internal hash. First, the 1-byte input search unit 10-1 consists of a hash value generation unit 11-1 and a dictionary memory 16-1 in which the first-byte character is registered. In the 1-byte search, the 1-byte input search unit 10-1 generates a hash value with the hash function H(ω1, K1) from the index ω1 being searched and the input character K1, and searches the dictionary memory 16-1 using this hash value as the address.
[0039]
The dictionary memory 16-1 has a flag F indicating whether an entry is registered; when the flag F is set, an index ω and a character K are registered. By searching the dictionary memory 16-1 with the hash value of the hash function H(ω1, K1) as the address, a registered entry is detected from the set flag F, and at the same time the index ω1′ and the character K1′ are read out.
[0040]
The 1-byte input match detection unit 12-1 uses the comparison unit 13-11 to detect whether the flag F read from the dictionary memory 16-1 is reset (F = 0); if so, the no-detection determination unit 15-11 produces a no-detection output (1). In addition, the comparison unit 13-12 compares the input index ω1 with the index ω1′ read from the dictionary, and at the same time the comparison unit 13-13 compares the input character K1 with the character K1′ read from the dictionary.
[0041]
The comparison results are given to the match/mismatch determination unit 15-12. On a mismatch, a rehash value is generated and the dictionary memory 16-1 is searched again; on a match, a match output (4) is sent to the overall match/mismatch determination unit 17-2 shown below.
The next 2-byte input search unit 10-2 uses the hash value generation unit 11-2 to create a hash value H(ω1, K1, K2) from the index ω1 being searched and the two characters K1 and K2 up to the second byte, and searches the dictionary memory 16-2 at that address. The flag F, the index ω1′, and the characters K1′ and K2′ obtained by the search are compared by the comparison units 13-21, 13-22, 13-23, and 13-24 of the 2-byte match detection unit 12-2, respectively.
[0042]
At this time, if the flag F is 0, the no-detection determination unit 15-21 sends a no-detection output (2) to the overall no-detection determination unit 17-1 below. Further, the comparison units 13-22 to 13-24 compare the read (ω1′, K1′, K2′) with the input (ω1, K1, K2), and the match/mismatch determination unit 15-22 decides the result: on a mismatch, a rehash value is generated and the dictionary memory 16-2 is searched again; on a match, a match output (5) is sent to the overall match/mismatch determination unit 17-2 below.
[0043]
Further, the 3-byte input search unit 10-3 uses the hash value generation unit 11-3 to create a hash value H(ω1, K1, K2, K3) from the index ω1 being searched and the three characters K1, K2, and K3 up to the third byte, and searches the dictionary memory 16-3 at that address. In this case, as is apparent from the tree structure of FIG. 5, the third-byte dictionary registers the next index ω4, which also indicates whether an entry is registered.
[0044]
This value ω4 is input to the comparison unit 13-31 of the 3-byte match detection unit 12-3. If ω4 = 0, the no-detection determination unit 15-31 determines that nothing is registered at that address and sends a no-detection output (3) to the overall no-detection determination unit 17-1 below. When ω4 takes a valid value other than 0, that is, a proper value for the following index, the index ω1′ and the characters K1′, K2′, and K3′ read from the dictionary memory 16-3 are compared with the index ω1 and the input characters K1, K2, and K3 used to create the hash value. Based on the comparison result, the match/mismatch determination unit 15-32 sends an output (6) to the overall match/mismatch determination unit 17-2 below.
[0045]
The overall no-detection determination unit 17-1 and the overall match/mismatch determination unit 17-2 decide whether to stop or continue the dictionary search, based on the detection results and match/mismatch results from the three dictionary memories 16-1 to 16-3. That is, if entries are detected in all of the dictionary memories 16-1 to 16-3 and the search results are judged to match, the index ω1 is replaced with ω4 to continue the search, and ω4 is fed back to the hash value generation units 11-1, 11-2, and 11-3 for each byte, which repeat the same operation.
[0046]
If this search continuation condition is not satisfied, registration in the dictionary memories 16-1 to 16-3 is performed. That is, if nothing is detected in any of the dictionary memories 16-1 to 16-3, one character is registered in the dictionary memory 16-1. If an entry is detected in the dictionary memory 16-1 but not in 16-2 and 16-3, the second character is registered in the dictionary memories 16-2 and 16-3. Further, if entries are detected in the dictionary memories 16-1 and 16-2 but not in 16-3, the third character is registered in the dictionary memory 16-3.
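The continue-or-register decision made by units 17-1 and 17-2 can be sketched as follows. This is a hypothetical illustration: Python sets and a dict stand in for the hashed dictionary memories 16-1 to 16-3 (so hashing and rehashing are abstracted away), the function name `three_byte_step` is invented, and the value stored in the 3-byte dictionary is taken to be the next index ω4. The case where 16-1 and 16-3 hit but 16-2 misses is not described in the text; in this sketch it falls into the 16-2/16-3 registration branch.

```python
from typing import Dict, List, Set, Tuple, Union


def three_byte_step(d1: Set[Tuple[int, int]],
                    d2: Set[Tuple[int, int, int]],
                    d3: Dict[Tuple[int, int, int, int], int],
                    omega: int, k1: int, k2: int, k3: int
                    ) -> Tuple[str, Union[int, List[int]]]:
    """Decide between continuing the search and registering (FIG. 6 sketch)."""
    # The three lookups are independent, so hardware can run them in parallel.
    hit1 = (omega, k1) in d1
    hit2 = (omega, k1, k2) in d2
    hit3 = (omega, k1, k2, k3) in d3
    if hit1 and hit2 and hit3:
        # All dictionaries hit: replace omega1 with omega4 and keep searching.
        return "continue", d3[(omega, k1, k2, k3)]
    if not hit1:
        return "register", [1]        # register one character in 16-1
    if not hit2:
        return "register", [2, 3]     # only 16-1 hit: register in 16-2 and 16-3
    return "register", [3]            # 16-1 and 16-2 hit: register in 16-3


d1 = {(0, 97)}
d2 = {(0, 97, 98)}
d3 = {(0, 97, 98, 99): 42}
print(three_byte_step(d1, d2, d3, 0, 97, 98, 99))   # ('continue', 42)
```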
[0047]
FIGS. 7, 8, and 9 show embodiments of hash value generation for the multi-byte search in the 1- to 3-byte hash value generation units 11-1 to 11-3 of FIG. 6. In these embodiments, the index ω is 16 bits, each of the characters K1, K2, and K3 is 8 bits, and the hash value H is 16 bits.
FIG. 7 shows the 1-byte search. The 8 bits of the character K in the register 32 are exclusive-ORed (EXOR) with the upper 8 bits of the 16-bit index ω1 in the register 34 to form the upper 8 bits of the 16-bit hash value H, and the lower 8 bits of the index ω1 in the register 34 are used unchanged as its lower 8 bits.
[0048]
In the 2-byte search of FIG. 8, the hash value is generated by taking the exclusive OR (EXOR) of the 8 bits of each of the two characters K1 and K2 in the registers 32-1 and 32-2 with the 16 bits of the index ω1 in the register 34, producing a 16-bit hash value H(ω1, K1, K2).
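The 16-bit hash values of FIGS. 7 and 8 can be written as a few bit operations. This is a sketch under stated assumptions: the function names are hypothetical, and for FIG. 8 the text does not say which byte of ω1 each character meets, so K1 is assumed to be XORed with the upper byte and K2 with the lower byte.

```python
def hash_1byte(omega: int, k: int) -> int:
    """FIG. 7: K XOR upper byte of the 16-bit index; lower byte passes through."""
    upper = ((omega >> 8) ^ k) & 0xFF
    lower = omega & 0xFF
    return (upper << 8) | lower


def hash_2byte(omega: int, k1: int, k2: int) -> int:
    """FIG. 8 sketch; ASSUMED placement: K1 against the upper byte, K2 the lower."""
    return (omega ^ ((k1 << 8) | k2)) & 0xFFFF


print(hex(hash_1byte(0x1234, 0xFF)))        # 0xed34
print(hex(hash_2byte(0x1234, 0xFF, 0x0F)))  # 0xed3b
```

Note that the FIG. 7 layout is equivalent to `omega ^ (k << 8)` for a 16-bit ω and an 8-bit K.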
Furthermore, for the hash value used in the 3-byte search of FIG. 9, the 8 bits of each of the two characters K1 and K2 in the registers 32 and 36 are first exclusive-ORed with the 16 bits of the index ω1 in the register 40. In the 16-bit arrangement of the index ω1 in the register 40, the six bits 10 to 15 are placed in the center, and the remaining bits 0 to 9 are split between the two sides.
[0049]
Further, the 8 bits of the third character K3 are set in the register 38 and exclusive-ORed with the middle 8 bits of the register 40 to obtain a 16-bit hash value H(ω1, K1, K2, K3). In creating the hash value for the 3-byte search, the bits of the index ω1 and the characters K1, K2, and K3 are arranged so that every bit of the resulting hash value is formed equally and symmetrically.
[0050]
However, in the 3-byte search of FIG. 9, the third character K3 is 8 bits while the index ω1 is 16 bits, so their bit widths differ. The bits are therefore arranged so that the exclusive OR is reflected in the higher-order bits of the 16-bit hash value. This bit arrangement helps avoid collisions among the created hash values as far as possible.
[0051]
FIGS. 10, 11, and 12 illustrate the creation of a 12-bit hash value for the 1-byte, 2-byte, and 3-byte searches, respectively, in the same way as FIGS. 7, 8, and 9. In this case the index ω is 12 bits, and a 12-bit hash value is created from it and the 8-bit characters.
For the 12-bit hash value, the bit widths of the characters and the index differ in the 1-byte and 2-byte searches of FIGS. 10 and 11. In the 3-byte search of FIG. 12, however, combining the characters K1, K2, and K3 makes it possible to apply three exclusive ORs (EXOR) across the full 12-bit index, so every bit of the 12-bit hash value remains perfectly equal and symmetric.
[0052]
Specifically, the characters K1 and K2 are stored in the registers 40 and 46, and the character K3 is split into its upper and lower 4 bits, which are stored in the registers 44 and 46. Each of the registers 44 and 46 is 12 bits wide, matching the 12-bit index ω1. As a result, a 12-bit hash value H(ω, K1, K2, K3) can be created in which every bit is the exclusive OR of three bits taken from the index and the characters K1, K2, and K3, arranged uniformly and symmetrically.
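A minimal sketch of the FIG. 12 idea, assuming a particular bit placement that the text does not spell out: K1 followed by the upper nibble of K3 forms one 12-bit word, K2 followed by the lower nibble of K3 forms another, and both are XORed with the 12-bit index. The function name is hypothetical. With this placement, every hash bit is the XOR of exactly three input bits (one from ω, two from the characters), which is the equal, symmetric property the paragraph describes.

```python
def hash_3byte_12(omega: int, k1: int, k2: int, k3: int) -> int:
    """12-bit hash for a 3-byte search, FIG. 12 sketch (bit placement ASSUMED).

    K1 plus the upper nibble of K3 fill one 12-bit word, K2 plus the lower
    nibble of K3 fill another; both are XORed with the 12-bit index, so each
    hash bit is the XOR of exactly three input bits.
    """
    a = ((k1 & 0xFF) << 4) | ((k3 >> 4) & 0xF)
    b = ((k2 & 0xFF) << 4) | (k3 & 0xF)
    return (omega ^ a ^ b) & 0xFFF


print(hex(hash_3byte_12(0xABC, 0x12, 0x34, 0x56)))  # 0x8df
```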
[0053]
FIG. 13 is a block diagram for realizing rehash value generation in the dictionary search and registration method of the present invention. In the dictionary search and registration using the internal hash of the present invention, as shown in FIGS. 4 and 5, when the match/mismatch detection finds a mismatch, a rehash value is generated and the search is repeated. This rehash function is realized by an internal hash mechanism consisting of the hash value generation unit 20, the dictionary search unit 22, the match/mismatch detection unit 24, and the rehash unit 26 shown in FIG. 13.
[0054]
FIG. 14 shows the concept of rehash value generation by the internal hash method of FIG. 13, taking a 1-byte search as an example. The hash value generation unit 20 obtains a hash value from the index ω1 and the character K1 with the hash function H(ω1, K1), the dictionary search unit 22 searches the dictionary memory 16-1, and the presence/absence flag F, the index ω1′, and the character K1′ are read out.
[0055]
Subsequently, in the match/mismatch detection unit 24, if the comparison unit 13-11 finds that the flag F is not 0 but the match/mismatch determination unit 15-12 determines a mismatch from the index and character comparisons made by the comparison units 13-12 and 13-13, the rehash unit 26 is activated to generate a rehash value RH. In general, rehash addresses are generated sequentially from a random number sequence determined by a rehash function $RH_m(H(\omega_1, K_1))$, $m = 1, 2, \ldots, N_{MAX}$.
[0056]
FIG. 15 shows a specific example of the rehash unit 26 of FIG. 14, consisting of a hash value input unit 50, a calculation unit 52, and an exclusive OR circuit 54. In this configuration, the hash value input unit 50 first regards the first hash value as an element $\alpha^i$ of the expanded Galois field $GF(2^m)$. Meanwhile, the calculation unit 52 sequentially generates the elements $\alpha^1, \alpha^2, \alpha^3, \ldots, \alpha^{max}$ (where $max = 2^m - 1$) of the expanded Galois field $GF(2^m)$ prepared in advance.
[0057]
The exclusive OR circuit 54 then takes the exclusive OR of the element $\alpha^i$ output from the hash value input unit 50 and each element generated in turn by the calculation unit 52, and uses the result as the rehash value. Whenever the dictionary search with the rehash value created by the exclusive OR circuit 54 again detects a mismatch, the calculation unit 52 generates the next element of the expanded Galois field.
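The rehash sequence of FIG. 15 can be sketched for the m = 4 case worked through below (primitive polynomial $x^4 + x + 1$). This is a software sketch, not the circuit: the names `mul_alpha` and `rehash_sequence` are hypothetical, and the 4-bit vector form of the first hash is assumed to be used directly as its Galois-field element. Because $\alpha^1, \ldots, \alpha^{15}$ are exactly the 15 nonzero elements, XORing them onto the first hash visits every other 4-bit address once, in a scrambled order.

```python
M = 4
POLY = 0b10011  # primitive polynomial f(x) = x^4 + x + 1 for GF(2^4)


def mul_alpha(e: int) -> int:
    """Multiply an element of GF(2^4), held as a 4-bit vector, by alpha."""
    e <<= 1
    if e & (1 << M):      # degree reached 4: reduce modulo f(x)
        e ^= POLY
    return e


def rehash_sequence(h: int):
    """Yield h XOR alpha^j for j = 1, ..., 2^M - 1 (FIG. 15 sketch)."""
    a = 1                             # alpha^0
    for _ in range((1 << M) - 1):
        a = mul_alpha(a)              # alpha^j = alpha^(j-1) * alpha
        yield h ^ a                   # role of exclusive OR circuit 54


print(list(rehash_sequence(6)))
# [4, 2, 14, 5, 0, 10, 13, 3, 12, 1, 8, 9, 11, 15, 7]
```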
[0058]
FIG. 16 shows a specific example of the calculation unit 52 of FIG. 15. The calculation unit 52 sequentially generates elements of the expanded Galois field and consists of a register 56 that holds the primitive element $\alpha$ of the expanded Galois field $GF(2^m)$, an expanded Galois field multiplication circuit 60, and a selection circuit 58 that first selects the unit element $\alpha^{2^m-1}$ of the expanded Galois field as the initial value and thereafter selects the output of the multiplication circuit 60.
[0059]
Take as an example an expanded Galois field of four-dimensional binary vectors, m = 4, that is, $GF(2^4)$. Its elements are expressed as powers of $\alpha$. There are $2^4 = 16$ elements: $0, \alpha^0 = 1, \alpha^1, \alpha^2, \ldots, \alpha^{14}$ (with $\alpha^{15} = \alpha^0$). The element $\alpha$ is obtained as a root of a polynomial called a primitive polynomial and is called a primitive element.
[0060]
That is, the primitive polynomial of the Galois field $GF(2^4)$ can be expressed as
$$f(x) = x^4 + x + 1$$
FIG. 17 shows the relationship between the four-dimensional vector representation $(x^3, x^2, x^1, x^0)$ of the Galois field $GF(2^4)$ and the power representation $0, \alpha^1, \ldots, \alpha^{15}$. The case i = 0 is called the zero element, and i = 1 to 14 correspond to the powers $\alpha^1$ to $\alpha^{14}$ of the primitive element. Finally, i = 15 is called the unit element $\alpha^{15}$, which is equal to $\alpha^0 = 1$.
[0061]
In the hash value input unit 50 of FIG. 15, if the hash value is, for example, 8 bits (one byte), it is split into 4-bit units $x^3$ to $x^0$, and the element indexed by i for each unit is output as the hash value, that is, as an element of the expanded Galois field. In the register 56 of FIG. 16, the element $\alpha^1$ of $GF(2^4)$ is set and supplied to the expanded Galois field multiplication circuit 60 as P. As the initial value, the selection circuit 58 selects the unit element $\alpha^{15}$ of $GF(2^4)$ and outputs it to the multiplication circuit 60 as Q.
[0062]
The expanded Galois field multiplication circuit 60 is the logic circuit shown in FIG. 18. It applies the logical operations shown in that figure to the 4 bits of the input P (the element $\alpha^1$ from the register 56) and of the input Q selected by the selection circuit 58, finally obtains by EXOR the remainder modulo the primitive polynomial $f(x) = x^4 + x + 1$, and sequentially outputs it as a value R representing an element of the expanded Galois field.
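A software equivalent of the multiplication circuit 60 and its feedback loop might look like the sketch below. It does not reproduce the gate layout of FIG. 18; it is an assumed equivalent in which AND selects the partial products, XOR sums them, and the result is reduced modulo $x^4 + x + 1$. Starting from the unit element (as selection circuit 58 does) and multiplying by $\alpha$ each step yields $\alpha^1, \ldots, \alpha^{15}$. The names `gf_mul` and `generate_elements` are hypothetical.

```python
from typing import List


def gf_mul(p: int, q: int, m: int = 4, poly: int = 0b10011) -> int:
    """Multiply two GF(2^m) elements held as m-bit vectors.

    Partial products are selected with AND (a bit of q picks a shifted p) and
    summed with XOR, then the result is reduced modulo the primitive polynomial.
    """
    r = 0
    for i in range(m):
        if (q >> i) & 1:
            r ^= p << i
    for i in range(2 * m - 2, m - 1, -1):   # clear bits of degree >= m
        if (r >> i) & 1:
            r ^= poly << (i - m)
    return r


def generate_elements(alpha: int = 0b0010, m: int = 4,
                      poly: int = 0b10011) -> List[int]:
    """Mimic FIG. 16: start from the unit element, multiply by alpha each step."""
    q = 1                                # unit element alpha^(2^m - 1) = 1
    out = []
    for _ in range((1 << m) - 1):
        q = gf_mul(alpha, q, m, poly)    # R = P * Q, fed back as the next Q
        out.append(q)
    return out


print(generate_elements())  # [2, 4, 8, 3, 6, 12, 11, 5, 10, 7, 14, 15, 13, 9, 1]
```

The last value is 1, confirming $\alpha^{15} = \alpha^0$ for this polynomial.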
[0063]
For the expanded Galois field $GF(2^4)$ with m = 4, the maximum rehash value address, obtained from the primitive polynomial with all bits set to 1, is
$$f(2) = 2^4 + 2 + 1 = 19$$
Addition in the expanded Galois field multiplication circuit 60 of FIG. 18, and on the expanded Galois field as realized by the exclusive OR of FIG. 15, follows the sum table of FIG. 19. Multiplication by the AND circuits in the multiplication circuit 60 of FIG. 18 follows the product table of FIG. 20.
[0064]
An expanded Galois field larger than m = 4 can also be used to generate rehash values in the present invention. FIG. 21 shows the relationship between the elements of the expanded Galois field $GF(2^{12})$ and the rehash values. The primitive polynomial of $GF(2^{12})$ is given by
$$f(x) = x^{12} + x^6 + x^4 + x + 1$$
In this case, the maximum rehash address with all bits set to 1 is 4179. The rehash values in FIG. 21 are shown in hexadecimal notation (0, 1, 2, ..., 8, 9, a, b, ...).
[0065]
FIG. 22 shows the relationship between the elements of the expanded Galois field $GF(2^{16})$ for m = 16 and the rehash values. The primitive polynomial of $GF(2^{16})$ is given by
$$f(x) = x^{16} + x^6 + x^4 + x + 1$$
In this case, the maximum rehash address with all bits set to 1 is 65619.
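The "maximum address" figures 19, 4179, and 65619 are simply $f(2)$, the integer whose set bits are the nonzero coefficients of $f(x)$. The sketch below computes them and also measures the multiplicative order of $x$ modulo each polynomial; an order of $2^m - 1$ means the polynomial is primitive. The function names are hypothetical, and the sketch does not presume the result for the m = 16 polynomial; it only reports what it finds.

```python
def poly_to_int(exponents) -> int:
    """Integer whose set bits are the exponents of f(x); equals f(2)."""
    return sum(1 << e for e in exponents)


def order_of_x(poly: int, m: int) -> int:
    """Smallest j >= 1 with x^j = 1 modulo poly (2^m - 1 when poly is primitive)."""
    a, j = 1, 0
    while True:
        a <<= 1
        if (a >> m) & 1:      # degree m reached: reduce modulo poly
            a ^= poly
        j += 1
        if a == 1:
            return j


for m, exps in ((4, (4, 1, 0)), (12, (12, 6, 4, 1, 0)), (16, (16, 6, 4, 1, 0))):
    f2 = poly_to_int(exps)
    print(m, f2, order_of_x(f2, m), (1 << m) - 1)
```

The loop always terminates because each polynomial has a constant term of 1, so $x$ is invertible and has a finite order of at most $2^m - 1$.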
[0066]
The above embodiment is an example of the dictionary search and registration method in a data compression apparatus, but the same dictionary registration method also applies to a data restoration apparatus that restores the original character string from the compressed data produced by this data compression apparatus.
That is, each time a plurality of predetermined characters are input, the data restoration apparatus adopting the dictionary search / registration method of the present invention uses an internal hash from a plurality of input characters following the already searched character string represented by the index. A hash value is generated, and a character string is restored from the compressed data obtained by searching and registering the dictionary for the plurality of input characters using the hash value.
[0067]
In the dictionary search and registration method of the present invention for such a data restoration apparatus, each time a character string is restored by searching the dictionary memory with the input code ω, a hash value H(ω, K1, K2, K3, ..., Kn) is generated by the internal hash from the already restored characters (K1, K2, K3, ..., Kn) following the input code ω, and the dictionary is registered for the restored characters (ω, K1, K2, K3, ..., Kn) using this hash value.
[0068]
For the restoration of multiple character units, the same dictionary registration method as that used for compression in FIGS. 2 to 12 is applied as it is.
In addition, for re-registration by rehashing during data restoration, each time a character string is restored by searching the dictionary with the input code ω, the method of the present invention generates a hash value H(ω, K1, K2, K3, ..., Kn) by the internal hash from the restored characters (K1, K2, K3, ..., Kn) following the input code ω. If, while registering the restored characters with this hash value, an existing registration that already uses the same hash value is detected, a random number sequence corresponding to the first hash value is generated in order and the dictionary registration is retried.
[0069]
The same dictionary registration method as that used for compression in FIGS. 13 to 22 is also applied to this rehash processing.
In the above embodiment, a data structure based on the internal hash is used as the specific dictionary search method. However, the present invention is not limited to this and can also be applied to a list structure based on an external hash.
[0070]
[Effects of the Invention]
As described above, according to the present invention, dictionary search and registration in data compression and decompression apparatuses such as LZW can handle a plurality of characters at a time, which further speeds up compression and decompression.
Also, by using an element of an expanded Galois field to generate the rehash value when the dictionary search result does not match, a rehash value completely different from the first hash value can be generated, so that internal hash collisions are avoided as far as possible.
[Brief description of the drawings]
FIG. 1 illustrates the principle of the present invention
FIG. 2 is a block diagram of a data compression apparatus for realizing the dictionary search / registration method of the present invention.
FIG. 3 is an explanatory diagram of dictionary search registration processing by the internal hash method of FIG.
FIG. 4 is a block diagram of parallel processing of dictionary search registration by the internal hash method for the number of characters n.
FIG. 5 is an explanatory diagram of a tree structure showing a dictionary range for searching and registering LZW codes according to the present invention.
FIG. 6 is a block diagram of dictionary search registration according to the present invention for performing a 3-byte batch search by an internal hash method.
FIG. 7 is a block diagram of a hash value generation unit that creates a 16-bit hash value for 1-byte search.
FIG. 8 is a block diagram of a hash value generation unit that creates a 16-bit hash value for 2-byte search.
FIG. 9 is a block diagram of a hash value generation unit that creates a 16-bit hash value for 3-byte search.
FIG. 10 is a block diagram of a hash value generation unit that creates a 12-bit hash value for 1-byte search.
FIG. 11 is a block diagram of a hash value generation unit that creates a 12-bit hash value for 2-byte search.
FIG. 12 is a block diagram of a hash value generation unit that creates a 12-bit hash value for 3-byte search.
FIG. 13 is a block diagram of a data compression apparatus that implements the dictionary search / registration method of the present invention including a rehash unit.
FIG. 14 is an explanatory diagram of the rehash value generation process of FIG. 13.
FIG. 15 is a specific block diagram of the rehash unit of FIG. 14.
FIG. 16 is a block diagram of the arithmetic unit of FIG. 15 that sequentially generates elements of an expanded Galois field.
FIG. 17 is an explanatory diagram of the correspondence between elements of the expanded Galois field $GF(2^4)$ and rehash values.
FIG. 18 is a logic circuit diagram of the expanded Galois field multiplication circuit of FIG. 16.
FIG. 19 is an explanatory diagram of the sum table used for addition in the expanded Galois field $GF(2^4)$.
FIG. 20 is an explanatory diagram of the product table used for multiplication in the expanded Galois field $GF(2^4)$.
FIG. 21 is an explanatory diagram of the correspondence between elements of the expanded Galois field $GF(2^{12})$ and rehash values.
FIG. 22 is an explanatory diagram of the correspondence between elements of the expanded Galois field $GF(2^{16})$ and rehash values.
FIG. 23 is an explanatory diagram of a tree structure of a dictionary in a conventional LZW code.
FIG. 24 is an explanatory diagram of character string encoding in the LZW code.
FIG. 25 is an explanatory diagram of a specific example of LZW encoding.
FIG. 26 is an explanatory diagram of the dictionary used in the encoding of FIG. 25.
FIG. 27 is a flowchart of an LZW encoding algorithm.
FIG. 28 is an explanatory diagram of a specific example of LZW decoding.
FIG. 29 is a flowchart of an LZW decoding algorithm.
FIG. 30 is a block diagram of dictionary search and registration by a conventional internal hash method.
FIG. 31 is an explanatory diagram of dictionary search and registration processing by a conventional internal hash method.
[Description of Reference Numerals]
10: Multi-input search unit
10-1 to 10-n: 1-byte to n-byte search units
11-1 to 11-3: Hash value generation units
13-11 to 13-34: Comparison units
12-1 to 12-n: Match/mismatch detection units
12: Multi-input match detection unit
14: Multi-input registration unit
15-11, 15-21, 15-31: Detection presence/absence determination units
16, 16-1 to 16-3: Dictionary memories (dictionaries)
17-1: Overall detection presence/absence determination unit
17-2, 30: Overall match/mismatch determination unit
20: Hash value generation unit
22: Dictionary search unit
24: Match/mismatch detection unit
26: Rehash unit
30: Overall match/mismatch determination unit
32, 32-1, 32-2, 34, 36, 38, 40, 42, 44, 46, 48: Registers
50: Hash value input unit
52: Random number generation unit
54: Exclusive OR circuit
56: Register
58: Selection circuit
60: Extension Galois field multiplication circuit

Claims

1. A dictionary search and registration method for a data compression apparatus that performs encoding by a longest-match search between an input character string and character substrings already registered in a dictionary, the method comprising:
inputting a predetermined number of characters at a time;
each time the characters are input, generating a hash value by an internal hash from the input characters that follow the already-searched character string represented by an index; and
searching and registering the dictionary for the input characters using the hash value;
wherein the hash function of the internal hash is the exclusive OR of the input characters and the index of the already-searched character string registered in the dictionary.
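
A minimal Python sketch of the claim-1 idea follows. It assumes a 12-bit dictionary (the hash width named for FIG. 11 and FIG. 12), stores each entry as an (index, characters) pair at the slot selected by the hash, and staggers successive characters by a hypothetical 4 bits so they do not cancel each other in the XOR; none of these details come from the claim itself, and the names `xor_hash` and `search_or_register` are illustrative only.

```python
TABLE_BITS = 12                 # assumed hash/index width
TABLE_SIZE = 1 << TABLE_BITS
MASK = TABLE_SIZE - 1


def xor_hash(index: int, chars: bytes) -> int:
    """Internal hash: exclusive OR of the already-searched index and the
    following input characters (hypothetical 4-bit stagger per character)."""
    h = index
    for k, c in enumerate(chars):
        h ^= c << (4 * (len(chars) - 1 - k))
    return h & MASK


def search_or_register(table: list, index: int, chars: bytes) -> tuple:
    """Look up (index, chars) at its hash slot.

    Returns (True, slot) on a hit, so the slot can serve as the new index.
    On an empty slot the pair is registered and (False, slot) is returned.
    An occupied slot holding a different pair is a collision; this sketch
    only reports it as a mismatch and does not rehash."""
    slot = xor_hash(index, chars)
    entry = table[slot]
    if entry == (index, chars):
        return True, slot
    if entry is None:
        table[slot] = (index, chars)   # register the new string
    return False, slot
```

For example, with an empty `table = [None] * TABLE_SIZE`, a first call with index `0x041` and `b"ab"` registers the pair at slot `0x633` and returns `(False, 0x633)`; repeating the same call returns `(True, 0x633)`.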

2. The dictionary search and registration method for a data compression apparatus according to claim 1,
wherein, when the number of input characters is n, n dictionaries, one provided for each number of characters following the already-searched character string represented by the index, are searched simultaneously using a hash value generated by the internal hash for each number of characters.
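
The simultaneous search of claim 2 can be modelled roughly as below. This is a sketch only: Python dictionaries stand in for the n dictionary memories, the lookups run in a loop rather than in parallel hardware, dictionary k is assumed to map an (index, first k characters) key to a new index, and the longest length that hits is taken as the match. `longest_match` is a hypothetical name.

```python
def longest_match(dicts: list, index: int, chars: bytes) -> tuple:
    """Probe dictionary k (k = 1..n) with the same already-searched index and
    the first k input characters; dicts[k - 1] maps (index, chars[:k]) to a
    new index. Returns (matched_length, new_index), or (0, index) on no hit."""
    best = (0, index)
    for k in range(1, len(chars) + 1):
        hit = dicts[k - 1].get((index, chars[:k]))
        if hit is not None:
            best = (k, hit)        # keep the longest length that hits
    return best
```

With `dicts = [{(5, b"a"): 10}, {(5, b"ab"): 11}, {}]`, `longest_match(dicts, 5, b"abc")` returns `(2, 11)` because the 3-character dictionary has no entry.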

3. The dictionary search and registration method for a data compression apparatus according to claim 1, wherein the exclusive OR of the input characters and the dictionary index is taken so that every bit of the hash value is formed equally and symmetrically.

4. The dictionary search and registration method for a data compression apparatus according to claim 3, wherein, when the bit widths of the input characters and the dictionary index differ so that the bits entering the exclusive OR cannot be made equal and symmetric, the bits are arranged so that the exclusive OR is taken on the high-order bit side.
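
Claims 3 and 4 concern how the input bits are spread over the hash bits. The sketch below assumes a 12-bit index and 8-bit characters, rotates each character by a chosen amount within the 12-bit hash, and counts how many input bits feed each hash bit. The shift values are hypothetical and do not reproduce the wiring of FIG. 11 or FIG. 12. With three characters (36 input bits) the shifts 4, 0, 8 give every hash bit exactly three inputs, an equal and symmetric XOR. With two characters (28 input bits) no layout can be equal, since 28 is not a multiple of 12, and the shifts 4, 8 put the surplus inputs on the four high-order bits, in the spirit of claim 4.

```python
HASH_BITS = 12
HASH_MASK = (1 << HASH_BITS) - 1


def rotl(value: int, shift: int) -> int:
    """Rotate a value left within HASH_BITS bits (overflow wraps to the bottom)."""
    shift %= HASH_BITS
    value &= HASH_MASK
    return ((value << shift) | (value >> (HASH_BITS - shift))) & HASH_MASK


def xor_fold(index: int, chars: bytes, shifts: list) -> int:
    """12-bit hash: the index XORed with each character rotated by its shift."""
    h = index & HASH_MASK
    for c, s in zip(chars, shifts):
        h ^= rotl(c, s)
    return h


def fan_in(char_shifts: list, char_bits: int = 8) -> list:
    """Number of input bits (index plus characters) feeding each hash bit."""
    counts = [1] * HASH_BITS           # each hash bit receives one index bit
    for s in char_shifts:
        for b in range(char_bits):
            counts[(b + s) % HASH_BITS] += 1
    return counts
```

`fan_in([4, 0, 8])` returns twelve 3s, while `fan_in([4, 8])` returns eight 2s followed by four 3s.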

5. A dictionary search and registration method for a data compression apparatus that performs encoding by a longest-match search between an input character string K and character substrings already registered in a dictionary, the method comprising:
inputting a predetermined number of characters at a time;
each time the characters are input, generating a hash value by an internal hash from the input characters that follow the already-searched character string represented by an index;
searching the dictionary for the input characters using the hash value; and
when a mismatch is detected in the dictionary search, generating in order, as rehash values, a random number sequence corresponding to the first hash value, and searching the dictionary again;
wherein the hash function of the internal hash is the exclusive OR of the input characters and the index of the already-searched character string registered in the dictionary.

6. The dictionary search and registration method for a data compression apparatus according to claim 5, wherein, to generate the rehash values, the first hash value is taken as an element $\alpha^i$ of the extension Galois field $GF(2^m)$, and the random number sequence is generated in order by taking the exclusive OR (addition on the extension Galois field) of $\alpha^i$ with each of the elements $\alpha^1, \alpha^2, \ldots, \alpha^{2^m-1}$ of the same field $GF(2^m)$.
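
The claim-6 rehash sequence can be sketched in $GF(2^4)$, the field of FIG. 17. The primitive polynomial $x^4 + x + 1$ is an assumption, since the claim does not name one, and the function names are hypothetical. Because XOR with a fixed value is a bijection and $\alpha^1, \ldots, \alpha^{15}$ are exactly the 15 nonzero 4-bit values, the 15 rehash values together with the first hash value visit each of the 16 addresses exactly once.

```python
M = 4
POLY = 0b10011        # x^4 + x + 1, assumed primitive polynomial for GF(2^4)


def field_elements() -> list:
    """alpha^1 .. alpha^(2^M - 1) of GF(2^M) as M-bit integers."""
    elems, v = [], 1
    for _ in range((1 << M) - 1):
        v <<= 1                   # multiply by alpha = x
        if v & (1 << M):
            v ^= POLY             # reduce modulo the field polynomial
        elems.append(v)
    return elems


def rehash_values(first_hash: int) -> list:
    """Rehash sequence: first_hash XOR alpha^j for j = 1 .. 2^M - 1
    (XOR is addition on GF(2^M))."""
    return [first_hash ^ a for a in field_elements()]
```

`rehash_values(5)` starts with `5 ^ 2 = 7` and, taken as a set, contains every 4-bit value except 5.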

7. The dictionary search and registration method for a data compression apparatus according to claim 6,
wherein the elements $\alpha^1, \alpha^2, \ldots, \alpha^{2^m-1}$ of the extension Galois field are generated in order by cumulative multiplication on the extension Galois field, each element being the product of the previously generated element $\alpha^{i-1}$ and the primitive element $\alpha^1$.
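
Claim 7 generates the field elements by repeated multiplication by the primitive element. The sketch below works in $GF(2^{12})$, the field of FIG. 21, and assumes the primitive polynomial $x^{12} + x^6 + x^4 + x + 1$ from standard tables of primitive polynomials; the polynomial and the names `gf_mul` and `all_elements` are assumptions, not taken from the patent. `gf_mul` is a general shift-and-add multiplier, and repeated multiplication by $\alpha = x$ walks through all $2^{12} - 1 = 4095$ nonzero elements before returning to $\alpha^{4095} = 1$.

```python
M = 12
POLY = (1 << 12) | (1 << 6) | (1 << 4) | (1 << 1) | 1   # x^12+x^6+x^4+x+1 (assumed)
ALPHA = 0b10                                            # primitive element alpha = x


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^M) elements: carry-less product reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a           # addition in GF(2^M) is XOR
        b >>= 1
        a <<= 1
        if a & (1 << M):          # keep a within M bits
            a ^= POLY
    return result


def all_elements() -> list:
    """alpha^1 .. alpha^(2^M - 1), each the previous element times alpha."""
    elems, prev = [], 1           # alpha^0 = 1
    for _ in range((1 << M) - 1):
        prev = gf_mul(prev, ALPHA)
        elems.append(prev)
    return elems
```

Since `all_elements()[k]` holds $\alpha^{k+1}$, multiplying entries 9 and 19 with `gf_mul` yields entry 29, that is, $\alpha^{10} \cdot \alpha^{20} = \alpha^{30}$.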

8. The dictionary search and registration method for a data compression apparatus according to claim 5, wherein the random number sequence corresponds to $2^m$ consecutive distinct states, the $2^m$ states are regarded as elements $\alpha$ of the extension Galois field $GF(2^m)$, and the sequence is generated in order by cumulative multiplication, on the extension Galois field, of the current element $\alpha$ by the primitive element $\alpha^1$.
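
Claim 8 treats the rehash state itself as a field element and multiplies it by $\alpha$ at each step, which a Galois-type linear feedback shift register does with one shift and a conditional XOR. The sketch below works in $GF(2^{16})$, the field of FIG. 22, and assumes the primitive polynomial $x^{16} + x^{12} + x^3 + x + 1$; the polynomial and the function names are assumptions. Multiplication by $\alpha$ cycles through all $2^{16} - 1$ nonzero values, while the all-zero state maps to itself, so this sketch requires a nonzero first hash value.

```python
M = 16
POLY = (1 << 16) | (1 << 12) | (1 << 3) | (1 << 1) | 1   # assumed primitive
MASK = (1 << M) - 1


def times_alpha(state: int) -> int:
    """Multiply a GF(2^16) element by alpha = x: shift left one bit and
    reduce by POLY if bit 16 becomes set."""
    state <<= 1
    if state & (1 << M):
        state ^= POLY
    return state


def rehash_states(first_hash: int, count: int):
    """Yield `count` successive states h*alpha, h*alpha^2, ... from a
    nonzero first hash value h."""
    state = first_hash & MASK
    if state == 0:
        raise ValueError("the all-zero state never leaves zero")
    for _ in range(count):
        state = times_alpha(state)
        yield state
```

Running `rehash_states(0x1234, 65535)` produces 65535 distinct nonzero states, the last of which is `0x1234` again.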

9. A dictionary search and registration method for a data restoration apparatus that restores a character string from compressed data obtained by generating, each time a predetermined number of characters is input, a hash value by an internal hash from the input characters following the already-searched character string represented by an index, and searching and registering the dictionary for those input characters using the hash value, the method comprising:
each time a character string is restored by searching the dictionary with an input code, generating a hash value by the internal hash from the already-restored characters following the input code; and
registering the restored characters in the dictionary using the hash value;
wherein the hash function of the internal hash is the exclusive OR of the characters and the index of the already-searched character string registered in the dictionary.

10. A dictionary search and registration method for a data restoration apparatus that restores a character string from compressed data obtained by generating, each time a predetermined number of characters is input, a hash value by an internal hash from the input characters following the already-searched character string represented by an index, and searching and registering the dictionary for those input characters using the hash value, the method comprising:
each time a character string is restored by searching the dictionary with an input code, generating a hash value by the internal hash from the already-restored characters following the input code; and
registering the restored characters in the dictionary using the hash value and, when an existing registration with the same hash value is detected during dictionary registration, generating in order a random number sequence corresponding to the first hash value and registering the characters in the dictionary again;
wherein the hash function of the internal hash is the exclusive OR of the characters and the index of the already-searched character string registered in the dictionary.