JP4000266B2 - Data encoding apparatus, data encoding method, and program thereof

Info

Publication number: JP4000266B2
Application number: JP2002064271A
Authority: JP
Inventors: 由希子山崎; 高弘柳下
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2002-03-08
Filing date: 2002-03-08
Publication date: 2007-10-31
Anticipated expiration: 2022-03-08
Also published as: JP2003264703A

Description

【０００１】
【発明の属する技術分野】
本発明は、データ符号化装置、データ符号化方法、及びそのプログラムに関し、具体的にはＬＺ７７、およびＬＺ８８のような辞書ベース方式を元にした圧縮技術を基に、特に画像データを効率良く圧縮するデータ符号化装置、データ符号化方法、及びそのプログラムに関する。
【０００２】
【従来の技術】
近年、スキャナ、プリンタ、デジタルカメラ等の画像を扱う分野では、高解像度化が進み画像１枚あたりにおけるデータ量、いわゆる画素数は膨大なサイズになってきている。その膨大なデータを効率よくネットワーク上で転送したり、ハードディスク、メモリ等の記憶媒体に保存するために、処理速度、圧縮率において効率のよい圧縮方法が求められている。
【０００３】
データを効率よく圧縮する方法として、ユニバーサル符号により圧縮する方法が実用化されている。このユニバーサル符号は、情報保存型のデータ圧縮方法であり、データ圧縮時に情報源の統計的な性質を予め仮定しないため、種々のタイプ（文字コード、オブジェクトコードなど）のデータに適用することが出来る。文書画像では、文字の輪郭等や文字間隔に類似性があり、又、網点画像は網点周期性、網点形状の同一性等が類似している。この類似性の持つ冗長性をユニバーサル符号により削減し、有効な圧縮を行うことが出来る。
【０００４】
ユニバーサル符号の代表的な方法として、ジブ−レンペル(Ziv-Lempel)符号がある。Ziv-Lempel符号では、ユニバーサル型と、増分分解型(Incremental parsing) の２つのアルゴリズムが提案されており、ユニバーサル型アルゴリズムを用いた実用的な方法として、ＬＺＳＳ符号があり、又、増分分解型アルゴリズムを用いた実用的な方法として、ＬＺＷ（Lempel- Ziv- Welch）符号がある。
【０００５】
ＬＺＳＳのベースとなるＬＺ７７のアルゴリズムは、符号化データを過去のデータ系列の任意の位置から一致する最大長の系列に区切り、過去の系列の複製として符号化する方法である。具体的には、符号化済みの入力データを格納する移動窓と、これから符号化するデータを格納する先読みバッファとを備え、先読みバッファのデータ系列と移動窓のデータ系列のすべての部分系列とを照合して、移動窓中で一致する最大長の部分系列を求める。
【０００６】
そして、移動窓中でこの最大長の部分系列を指定するために、「その最大長の部分系列の開始位置」と「一致する長さ」と「不一致をもたらした次のシンボル」との組を符号化する。次に、先読みバッファ内の符号化したデータ系列を移動窓に移して、先読みバッファ内に符号化したデータ系列分の新たなデータ系列を入力する。以下、同様の処理を繰り返していくことで、データを部分系列に分解して符号化を実行していくのである。
【０００７】
一般的に、ＬＺＳＳ符号化は、符号化済の入力データを保存している移動窓数分、入力データ列との最長一致を検出する必要があるため、演算量は多いが高圧縮率が得られるといわれている。
【０００８】
また、ＬＺＷ符号化は、書き換え可能な辞書を設け、入力文字列を相異なる文字列に分け、この文字列を出現した順に番号を付けて辞書に登録すると共に、現在入力している文字列を辞書に登録してある最長一致文字列の辞書番号だけで表して符号化するというものである。このＬＺＷ符号化によれば、圧縮率はＬＺＳＳ符号化より劣るが、シンプルで、計算も容易で、高速処理が出来ることから記憶装置のファイル圧縮、データ伝送などで使われるようになっている。
【０００９】
また、特開平１１−１６８３８９号公報「データ圧縮装置」によれば、移動窓数分の比較器を並列に持つのはハード負荷が大きいため、相関のありそうな、場所（真上、左など）に対する比較器のみに限定することで、処理速度、及びハード規模を考慮した方法が提案されている。
【００１０】
また、特開平９−１８１６１０号公報「パターン圧縮方式及び装置」によれば、ＬＺ法などで有効に圧縮できない長大な繰り返しパターンを少ない演算量で圧縮するというもので、ＬＺ圧縮されたデータから低周波成分を抽出し、間引きし、自己相関を算出し、それに基づきパターンマッチングを行い、繰り返しを検出するという方法が提案されている。
【００１１】
【発明が解決しようとする課題】
移動窓を使った方式であるＬＺＳＳのような過去の符号化済みのデータ中から一致するデータ列を検索する場合、窓サイズが大きいほど一致精度が上がるが、その分比較に要する処理が多くなり、処理速度、ハード規模に負担がかかってしまう。
【００１２】
また、上記特開平１１−１６８３８９号公報の方式では、比較処理は少なくて済むが、一致度の高い場所を保存しておくことにより結局、窓サイズとしてのバッファサイズが大きくなってしまう。また、上記特開平９−１８１６１０号公報の方式では、バッファを超えてしまうような長大な繰り返しパターンを発見するために、この方式は多数の工程を必要とするため、処理速度の点で問題があるといえる。
【００１３】
本発明の方式では、窓サイズを可能な限り少なく設定することにより、比較処理も少なくて済み、処理速度の向上、及びハード規模を抑えるという利点がある。また、比較対象単位（符号化単位）のサイズを大きくすることで、長大なパターンを短いパターンに変換することが可能となり、繰り返しパターンを短い距離で発見することが可能となる。
【００１４】
本発明は、符号化対象データの持つ周期がシンボル列の繰り返しとして発生することを利用し、それをランレングス符号化することで効率よく圧縮する。また、これから符号化するシンボル列の直前の符号化済データを比較対象とすることで、比較範囲を小さくすることが可能となり、処理速度の向上、及びハード規模を小さくすることが可能となるデータ符号化装置、データ符号化方法、及びそのプログラムを提供することを第１の目的とする。
【００１５】
また、本発明は、例えばディザ等の周期を発生させる処理が施されたデータが符号化対象データの場合、この周期に基づいた符号化単位にしておくことにより、元画像の周期を得るに当たり効率良く周期を得ることが出来、すなわち比較範囲をより小さくすることが可能となり、処理速度の向上、及びハード規模を小さくことが可能となるデータ符号化装置、データ符号化方法、及びそのプログラムを提供することを第２の目的とする。
【００１６】
また、本発明は、符号化対象データの途中で周期が変わるような場合（例えば画像データの場合、ディザの混在など）にも対応することが可能となるデータ符号化装置、データ符号化方法、及びそのプログラムを提供することを第３の目的とする。
【００１７】
また、本発明は、連続数に下限のしきい値を設けることで、連続数を符号化する際、圧縮率の低下を防ぎ符号化効率を上げることが可能となるデータ符号化装置、データ符号化方法、及びそのプログラムを提供することを第４の目的とする。
【００１８】
また、本発明は、連続数に上限のしきい値を設けることで、連続数を符号化する際、符号のビット長を制限することが出来、符号化効率を上げることが可能となるデータ符号化装置、データ符号化方法、及びそのプログラムを提供することを第５の目的とする。
【００１９】
また、本発明は、連続する事象が再帰的に発生した場合、これについて異なる符号化を行うことで圧縮効率を上げることが可能となるデータ符号化装置、データ符号化方法、及びそのプログラムを提供することを第６の目的とする。
【００２０】
また、本発明は、符号化を進めた結果、目標とする所定の圧縮率に対して差が大きい場合、最適な処理がなされているとはいえない。このような不具合の対策として比較範囲を適応的に変化させることで圧縮率、処理速度の面で最適な圧縮効果を上げることが可能となるデータ符号化装置、データ符号化方法、及びそのプログラムを提供することを第７の目的とする。
【００２１】
また、本発明は、符号化を進めた結果、ある所定の範囲における圧縮率が目標とする所定の圧縮率に対して所定値を満たしていない場合、比較範囲を広げることにより、連続するパターンの発生確立を上げ、圧縮効率を上げることが可能となるデータ符号化装置、データ符号化方法、及びそのプログラムを提供することを第８の目的とする。
【００２２】
また、本発明は、符号化を進めた結果、ある所定の範囲における圧縮率が目標とする所定の圧縮率に対して所定値を満たしていた場合、その比較範囲は十分であるといえる。このとき更に比較範囲を狭くすることにより、比較処理を高速化することが可能となるデータ符号化装置、データ符号化方法、及びそのプログラムを提供することを第９の目的とする。
【００２３】
【課題を解決するための手段】
上記目的を達成するために、請求項１記載の発明は、シンボル列中、現符号化時点に隣接する、符号化済みのＭ個のシンボル列と、符号化前のＭ個のシンボル列と、が一致するか否かを判断し、一致すると判断した場合に、前記符号化前のＭ個のシンボル列に、さらに、連続するＭ個のシンボル列が前記符号化済みのシンボル列と一致するか否かの判断を繰り返す比較手段と、前記比較手段により一致すると判断された一致回数をカウントするラン判定手段と、前記比較手段により一致すると判断された複数のシンボル列に代えて、前記Ｍの値と前記一致回数を符号化する符号化手段と、を有したデータ符号化装置であって、さらに、所定の処理範囲の圧縮率を算出する圧縮率算出手段と、前記圧縮率が目標値を達成しているか否かを判断する判断手段と、を有し、前記判断手段により、圧縮率が目標値を達成していると判断されたか否かに応じて、前記比較手段により比較するシンボル数の上限値を変える、ことを特徴とするデータ符号化装置である。
また、請求項２記載の発明は、請求項１記載のデータ符号化装置において、前記判断手段によって、圧縮率が目標値を達成していないと判断された場合には、前記比較手段が比較するシンボル数の上限値を上げ、圧縮率が目標値を達成していると判断された場合には、前記比較手段が比較するシンボル数の上限値を下げることを特徴とする。
【００２４】
請求項３記載の発明は、請求項１又は２記載のデータ符号化装置において、前記シンボル列を符号化対象データから作成するシンボル作成手段を有し、前記符号化対象データの持つ周期に基づいて、１シンボルあたりのビット数を決定することを特徴とする。
【００２５】
請求項４記載の発明は、請求項３記載のデータ符号化装置において、前記シンボル作成手段は、前記符号化対象データの持つ周期に基づいて、符号化対象データから複数ビットのデータを切り出す切り出し手段と、切り出し手段により切り出されたデータをMoveToFront法により変換して前記シンボルを得る変換手段と、からなることを特徴とする。
【００２６】
請求項５記載の発明は、請求項４記載のデータ符号化装置において、前記シンボル作成手段は、前記符号化データから周期を検出する周期検出手段をさらに有し、前記切り出し手段は、当該検知された周期に基づいて、切り出しビット数を変えることを特徴とする。
【００２７】
請求項６記載の発明は、請求項１から５のいずれか１項記載のデータ符号化装置において、前記符号化手段は、前記ラン判定手段によりカウントされた前記一致回数が所定の回数以上である場合には、前記Ｍの値と前記一致回数とを符号化し、前記ラン判定手段によりカウントされた前記一致回数が所定の回数未満である場合には、前記符号化前のシンボル列を符号化することを特徴とする。
【００２８】
請求項７記載の発明は、請求項１から６のいずれか１項記載のデータ符号化装置において、前記符号化手段は、符号化しようとしている前記Ｍの値と前記一致回数と前記一致しないと判断された最初のシンボルと、直前に符号化した前記Ｍの値と前記一致回数と前記一致しないと判断された最初のシンボルと、において前記一致回数が等しい場合には、前記一致回数を省略した符号化を行うことを特徴とする。
【００３１】
請求項８記載の発明は、シンボル列中、現符号化時点に隣接する、符号化済みのＭ個のシンボル列と、符号化前のＭ個のシンボル列と、が一致するか否かを判断し、一致すると判断した場合に、前記符号化前のＭ個のシンボル列にさらに、連続するＭ個のシンボル列が前記符号化済みのＭ個のシンボル列と一致するか否かの判断を繰り返す比較ステップと、前記比較手段により一致すると判断された一致回数をカウントするラン判定ステップと、前記比較手段により一致すると判断された複数のシンボル列に代えて、前記Ｍの値と前記一致回数とを符号化する符号化ステップと、を有したデータ符号化方法であって、さらに、所定の処理範囲の圧縮率を算出する圧縮率算出ステップと、前記圧縮率が目標値を達成しているか否かを判断する判断ステップと、前記判断ステップにより、圧縮率が目標値を達成していると判断されたか否かに応じて、前記比較ステップにより比較するシンボル数の上限値を変えるステップと、を有することを特徴とするデータ符号化方法である。
【００３２】
請求項９記載の発明は、シンボル列中、現符号化時点に隣接する、符号化済みのＭ個のシンボル列と、符号化前のＭ個のシンボル列と、が一致するか否かを判断し、一致すると判断した場合に、前記符号化前のＭ個のシンボル列に、さらに、連続するＭ個のシンボル列が前記符号化済みのシンボル列と一致するか否かの判断を繰り返す比較処理と、前記比較手段により一致すると判断された一致回数をカウントするラン判定処理と、前記比較手段により一致すると判断された複数のシンボル列に代えて、前記Ｍの値と前記一致回数を符号化する符号化処理と、をコンピュータに実行させるデータ符号化プログラムであって、さらに、所定の処理範囲の圧縮率を算出する圧縮率算出処理と、前記圧縮率が目標値を達成しているか否かを判断する判断処理と、前記判断処理により、圧縮率が目標値を達成していると判断されたか否かに応じて、前記比較処理により比較するシンボル数の上限値を変える処理と、をコンピュータに実行させることを特徴とするデータ符号化プログラムである。
【００３４】
【発明の実施の形態】
以下、本発明の実施の形態を添付図面を参照しながら詳細に説明する。
【００３５】
本発明の第１の実施の形態について説明する。まず、基本構成を図１のブロック図に示す。図１の入力データはテキストデータでもバイナリデータでもよいが、ここでは１bit ／画素のディザ処理等の中間調処理された画像データを入力データとした例を示す。まず、入力データを複数ビット切り出し部１０１において、数ビットをかためて１データとして切り出す。次に中間データ作成部１０２においてＭＴＦ（MoveToFront ）等の手法により各データをＮビットのデータに変換して中間データとして出力する。
【００３６】
ここで、中間データに変換する意味は入力データの冗長性を取り除くという理由である。例えば、複数ビット切り出し部１０１において切り出すかたまりが比較的小さい場合、例えば４ビット程度だったとき、切り出されたデータの発生パターンは、たかだか１６通りである。このような場合はよいが、例えば切り出すビット数が２０ビットだった場合は２の２０乗もの種類になってしまう。
【００３７】
こうなると発生パターンが膨大になり、ソフト上の処理速度的にもハード規模においても支障が生じる。しかしながら、ディザ処理が施された画像データにはもともと周期性がある。ディザのもつ周期が２０ビットだったとして、それに基づいて２０ビットを１データ化すると、その分布はかなり偏りのあるものとなるはずである。それを利用して、ＭＴＦ（MoveToFront ）で２０ビットのデータを比較的小さいビット数のＮビットに変換して中間データとして出力する。
【００３８】
ＭＴＦについて、図２を参照して説明する。ここではＭＴＦに入力されるデータは８ビットの文字データであるとする。まず、図２の▲１▼のように入力シンボル”Ａ”があったとき、”Ａ”が格納されている辞書のインデックス番号”２”を出力値とする変換を行う。次に、▲２▼のように入力シンボルの”Ａ”を辞書の先頭に移動して、辞書を更新する。
【００３９】
これによって処理が進むにつれ、よく出現するシンボルが小さいインデックス番号の位置に集まるので、任意の分布形状を持った入力を、０をピークとする分布に変更することが出来る。この辞書の登録数は、任意の数に制限出来る。登録されている以外のシンボルが入力された場合、”ＥＳＣ符号＋生データ”として出力し、そのシンボルを新規に辞書の先頭に登録することで対応出来る。
【００４０】
図１に戻り、ＭＴＦなどの処理が施された中間データは比較器１０３に入力される（以下、この中間データを文字と呼ぶ）。比較器１０３について、図３、図４を参照して説明する。バッファのサイズを１６文字分（ｎ＝16）とし、すでに８文字分が符号化されているものとする（図３ａ）。バッファ中の符号化開始位置（図の点線の位置）から始まる文字（点線の右側）と、バッファ中の符号化済みの文字列の符号化開始位置の直前点線の左側）の８種類の長さの文字列とを比較して最長に一致するものを検索する（ステップＳ１）。
【００４１】
この場合”ａｂｃ”が最長一致と判定されるが（ステップＳ１／ＹＥＳ）、ここでは符号を出力せずに一時、この先頭アドレス、すなわち文字列長を表す”３”を一時記憶しておく。次に、この一致した文字列３文字分が符号化済みのバッファ部分の最後尾にくるように、バッファ中の文字を左にシフトし、符号化開始位置をずらす（図３ｂ）。次の符号化開始位置をみると先ほどと同様に”ａｂｃ”の文字列が最長一致し、再度アドレス”３”が判定される。
【００４２】
ここで、図１に示すラン判定部１０４において、先ほど一時保存されているアドレスと比較し、同値であることが分かる（ステップＳ２／ＹＥＳ）。ここでランを＋１する（ステップＳ３）。ここでもまだ符号を出力しない。次に先の説明と同様に３文字分左にシフトする（ステップＳ５）。今度の符号化開始文字列と符号化済み文字列とは一致する文字列がない（ステップＳ１／ＮＯ）。このように、一致しなかったときにはじめて今までのランと現在の一致しなかった文字を符号として出力する（ステップＳ８）。この時の符号は”ａｂｃ”の開始位置、すなわち文字の長さである”３”と、ランの”２”、そして”ｄ”の符号が出力される（図３ｃ）。以上の比較器１０３による処理の詳細は、図４のフローチャートに示した。
【００４３】
このように圧縮効率を上げるためにランを発生させたい。すなわち文字列が繰り返されるように、元画像の持つ周期を利用し、それに基づいて中間データを作成する。例えば符号化対象データが８bit ／画素のデータで、これに２０×２０のディザマトリクスサイズのディザが施されて１bit ／画素に中間調処理されていたとする。このとき中間調処理後のデータはＸ方向に２０の周期で類似したデータが繰り返されることが予想出来る。この”２０”という周期が既知である場合、図１における複数ビット切り出し部１０１は、入力データを２０ビット単位に切り出して中間データ作成部１０２に渡す。これにより、切り出されたデータに類似性が増すので、比較器１０３において一致精度を向上させることが出来る。
【００４４】
また、上記符号化対象データが持つ周期を自動的に検出する。図５に処理ブロック図を示す。図１に示した処理の先頭に周期検出部５０１が追加される。図６に周期の検出方法の一例を示す。このような自動周期検出部５０１を持てば、その判定範囲によっては１ライン中でも異なる周期を発見することが出来る。例えば、画像処理によってはイメージ部分とグラフィック部分とが異なるディザを施される場合がある。すなわち異なる周期が１ライン中に発生しても対応することが可能となる。
【００４５】
以上のような符号化対象データの周期を利用した切りだしは圧縮効果を上げるものであるが、もちろんこのような処理をしなくても所定の効果は得られるものである。
【００４６】
次に第２の実施の形態について説明する。第１の実施の形態に示した例において、図１のラン判定部１０４で、ランについてのしきい値を設けるというものである。図６に示したように、ランを作成しない場合、”ａｂ”という文字列があったとして、”ａ”が３ビット、”ｂ”が４ビットの符号が割り当てられていたとすると、”ａｂ”で７ビットとなる。
【００４７】
ランの符号が例えば、ＥＳＣ２符号＝５ビット、アドレス＝５ビット、ラン長＝８ビットだとすると１８ビット使用することになる。ランを作成する場合、通常の１文字単位の符号化よりも圧縮効率をあげることが前提となるため、この例では"ab"が２回で１４ビット、３回で２１ビットとなるため、３回以上の場合でないとランを作成する効果が得られないことになる。よってランの最低値となるしきい値(A) が必要となる。
【００４８】
次に、ラン長の上限のしきい値（B）を設ける。すなわち、ラン長を表すビット数を制限するというものである。一般にバイナリデータが圧縮対象のランレングス符号化は、ラン長が長ければ長いほど圧縮効率が上がるが、今回のようなランの符号としてのビット長が固定である場合、ラン長を表す符号のビット長はある程度制限する必要がある。なぜなら実際に短いラン長しか発生しなかった場合に無駄なビットを出力することになるからである。
【００４９】
以上のようなラン長の下限のしきい値は、上記の例ではランの値のみから判断したが、実際発生するビット数は、文字列を構成する各文字のビット長とランの個数によって決定するのでそれらをすべて考慮して圧縮効率が低下しないしきい値を設定することも出来る。
【００５０】
次に、第３の実施の形態について図７を参照して説明する。図７のような符号化対象データがバッファ中にあるとする。このデータを第１の実施の形態に示した処理で符号化すると、”Ａの符号＋ＥＳＣ２符号＋アドレス(1) ＋ラン（3 ）”に続いて、”Ｂの符号＋ＥＳＣ２符号＋アドレス(1) ＋ラン(3) ”、”Ｃの符号＋ＥＳＣ２符号＋アドレス(1) ＋ラン(3) ”と出力されることになる。
【００５１】
しかしながら、この３種類の符号は見て分かるように構成は全く同じであって、文字の種類だけが異なるものである。これを効率良く符号化するために、本実施の形態では”BBB ”の出力は”Ｂの符号＋ＥＳＣ３符号”に置き換える。同様に次の”CCC ”は”Ｃの符号＋ＥＳＣ３符号”とする。このようにランを２段階作成することで、更に圧縮効率を向上させることが出来る。ここでの例は１文字の連続をあげたが、”ABCABCABCDEFDEFDEF”のような３文字がＮ回繰り返される様な場合にも適応できる。
【００５２】
次に、第４の実施の形態について説明する。本実施の形態は、図１における比較器１０３において、比較処理におけるバッファサイズをある基準を元に適応的に変化させるというものである。図８にフローチャートを示す。ある所定の処理範囲を設定し、その処理範囲が終了したら（ステップＳ２３／ＹＥＳ）そこまでの符号化効率として圧縮率（M ）を算出し（ステップＳ２４）、ある所定の目標圧縮率を達成しているか否かを判定し（ステップＳ２５）、達成していなかった場合に（ステップＳ２５／ＮＯ）、比較範囲（＝窓サイズ）を広げる（ステップＳ２６）。
【００５３】
通常のＬＺ方式等で持つ窓サイズ、特に符号化済みのデータを格納するバッファはかなり大きなサイズを必要とする。これは圧縮効率を上げるためであるが、このサイズによっては処理速度や回路規模に多大な影響を与える。
【００５４】
本発明において第１の実施の形態から説明に使用したバッファサイズはもともと処理速度、及び回路規模を満たすのに十分な程度の極少ないサイズを想定し、局所的な繰り返しを抽出することを目的としている。したがって、実際に窓サイズよりも長い周期の繰り返しが存在していた場合、これを検出できない場合もある。これを回避するために、窓サイズを広げて処理を進める。広げるサイズにも上限を設定することで処理速度の低下を抑えることが出来る。
【００５５】
また、上記目的とは逆に、目標圧縮率が達成されている場合、窓サイズを小さくする。小さくすることで、比較器１０３の処理速度を向上させる効果がある。こちらも同様に処理速度とのバランスで必要以上に小さくする必要はなく、圧縮効率を低下させない程度の制限を設定しておくことも出来る。
【００５６】
なお、上述した実施の形態は、本発明の好適な実施の形態の一例を示すものであり、本発明はそれに限定されることなく、その要旨を逸脱しない範囲内において、種々変形実施が可能である。
【００５７】
なお、本発明はコンピュータにプログラムを実行させることにより実現可能である。当該プログラムは、光記録媒体、磁気記録媒体、光磁気記録媒体もしくは半導体ＩＣ記録媒体に記録されて提供されるか、またはプログラムサーバからＦＴＰ、ＨＴＴＰ等のプロトコルによりネットワークを介してダウンロードされて提供される。
【００５８】
【発明の効果】
以上の説明から明らかなように、本発明によれば次のような効果が得られる。符号化対象データの持つ周期がデータ列の繰り返しとして発生することを利用し、それをランレングス符号化することで効率よく圧縮することが出来る。また、これから符号化するシンボル列の直前の符号化済データを比較対象とすることで、比較範囲を小さくすることにより、処理速度の向上、及びハード規模を小さくすることが出来る。
【００５９】
また、本発明によれば、例えば、ディザ等の周期を発生させる処理が施されたデータが符号化対象データの場合、この周期に基づいた符号化単位にしておくことにより、元画像の周期を得るに当たり効率良く周期を得ることが出来、すなわち比較範囲をより小さくすることが可能となり、処理速度の向上、及びハード規模を小さくすることが出来る。
【００６０】
また、本発明によれば、符号化対象データの途中で周期が変わるような場合（たとえば画像データの場合、ディザの混在など）にも対応することが出来る。
【００６１】
また、本発明によれば、連続数に下限のしきい値を設けることで、連続数を符号化する際、圧縮率の低下を防ぎ符号化効率を上げることが出来る。
【００６２】
また、本発明によれば、連続数に上限のしきい値を設けることで、連続数を符号化する際、符号のbit 長を制限することが出来、符号化効率を上げることが出来る。
【００６３】
また、本発明によれば、連続する事象が再帰的に発生した場合、これについて、異なる符号化を行うことで圧縮効率を上げることが出来る。
【００６４】
また、本発明によれば、符号化を進めた結果、目標とする所定の圧縮率に対して差が大きい場合、最適な処理がなされているとはいえない。このような不具合の対策として比較範囲を適応的に変えることで、圧縮率、処理速度の面で最適な条件を得ることが出来る。
【００６５】
また、本発明によれば、符号化を進めた結果、ある所定の範囲における圧縮率が目標とする所定の圧縮率に対して所定値を満たしていない場合、比較範囲を広げることにより、連続するパターンの発生確立を上げることが出来、圧縮効率を上げることが出来る。
【００６６】
また、本発明によれば、符号化を進めた結果、ある所定の範囲における圧縮率が目標とする所定の圧縮率に対して所定値を満たしていた場合、その比較範囲は十分であるといえる。このとき更に比較範囲を狭くすることにより比較処理を削減出来るので、比較処理を高速化することが出来る。
【図面の簡単な説明】
【図１】本発明の基本構成を示すブロック図である。
【図２】ＭＴＦを説明するための図である。
【図３】比較器１０３を説明するための図である。
【図４】比較器１０３の処理を説明するためのフローチャートである。
【図５】本発明のバリエーションの構成を示すブロック図である。
【図６】周期を自動検出するための一構成例を示す図である。
【図７】第３の実施の形態を説明するための図である。
【図８】第４の実施の形態を説明するためのフローチャートである。
【符号の説明】
１０１複数ビット切り出し部
１０２中間データ作成部
１０３比較器
１０４ラン判定部
５０１周期検出部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a data encoding device, a data encoding method, and a program thereof. More specifically, it relates to a data encoding device, a data encoding method, and a program thereof that compress data, image data in particular, efficiently, building on compression techniques derived from dictionary-based methods such as LZ77 and LZ88.
[0002]
[Prior art]
In recent years, in fields that handle images, such as scanners, printers, and digital cameras, resolution has increased and the amount of data per image, that is, the number of pixels, has become enormous. To transfer such data efficiently over a network, or to store it on a storage medium such as a hard disk or memory, a compression method that is efficient in both processing speed and compression ratio is required.
[0003]
As a method for compressing data efficiently, compression using a universal code has been put into practical use. A universal code is a lossless (information-preserving) data compression method, and because it does not assume the statistical properties of the information source in advance, it can be applied to data of various types (character codes, object codes, and so on). Document images show similarity in character outlines and character spacing, and halftone-dot images show similarity in dot periodicity, dot shape, and the like. The redundancy that comes from this similarity can be reduced by a universal code, giving effective compression.
[0004]
A typical universal code is the Ziv-Lempel code. Two algorithms have been proposed for Ziv-Lempel codes: the universal type and the incremental parsing type. The LZSS code is a practical method that uses the universal-type algorithm, and the LZW (Lempel-Ziv-Welch) code is a practical method that uses the incremental-parsing algorithm.
[0005]
The LZ77 algorithm, on which LZSS is based, divides the data to be encoded into sequences of maximum length that match a sequence starting at any position in the past data, and encodes each one as a copy of that past sequence. Specifically, it provides a moving window that stores already-encoded input data and a look-ahead buffer that stores the data to be encoded next. The data sequence in the look-ahead buffer is compared against all subsequences of the data in the moving window to find the longest matching subsequence in the window.
[0006]
Then, to designate this longest subsequence in the moving window, the triple consisting of “the start position of the longest subsequence”, “the match length”, and “the next symbol, which caused the mismatch” is encoded. Next, the encoded data sequence in the look-ahead buffer is moved into the moving window, and new data equal in length to the encoded sequence is read into the look-ahead buffer. The same processing is then repeated, so that the data is decomposed into subsequences and encoded.
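The LZ77 procedure described in paragraphs [0005] and [0006] can be sketched as follows. This is a minimal illustration, not the implementation of any cited method: the function names, the default window and look-ahead sizes, and the use of a backward offset to express the "start position" are assumptions made for the example, and matches are allowed to run past the window into the look-ahead data, which is a common LZ77 convention.

```python
def lz77_encode(data, window_size=8, lookahead_size=8):
    """Encode `data` as (offset, length, next_symbol) triples.

    `offset` is the distance back from the current position to the start of
    the match (an assumed representation of the "start position").
    """
    out = []
    pos = 0
    while pos < len(data):
        best_off, best_len = 0, 0
        start = max(0, pos - window_size)
        # Keep at least one symbol in reserve for the literal that ends the triple.
        max_len = min(lookahead_size, len(data) - pos - 1)
        for cand in range(start, pos):
            length = 0
            while length < max_len and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_off, best_len = pos - cand, length
        out.append((best_off, best_len, data[pos + best_len]))
        pos += best_len + 1
    return out


def lz77_decode(triples):
    """Rebuild the symbol list from (offset, length, next_symbol) triples."""
    out = []
    for off, length, sym in triples:
        for _ in range(length):
            out.append(out[-off])  # copy one symbol at a time so overlaps work
        out.append(sym)
    return out
```

For the input `"abcabcabcd"` the three leading characters are emitted as literals and the remaining seven characters collapse into the single triple `(3, 6, "d")`.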
[0007]
In general, LZSS encoding must search for the longest match with the input data string across every position of the moving window that holds the already-encoded input data, so it is said to require a large amount of computation but to achieve a high compression ratio.
[0008]
LZW encoding, by contrast, provides a rewritable dictionary, parses the input character string into mutually distinct strings, and registers these strings in the dictionary with numbers assigned in order of appearance. The string currently being input is encoded using only the dictionary number of the longest matching string registered in the dictionary. The compression ratio of LZW encoding is inferior to that of LZSS encoding, but it is simple, easy to compute, and fast, so it has come to be used for file compression in storage devices, data transmission, and the like.
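The dictionary behaviour described for LZW in paragraph [0008] can be illustrated with the short sketch below. It is a simplified example under stated assumptions: the function name is hypothetical, and the dictionary is seeded with the distinct characters of the input itself (a real codec would use a fixed alphabet known to the decoder).

```python
def lzw_encode(text):
    """Return the list of dictionary numbers produced by LZW parsing of `text`."""
    # Assumption: seed the dictionary with the sorted distinct input characters.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending while the string is still registered.
            current = candidate
        else:
            # Emit the longest registered match, then register the new string.
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes
```

With the input `"ababab"` the strings `"ab"`, `"ba"`, and `"aba"` are registered in order of appearance, and the six characters are represented by four dictionary numbers.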
[0009]
Further, Japanese Patent Application Laid-Open No. 11-168389, “Data compression device”, notes that providing as many parallel comparators as there are moving-window positions places a heavy load on hardware, and proposes a method that takes processing speed and hardware scale into account by limiting the comparators to positions likely to be correlated (directly above, to the left, and so on).
[0010]
Japanese Patent Application Laid-Open No. 9-181610, “Pattern compression method and apparatus”, aims to compress, with a small amount of computation, long repeating patterns that cannot be compressed effectively by the LZ method or the like. It proposes extracting low-frequency components from LZ-compressed data, thinning them out, calculating the autocorrelation, performing pattern matching based on it, and thereby detecting the repetition.
[0011]
[Problems to be solved by the invention]
When matching data strings are searched for in past encoded data, as in moving-window methods such as LZSS, a larger window improves matching accuracy, but the processing required for comparison grows accordingly, which burdens processing speed and hardware scale.
[0012]
In the method of the above-mentioned Japanese Patent Application Laid-Open No. 11-168389, little comparison processing is needed, but because locations with a high degree of matching are stored, the buffer serving as the window ends up large after all. The method of the above-mentioned Japanese Patent Application Laid-Open No. 9-181610 requires many steps to find long repeating patterns that exceed the buffer, and is therefore problematic in terms of processing speed.
[0013]
In the method of the present invention, setting the window size as small as possible reduces the number of comparison processes, and has the advantages of improving the processing speed and reducing the hardware scale. In addition, by increasing the size of the comparison target unit (encoding unit), it is possible to convert a long pattern into a short pattern, and it is possible to find a repeated pattern at a short distance.
[0014]
The present invention exploits the fact that the periodicity of the data to be encoded appears as repetition of a symbol sequence, and compresses the data efficiently by run-length encoding that repetition. A first object of the invention is to provide a data encoding device, a data encoding method, and a program thereof in which the encoded data immediately preceding the symbol sequence about to be encoded is used as the comparison target, so that the comparison range can be reduced, processing speed improved, and hardware scale reduced.
[0015]
A second object of the invention is to provide a data encoding device, a data encoding method, and a program thereof in which, when the data to be encoded has undergone processing that produces a period, such as dithering, the encoding unit is set on the basis of that period, so that the period of the original image is obtained efficiently; that is, the comparison range can be made still smaller, processing speed improved, and hardware scale reduced.
[0016]
A third object of the invention is to provide a data encoding device, a data encoding method, and a program thereof that can also handle cases in which the period changes partway through the data to be encoded (for example, in image data, a mixture of different dithers).
[0017]
A fourth object of the invention is to provide a data encoding device, a data encoding method, and a program thereof that, by setting a lower threshold on the number of repetitions, prevent the compression ratio from degrading when the number of repetitions is encoded and thereby raise encoding efficiency.
[0018]
A fifth object of the invention is to provide a data encoding device, a data encoding method, and a program thereof that, by setting an upper threshold on the number of repetitions, limit the bit length of the code when the number of repetitions is encoded and thereby raise encoding efficiency.
[0019]
In addition, the present invention provides a data encoding apparatus, a data encoding method, and a program thereof that can improve compression efficiency by performing different encoding on consecutive events that occur recursively. This is the sixth purpose.
[0020]
If, as encoding proceeds, the compression ratio differs greatly from the predetermined target compression ratio, the processing cannot be said to be optimal. A seventh object of the invention is to provide a data encoding device, a data encoding method, and a program thereof that address this problem by changing the comparison range adaptively, and thereby achieve a compression effect that is optimal in terms of compression ratio and processing speed.
[0021]
An eighth object of the invention is to provide a data encoding device, a data encoding method, and a program thereof in which, if the compression ratio over a certain predetermined range does not satisfy a predetermined value relative to the predetermined target compression ratio as encoding proceeds, the comparison range is widened, which raises the probability that repeating patterns are found and thereby raises compression efficiency.
[0022]
Further, according to the present invention, if the compression rate in a certain predetermined range satisfies a predetermined value with respect to the target predetermined compression rate as a result of the encoding, it can be said that the comparison range is sufficient. At this time, it is a ninth object to provide a data encoding device, a data encoding method, and a program thereof that can speed up the comparison process by further narrowing the comparison range.
[0023]
[Means for Solving the Problems]
To achieve the above objects, the invention according to claim 1 is a data encoding device comprising: comparison means that determines whether, in a symbol sequence, the M already-encoded symbols adjacent to the current encoding position match the M symbols not yet encoded and, when it determines that they match, repeats the determination of whether the M symbols that further follow the M not-yet-encoded symbols match the already-encoded symbols; run determination means that counts the number of matches determined by the comparison means; and encoding means that encodes the value of M and the number of matches in place of the plurality of symbol sequences determined to match by the comparison means. The device further comprises compression ratio calculation means that calculates the compression ratio over a predetermined processing range, and determination means that determines whether the compression ratio has achieved a target value, and the upper limit of the number of symbols compared by the comparison means is changed according to whether the determination means has determined that the compression ratio has achieved the target value.
The invention according to claim 2 is the data encoding device according to claim 1, wherein, when the determination means determines that the compression ratio has not achieved the target value, the upper limit of the number of symbols compared by the comparison means is raised, and when it determines that the compression ratio has achieved the target value, that upper limit is lowered.
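The adjustment rule of claim 2 can be sketched roughly as below. This is only an illustration under assumptions not stated in the claims: the compression ratio is taken to be output bits divided by input bits (smaller is better), and the function name, step size, and lower and upper bounds on the limit are hypothetical.

```python
def next_upper_limit(limit, in_bits, out_bits, target_ratio,
                     step=4, min_limit=4, max_limit=64):
    """Return the upper limit on compared symbols for the next processing range.

    Assumption: ratio = out_bits / in_bits, and the target is achieved when
    the ratio is at or below `target_ratio`.
    """
    ratio = out_bits / in_bits
    if ratio <= target_ratio:
        # Target achieved: narrow the comparison range to save comparisons.
        return max(min_limit, limit - step)
    # Target missed: widen the comparison range to find longer repetitions.
    return min(max_limit, limit + step)
```

Clamping the result to a fixed interval keeps the limit from growing or shrinking without bound as successive processing ranges are evaluated.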
[0024]
The invention according to claim 3 is the data encoding device according to claim 1 or 2, further comprising symbol creation means that creates the symbol sequence from the data to be encoded, wherein the number of bits per symbol is determined on the basis of the period of the data to be encoded.
[0025]
The invention according to claim 4 is the data encoding device according to claim 3, wherein the symbol creation means comprises cut-out means that cuts out data of a plurality of bits from the data to be encoded on the basis of the period of the data to be encoded, and conversion means that converts the data cut out by the cut-out means by the MoveToFront method to obtain the symbols.
[0026]
The invention according to claim 5 is the data encoding device according to claim 4, wherein the symbol creation means further comprises period detection means that detects a period from the encoded data, and the cut-out means changes the number of bits cut out on the basis of the detected period.
[0027]
The invention according to claim 6 is the data encoding device according to any one of claims 1 to 5, wherein the encoding means encodes the value of M and the number of matches when the number of matches counted by the run determination means is equal to or greater than a predetermined number, and encodes the not-yet-encoded symbol sequence when the number of matches counted by the run determination means is less than the predetermined number.
[0028]
The invention according to claim 7 is the data encoding device according to any one of claims 1 to 6, wherein, when the number of matches is equal between the value of M, the number of matches, and the first symbol determined not to match that are about to be encoded and the value of M, the number of matches, and the first symbol determined not to match that were encoded immediately before, the encoding means performs encoding that omits the number of matches.
[0031]
The invention according to claim 8 is a data encoding method comprising: a comparison step of determining whether, in a symbol sequence, the M already-encoded symbols adjacent to the current encoding position match the M symbols not yet encoded and, when it is determined that they match, repeating the determination of whether the M symbols that further follow the M not-yet-encoded symbols match the M already-encoded symbols; a run determination step of counting the number of matches determined by the comparison means; and an encoding step of encoding the value of M and the number of matches in place of the plurality of symbol sequences determined to match by the comparison means. The method further comprises a compression ratio calculation step of calculating the compression ratio over a predetermined processing range, a determination step of determining whether the compression ratio has achieved a target value, and a step of changing the upper limit of the number of symbols compared in the comparison step according to whether the determination step has determined that the compression ratio has achieved the target value.
[0032]
The invention according to claim 9 is a data encoding program that causes a computer to execute: a comparison process of determining whether, in a symbol sequence, the M already-encoded symbols adjacent to the current encoding position match the M symbols not yet encoded and, when it is determined that they match, repeating the determination of whether the M symbols that further follow the M not-yet-encoded symbols match the already-encoded symbols; a run determination process of counting the number of matches determined by the comparison means; and an encoding process of encoding the value of M and the number of matches in place of the plurality of symbol sequences determined to match by the comparison means. The program further causes the computer to execute a compression ratio calculation process of calculating the compression ratio over a predetermined processing range, a determination process of determining whether the compression ratio has achieved a target value, and a process of changing the upper limit of the number of symbols compared in the comparison process according to whether the determination process has determined that the compression ratio has achieved the target value.
[0034]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
[0035]
A first embodiment of the present invention will be described. The basic configuration is shown in the block diagram of FIG. 1. The input data in FIG. 1 may be text data or binary data; the example here uses image data at 1 bit/pixel that has undergone halftone processing such as dithering. First, the multi-bit cutout unit 101 groups several bits of the input data together and cuts them out as one data item. Next, the intermediate data creation unit 102 converts each data item into N-bit data by a method such as MTF (MoveToFront) and outputs it as intermediate data.
[0036]
The purpose of converting to intermediate data is to remove redundancy from the input data. When the chunk cut out by the multi-bit cutout unit 101 is relatively small, for example about 4 bits, the cut-out data can take at most 16 patterns. That case is manageable, but if 20 bits are cut out, the number of possible patterns becomes 2 to the 20th power.
[0037]
The number of possible patterns then becomes enormous, which causes problems both for processing speed in software and for hardware scale. However, image data that has been dithered is inherently periodic. If the dither period is 20 bits and the data is grouped into 20-bit items accordingly, the distribution of those items should be heavily skewed. Exploiting this, MTF (MoveToFront) converts the 20-bit data into N bits, a relatively small number of bits, and outputs the result as intermediate data.
[0038]
The MTF will be described with reference to FIG. Here, it is assumed that data input to the MTF is 8-bit character data. First, when there is an input symbol “A” as shown in (1) in FIG. 2, conversion is performed with the index number “2” of the dictionary storing “A” as an output value. Next, as shown in (2), the input symbol “A” is moved to the top of the dictionary to update the dictionary.
[0039]
As the processing proceeds, frequently appearing symbols gather at the position of a small index number, so that an input having an arbitrary distribution shape can be changed to a distribution having 0 as a peak. The number of registrations in this dictionary can be limited to an arbitrary number. When a symbol other than those registered is input, it can be handled by outputting it as “ESC code + raw data” and newly registering the symbol at the top of the dictionary.
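The MTF conversion described in paragraphs [0038] and [0039] can be sketched as follows. The function name, the `ESC` marker, the initial table contents, and the rule of dropping the last entry when the table is full are assumptions made for illustration; the text says only that the number of registrations can be limited and that unregistered symbols are output as “ESC code + raw data” and registered at the top.

```python
ESC = "ESC"  # hypothetical marker standing in for the ESC code


def mtf_encode(symbols, table=None, max_entries=None):
    """Convert symbols to dictionary indices with a move-to-front table."""
    table = list(table) if table is not None else []
    out = []
    for sym in symbols:
        if sym in table:
            idx = table.index(sym)
            out.append(idx)          # output the current index of the symbol
            table.pop(idx)
        else:
            out.append((ESC, sym))   # unregistered: ESC code + raw data
        table.insert(0, sym)         # move (or register) the symbol at the top
        if max_entries is not None and len(table) > max_entries:
            table.pop()              # assumption: evict the last entry when full
    return out
```

Starting from the hypothetical table `["C", "B", "A"]`, the input `"AAB"` yields `[2, 0, 2]`: the first “A” is found at index 2 and moved to the top, so the second “A” is output as 0.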
[0040]
Returning to FIG. 1, the intermediate data that has undergone processing such as MTF is input to the comparator 103 (hereinafter, this intermediate data is referred to as characters). The comparator 103 will be described with reference to FIGS. 3 and 4. Assume that the buffer holds 16 characters (n = 16) and that 8 characters have already been encoded (FIG. 3a). The characters starting at the encoding start position in the buffer (the dotted line in the figure; the characters to its right) are compared with character strings of eight different lengths that end immediately before the encoding start position (to the left of the dotted line) in the already-encoded part of the buffer, and the longest match is searched for (step S1).
[0041]
In this case, “abc” is determined to be the longest match (step S1 / YES), but no code is output yet; instead, its head address, that is, “3”, which also represents the length of the character string, is temporarily stored. Next, the characters in the buffer are shifted to the left so that the three matched characters come to the end of the already-encoded part of the buffer, which moves the encoding start position (FIG. 3b). At the new encoding start position, the character string “abc” is again the longest match, and the address “3” is determined once more.
[0042]
Here, the run determination unit 104 shown in FIG. 1 compares this address with the temporarily stored address and finds that they are the same (step S2 / YES), so the run is incremented by 1 (step S3). Again, no code is output. Next, the buffer is shifted to the left by three characters as before (step S5). This time, the character string at the current encoding start position has no match in the encoded character string (step S1 / NO). Codes are thus output only when the match fails, together with the character that broke the run (step S8). At this time, the start position of “abc”, that is, the character string length “3”, the run “2”, and the code for “d” are output (FIG. 3c). The details of the processing by the comparator 103 are shown in the flowchart of FIG. 4.
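The comparison and run counting described for the comparator 103 and the run determination unit 104 can be sketched as follows. This is a simplified illustration under stated assumptions: the names `longest_adjacent_match`, `encode_runs`, and `decode_runs` and the tuple formats `("LIT", c)` and `("RUN", M, run)` are hypothetical, the buffer shifting is replaced by an index into the whole string, and once a match length M has been found the sketch keeps testing the same M instead of repeating the full longest-match search and comparing addresses.

```python
def longest_adjacent_match(s, pos, window):
    """Largest M <= window such that the M characters just before pos
    equal the M characters starting at pos; 0 if there is none."""
    limit = min(window, pos, len(s) - pos)
    for m in range(limit, 0, -1):
        if s[pos - m:pos] == s[pos:pos + m]:
            return m
    return 0


def encode_runs(s, window=8):
    """Emit ("LIT", c) for unmatched characters and ("RUN", M, run) when
    the preceding M characters repeat `run` more times."""
    out, pos = [], 0
    while pos < len(s):
        m = longest_adjacent_match(s, pos, window)
        if m == 0:
            out.append(("LIT", s[pos]))
            pos += 1
            continue
        run = 0
        # Count consecutive repetitions of the same M-character string.
        while s[pos - m:pos] == s[pos:pos + m]:
            run += 1
            pos += m
        out.append(("RUN", m, run))
    return out


def decode_runs(codes):
    """Inverse of encode_runs, included to check the round trip."""
    s = ""
    for code in codes:
        if code[0] == "LIT":
            s += code[1]
        else:
            _, m, run = code
            s += s[-m:] * run
    return s
```

For the input `"abcabcabcd"`, the sketch emits the three literals of the first “abc”, then `("RUN", 3, 2)`, then the literal “d”, which corresponds to the length “3”, run “2”, and code “d” of the example above.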
[0043]
As shown above, generating runs increases the compression efficiency. The intermediate data is therefore created based on the period of the original image so that character strings repeat. For example, assume that the data to be encoded is 8 bit / pixel data that has been dithered with a dither matrix of size 20 × 20 and halftoned to 1 bit / pixel. In this case, the halftoned data can be expected to repeat with a period of 20 in the X direction. When the period “20” is known, the multi-bit cutout unit 101 in FIG. 1 cuts the input data into units of 20 bits and passes them to the intermediate data creation unit 102. As a result, the cut-out units become more similar to one another, so the matching accuracy in the comparator 103 can be improved.
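The effect of cutting the data out in units of the period can be illustrated with a short sketch. The function name `cut_symbols` and the sample bit pattern are assumptions made for illustration; the sketch only shows that a row with a period of 20 bits yields identical units when cut every 20 bits and differing units when cut at an unrelated width.

```python
def cut_symbols(bits, unit):
    """Cut a bit sequence into consecutive units of `unit` bits.
    The last unit may be shorter if the length is not a multiple."""
    return [tuple(bits[i:i + unit]) for i in range(0, len(bits), unit)]


# Hypothetical halftoned row: a 20-bit pattern repeated three times.
pattern = [1] * 3 + [0] * 17
row = pattern * 3

aligned = cut_symbols(row, 20)    # every unit is the same symbol
misaligned = cut_symbols(row, 8)  # units differ from one another
```

When the cut-out unit equals the period, every unit is the same symbol, which is the condition under which the comparator finds matches most easily.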
[0044]
Further, the period of the encoding target data may be detected automatically. FIG. 5 shows the processing block diagram, in which a period detection unit 501 is added to the head of the process shown in FIG. 1. FIG. 6 shows an example of the period detection method. When such an automatic period detection unit 501 is provided, different periods can be found even within one line, depending on the determination range. For example, depending on the image processing, the image portion and the graphic portion may be dithered differently. Even if different periods occur within one line in this way, they can be handled.
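One conceivable way to detect a period is sketched below. It is not the method of FIG. 6, which is not detailed in this text: the function name `detect_period` and the scoring rule (the fraction of bits that equal the bit `p` positions later) are assumptions chosen only to illustrate the idea of automatic detection over a determination range.

```python
def detect_period(bits, max_period):
    """Return the smallest candidate period p (1..max_period) with the
    highest fraction of positions i where bits[i] == bits[i + p]."""
    best_p, best_score = None, -1.0
    for p in range(1, min(max_period, len(bits) - 1) + 1):
        n = len(bits) - p
        score = sum(bits[i] == bits[i + p] for i in range(n)) / n
        if score > best_score:  # strict: ties keep the smaller period
            best_p, best_score = p, score
    return best_p
```

Applying the function separately to different ranges of one line would allow a different period to be found for each range, as in the case of an image portion and a graphic portion that are dithered differently.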
[0045]
Cutting out using the period of the data to be encoded as described above increases the compression effect, but of course, a predetermined effect can be obtained without such processing.
[0046]
Next, a second embodiment will be described. In this embodiment, the run determination unit 104 in FIG. 1, described in the first embodiment, sets a threshold value for a run. As shown in FIG. 6, suppose there is a character string “ab” in which “a” has a 3-bit code and “b” has a 4-bit code. When no run is created, “ab” takes 7 bits.
[0047]
For example, if the run code consists of an ESC2 code of 5 bits, an address of 5 bits, and a run length of 8 bits, it uses 18 bits. Creating a run presupposes that it compresses better than ordinary character-by-character encoding. In this example, “ab” takes 14 bits when it occurs twice and 21 bits when it occurs three times, so creating a run brings no benefit unless the string occurs at least three times (21 bits > 18 bits). Therefore, a threshold value (A) that is the minimum value of the run is required.
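The break-even point in this example can be computed directly. The sketch below is an illustration only; the function name `min_profitable_repeats` is hypothetical, and it assumes, as in the example, that the cost of the run code is compared with the cost of encoding every repetition character by character.

```python
def min_profitable_repeats(string_bits, run_code_bits):
    """Smallest repetition count n for which n * string_bits is strictly
    greater than run_code_bits, i.e. the run code is the shorter output."""
    return run_code_bits // string_bits + 1


# Values from the example: "ab" = 3 + 4 bits, run code = 5 + 5 + 8 bits.
threshold_a = min_profitable_repeats(3 + 4, 5 + 5 + 8)
```

With these values `threshold_a` is 3, matching the comparison of 14 bits and 21 bits against the 18-bit run code.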
[0048]
Next, an upper threshold (B) for the run length is set; that is, the number of bits representing the run length is limited. In general, when binary data is compressed by run-length encoding, longer runs give higher compression efficiency. However, when the bit length of the run code is fixed as in this case, the bit length of the field representing the run length needs to be limited to some extent, because useless bits are output if only short runs actually occur.
[0049]
In the above example, the lower threshold for the run length is determined from the run value only, but the number of bits actually generated is determined by the bit length of each character constituting the character string and the number of runs. Therefore, it is possible to set a threshold value at which the compression efficiency does not decrease considering all of them.
[0050]
Next, a third embodiment will be described with reference to FIG. 7. Assume that the data to be encoded shown in FIG. 7 is in the buffer. When this data is encoded by the processing shown in the first embodiment, the output is “A code + ESC2 code + address (1) + run (3)”, followed by “B code + ESC2 code + address (1) + run (3)” and “C code + ESC2 code + address (1) + run (3)”.
[0051]
However, as can be seen, these three types of codes have the same configuration, and only the character types are different. In order to encode this efficiently, the output of “BBB” is replaced with “B code + ESC3 code” in the present embodiment. Similarly, the next “CCC” is “C code + ESC3 code”. Thus, the compression efficiency can be further improved by creating two stages of runs. In this example, a sequence of one character is given, but it can also be applied to a case where three characters such as “ABCABCABCDEFDEFDEF” are repeated N times.
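The two-stage run described above can be sketched as a second pass over the first-stage output. This is an illustrative sketch under assumptions: the first-stage output is represented as hypothetical tuples `("LIT", c)` for a character code and `("RUN", address, run)` for “ESC2 code + address + run”, and `("ESC3",)` stands for the ESC3 code that replaces a run identical to the previous one.

```python
def second_stage(codes):
    """Replace a run whose (address, run) equals that of the previously
    output run with a single ("ESC3",) marker."""
    out, prev = [], None
    for code in codes:
        if code[0] == "RUN":
            if code[1:] == prev:
                out.append(("ESC3",))
            else:
                out.append(code)
                prev = code[1:]
        else:
            out.append(code)
    return out


# First-stage output for the example of the third embodiment.
first_stage = [
    ("LIT", "A"), ("RUN", 1, 3),
    ("LIT", "B"), ("RUN", 1, 3),
    ("LIT", "C"), ("RUN", 1, 3),
]
compact = second_stage(first_stage)
```

Only the first run keeps its address and run length; the runs for “B” and “C” are reduced to “B code + ESC3 code” and “C code + ESC3 code”.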
[0052]
Next, a fourth embodiment will be described. In the present embodiment, the comparator 103 in FIG. 1 adaptively changes the buffer size used in the comparison process based on a certain criterion. FIG. 8 shows a flowchart. A predetermined processing range is set, and when the end of that range is reached (step S23 / YES), the compression rate (M) is calculated as the coding efficiency up to that point (step S24), and it is determined whether a predetermined target compression rate has been achieved (step S25). If it has not been achieved (step S25 / NO), the comparison range (= window size) is expanded (step S26).
[0053]
The window used by ordinary LZ methods and the like, in particular the buffer that stores the encoded data, needs to be considerably large. This is to increase the compression efficiency, but depending on its size, the processing speed and circuit scale are greatly affected.
[0054]
In the present invention, the buffer size used in the description from the first embodiment onward is assumed from the outset to be very small, just enough to satisfy the processing speed and circuit scale requirements, and the purpose is to extract local repetitions. Therefore, if a repetition with a period longer than the window size actually exists, it may not be detected. To avoid this, the window size is increased as processing proceeds. Setting an upper limit on the expanded size suppresses the decrease in processing speed.
[0055]
On the other hand, when the target compression rate is achieved, the window size is reduced, which improves the processing speed of the comparator 103. Likewise, the window need not be made unnecessarily small for the sake of processing speed; a lower limit can be set so that the compression efficiency is not lowered.
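The window adjustment of the fourth embodiment can be sketched as follows. The function name `next_window`, the step size, and the limits are hypothetical, and the compression rate is assumed here to be the ratio of compressed bits to original bits, so that a smaller value is better and the target is achieved when the ratio is at or below the target value.

```python
def next_window(window, compressed_bits, original_bits, target_ratio,
                lower=4, upper=64, step=4):
    """Return the window size for the next processing range.

    The window is widened (up to `upper`) when the target is missed and
    narrowed (down to `lower`) when the target is achieved.
    """
    ratio = compressed_bits / original_bits
    if ratio > target_ratio:
        return min(window + step, upper)   # target missed: expand
    return max(window - step, lower)       # target achieved: reduce
```

For example, with a window of 16 and a target of 0.5, a measured ratio of 0.8 gives a window of 20 for the next range, while a measured ratio of 0.4 gives 12.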
[0056]
The above-described embodiments are examples of preferred embodiments of the present invention. The present invention is not limited to them, and various modifications can be made without departing from the scope of the invention.
[0057]
The present invention can be realized by causing a computer to execute a program. The program is provided by being recorded on an optical recording medium, a magnetic recording medium, a magneto-optical recording medium, or a semiconductor IC recording medium, or is provided by being downloaded from a program server via a network using a protocol such as FTP or HTTP.
[0058]
[Effects of the invention]
As is clear from the above description, the following effects can be obtained according to the present invention. Using the fact that the period of the data to be encoded is generated as a repetition of the data sequence, it can be efficiently compressed by run-length encoding it. In addition, since the encoded data immediately before the symbol string to be encoded is used as a comparison target, the processing speed can be improved and the hardware scale can be reduced by reducing the comparison range.
[0059]
Also, according to the present invention, when the data to be encoded has undergone processing that produces a period, such as dithering, setting the encoding unit based on this period allows the period of the original image to be captured efficiently. That is, the comparison range can be made smaller, the processing speed can be improved, and the hardware scale can be reduced.
[0060]
Also, according to the present invention, it is possible to cope with a case where the period changes in the middle of the encoding target data (for example, in the case of image data, a mixture of dithers).
[0061]
Also, according to the present invention, by providing a lower limit threshold value for the continuous number, it is possible to prevent the compression rate from being lowered when encoding the continuous number and to increase the encoding efficiency.
[0062]
Also, according to the present invention, by providing an upper limit threshold value for the continuous number, the bit length of the code can be limited when encoding the continuous number, and the encoding efficiency can be increased.
[0063]
Also, according to the present invention, when a continuous event occurs recursively, the compression efficiency can be increased by encoding this event differently.
[0064]
Also, according to the present invention, if the result differs greatly from the predetermined target compression rate as encoding proceeds, the processing cannot be said to be optimal. By adaptively changing the comparison range as a countermeasure against this problem, optimum conditions in terms of compression rate and processing speed can be obtained.
[0065]
Also, according to the present invention, when, as encoding proceeds, the compression rate in a certain predetermined range does not satisfy the predetermined target compression rate, the comparison range is expanded so that the probability of generating a continuous pattern is improved and the compression efficiency can be increased.
[0066]
Also, according to the present invention, when, as encoding proceeds, the compression rate in a certain predetermined range satisfies the predetermined target compression rate, the comparison range can be regarded as sufficient. In this case, further narrowing the comparison range reduces the amount of comparison processing, so the comparison process can be speeded up.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a basic configuration of the present invention.
FIG. 2 is a diagram for explaining an MTF.
FIG. 3 is a diagram for explaining the comparator 103.
FIG. 4 is a flowchart for explaining the processing of the comparator 103.
FIG. 5 is a block diagram showing a configuration of a variation of the present invention.
FIG. 6 is a diagram illustrating a configuration example for automatically detecting a cycle.
FIG. 7 is a diagram for explaining a third embodiment.
FIG. 8 is a flowchart for explaining a fourth embodiment.
[Explanation of symbols]
101 Multi-bit cutout unit
102 Intermediate data creation unit
103 Comparator
104 Run determination unit
501 Period detector

Claims

A data encoding apparatus comprising: comparison means for determining whether or not, in a symbol string, the M encoded symbols adjacent to the current encoding point match the M symbols before encoding, and, when it is determined that they match, repeating the determination of whether or not the M symbols that further follow the M symbols before encoding match the encoded symbols;
run determination means for counting the number of matches determined by the comparison means; and encoding means for encoding the value of M and the number of matches in place of the plurality of symbol strings determined to match by the comparison means,
the apparatus further comprising compression rate calculation means for calculating a compression rate of a predetermined processing range; and
determination means for determining whether or not the compression rate has achieved a target value,
wherein the upper limit value of the number of symbols to be compared by the comparison means is changed according to whether or not the determination means determines that the compression rate has achieved the target value.

By the determination means,
If it is determined that the compression ratio has not reached the target value, the comparison means increases the upper limit value of the number of symbols to be compared,
2. The data encoding apparatus according to claim 1, wherein when it is determined that the compression ratio has achieved the target value, the upper limit value of the number of symbols to be compared by the comparing means is lowered.

A symbol creating means for creating the symbol string from the encoding target data;
3. The data encoding apparatus according to claim 1, wherein the number of bits per symbol is determined based on a period of the encoding target data.

The symbol creating means includes:
Based on the cycle of the encoding target data, a cutting means for cutting out a plurality of bits of data from the encoding target data;
4. The data encoding apparatus according to claim 3 , further comprising conversion means for converting the data cut out by the cut-out means by the MoveToFront method to obtain the symbol.

The symbol creating means includes:
Further comprising a period detecting means for detecting a period from the encoded data;
5. The data encoding apparatus according to claim 4 , wherein the cutout unit changes the number of cutout bits based on the detected cycle.

6. The data encoding apparatus according to any one of claims 1 to 5, wherein the encoding means
encodes the value of M and the number of matches if the number of matches counted by the run determination means is greater than or equal to a predetermined number, and
encodes the symbol string before encoding if the number of matches counted by the run determination means is less than the predetermined number.

The encoding means includes
The first symbol determined to be inconsistent with the value of M to be encoded and the number of matches, and the first symbol determined to be inconsistent with the value of M encoded immediately before and the number of matches 7. The data encoding device according to claim 1, wherein the encoding is performed when the number of matches is equal to each other .

A data encoding method comprising: a comparison step of determining whether or not, in a symbol string, the M encoded symbols adjacent to the current encoding point match the M symbols before encoding, and, when it is determined that they match, repeating the determination of whether or not the M symbols that further follow the M symbols before encoding match the encoded symbols;
a run determination step of counting the number of matches determined in the comparison step; and an encoding step of encoding the value of M and the number of matches in place of the plurality of symbol strings determined to match in the comparison step,
the method further comprising a compression rate calculation step of calculating a compression rate of a predetermined processing range; and
a determination step of determining whether or not the compression rate has achieved a target value,
wherein the upper limit value of the number of symbols to be compared in the comparison step is changed according to whether or not the compression rate is determined to have achieved the target value in the determination step.

A data encoding program for causing a computer to execute: a comparison process of determining whether or not, in a symbol string, the M encoded symbols adjacent to the current encoding point match the M symbols before encoding, and, when it is determined that they match, repeating the determination of whether or not the M symbols that further follow the M symbols before encoding match the encoded symbols;
a run determination process of counting the number of matches determined by the comparison process; and an encoding process of encoding the value of M and the number of matches in place of the plurality of symbol strings determined to match by the comparison process,
the program further causing the computer to execute a compression rate calculation process of calculating a compression rate of a predetermined processing range;
a determination process of determining whether or not the compression rate has achieved a target value; and
a process of changing the upper limit value of the number of symbols to be compared by the comparison process according to whether or not the compression rate is determined to have achieved the target value by the determination process.