JP3603099B2

JP3603099B2 - Method and apparatus for lossless encoding of data

Info

Publication number: JP3603099B2
Application number: JP08395794A
Authority: JP
Inventors: 稔明岡山
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1994-03-29
Filing date: 1994-03-29
Publication date: 2004-12-15
Anticipated expiration: 2019-12-15
Also published as: JPH07271552A

Description

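[0001]
[Field of industrial application]
The present invention relates to a method and an apparatus for lossless encoding of data.
[0002]
[Prior art]
In recent years, data compression techniques have come into wide use to cope with the rapid growth in the amount of data processed by computers. Data compression techniques comprise lossless encoding, in which the data before compression is restored exactly, and lossy encoding, in which it is not. Lossy encoding is used mainly for compressing image data. Lossless encoding, on the other hand, is used for compressing various kinds of data such as program data, text data, and dictionary data.
[0003]
Font data is one kind of data subject to lossless encoding. Font data is data representing the shapes of the characters contained in one font set. In recent years, several kinds of font data are often loaded on a personal computer so that the user can select and use a desired font.
[0004]
[Problems to be solved by the invention]
However, when font data is compressed with the same lossless encoding method as text data or program data, the compression ratio is sometimes not very high. This is thought to be because font data has a characteristic data arrangement that differs from that of text data and program data.
[0005]
The present invention has been made to solve the above problem of the prior art, and its object is to provide a lossless encoding method and apparatus capable of compressing font data efficiently.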
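[0006]
[Means for solving the problems, and operation]
To solve the above problem, the lossless data encoding method according to claim 1 of the present invention comprises: (A) a step of examining two consecutive bytes of input data and determining whether they satisfy an encoding condition that the trailing byte of the two bytes and the one byte immediately preceding the two bytes are equal to each other and that the leading byte of the two bytes is equal to or less than a predetermined value; and (B) a step of encoding the two bytes in a predetermined encoding mode when the encoding condition is satisfied.
[0007]
In font data, it is often the case that the trailing byte of two bytes equals the one byte immediately preceding those two bytes and that the leading byte of the two bytes is equal to or less than a predetermined value. Therefore, when this encoding condition is satisfied, font data can be compressed efficiently by encoding in a predetermined encoding mode suited to font data.
[0008]
In the lossless data encoding method according to claim 2, step (B) includes (C) a step of encoding the two bytes with a codeword of fewer than 16 bits.
[0009]
This makes the compression ratio 1 or more.
[0010]
In the lossless data encoding method according to claim 3, step (C) includes a step of creating encoded data by combining a prefix of 8 bits or fewer, which indicates that the data was encoded in the predetermined encoding mode, with the lower 7 bits of the leading byte.
[0011]
In this way, two bytes can be encoded in 15 bits or fewer.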
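[0012]
The lossless data encoding method according to claim 4 comprises: (A) a step of determining whether two consecutive bytes of input data satisfy a first encoding condition for encoding in a predetermined first encoding mode; (B) a step of determining, when the first encoding condition is not satisfied, whether a second encoding condition is satisfied, namely that the trailing byte of the two bytes and the one byte immediately preceding the two bytes are equal to each other and that the leading byte of the two bytes is equal to or less than a predetermined value; and (C) a step of encoding at least the two bytes in the first encoding mode, which has the highest compression ratio, when the first determination condition is satisfied, encoding the two bytes in a second encoding mode, which has an intermediate compression ratio, when the second encoding condition is satisfied, and encoding at least the leading byte in a third encoding mode, which has the lowest compression ratio, when neither the first nor the second encoding condition is satisfied.
[0013]
If the encoding mode is selected according to the encoding conditions, an encoding mode appropriate to the content of the two bytes of data can be selected. In particular, when the two bytes satisfy the second encoding condition, which is characteristic of font data, the font data can be compressed efficiently by encoding in the second encoding mode suited to font data.
[0014]
In the lossless data encoding method according to claim 7, the encoding step in the first encoding mode includes: a step of searching the input data for the longest second byte string that matches a first byte string containing at least the two bytes; and a step of creating encoded data by combining a first codeword indicating that the data was encoded in the first encoding mode, a second codeword indicating the distance between the first and second byte strings, and a third codeword indicating the length of the first byte string.
[0015]
In the first encoding mode, the longest matching byte string is encoded, so the compression ratio can be made high.
[0016]
In the lossless data encoding method according to claim 8, the third encoding mode includes a step of obtaining encoded data by combining a prefix indicating that the data was encoded in the third encoding mode with the lower 7 bits of the leading byte, and the prefix in the third encoding mode is selected from either a 1-bit first prefix used when the hexadecimal value of the leading byte is 7F or less, or a multi-bit second prefix used when the hexadecimal value of the leading byte is 80 or more.
[0017]
In the third encoding mode, the leading byte of the two bytes is encoded in 8 bits when its hexadecimal value is 7F or less, so the compression ratio does not become excessively low.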
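[0018]
The lossless data encoding method according to claim 9 determines whether the occurrence rate, in the input data, of byte data of a first type whose hexadecimal value is 7F or less satisfies a predetermined condition and, when it does, executes: (A) a step of determining whether two consecutive bytes in the input data satisfy an encoding condition for encoding in an encoding mode whose compression ratio exceeds 1; and (B) a step of encoding at least the leading byte of the two bytes in an encoding mode whose compression ratio is 1 or less when the encoding condition is not satisfied.
[0019]
If encoding that includes steps (A) and (B) is executed when the occurrence rate of byte data whose hexadecimal value is 7F or less satisfies the predetermined condition, the codewords can be set so that the compression ratio for byte data of 7F or less does not become excessively low.
[0020]
In the lossless data encoding method according to claim 10, step (B) includes a step of obtaining encoded data by combining a prefix, which indicates that the data was encoded in the encoding mode whose compression ratio is 1 or less, with the lower 7 bits of the leading byte, and the prefix is selected from either a 1-bit first prefix used when the hexadecimal value of the leading byte is 7F or less, or a multi-bit second prefix used when the hexadecimal value of the leading byte is 80 or more.
[0021]
Because byte data of 7F or less can be encoded in 8 bits, the compression ratio during encoding does not become excessively low.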
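[0022]
The lossless data encoding apparatus according to claim 11 comprises: determination means for examining two consecutive bytes of input data and determining whether they satisfy an encoding condition that the trailing byte of the two bytes and the one byte immediately preceding the two bytes are equal to each other and that the leading byte of the two bytes is equal to or less than a predetermined value; and encoding means for encoding the two bytes in a predetermined encoding mode when the encoding condition is satisfied.
[0023]
The lossless data encoding apparatus according to claim 14 comprises: first determination means for determining whether two consecutive bytes of input data satisfy a first encoding condition for encoding in a predetermined first encoding mode; second means for determining, when the first encoding condition is not satisfied, whether a second encoding condition is satisfied, namely that the trailing byte of the two bytes and the one byte immediately preceding the two bytes are equal to each other and that the leading byte of the two bytes is equal to or less than a predetermined value; and encoding means for encoding at least the two bytes in the first encoding mode when the first determination condition is satisfied, encoding the two bytes in a second encoding mode when the second encoding condition is satisfied, and encoding at least the leading byte in a third encoding mode when neither the first nor the second encoding condition is satisfied.
[0024]
The lossless data encoding apparatus according to claim 19 comprises: determination means for determining whether the occurrence rate, in the input data, of byte data of a first type whose hexadecimal value is 7F or less satisfies a predetermined condition; and encoding means for performing encoding when that occurrence rate satisfies the predetermined condition, the encoding means including first means for determining whether two consecutive bytes in the input data satisfy an encoding condition for encoding in an encoding mode whose compression ratio exceeds 1, and second means for encoding at least the leading byte of the two bytes in an encoding mode whose compression ratio is 1 or less when the encoding condition is not satisfied.
[0025]
The lossless data encoding method according to claim 21 examines whether the proportion of cases, in the input data, in which the bytes at both ends of three consecutive bytes match each other satisfies a predetermined condition and, when it does, executes: (A) a step of determining whether two consecutive bytes in the input data satisfy an encoding condition for encoding in an encoding mode whose compression ratio exceeds 1; and (B) a step of encoding at least the leading byte of the two bytes in an encoding mode whose compression ratio is 1 or less when the encoding condition is not satisfied.
[0026]
In font data, the bytes at both ends of three consecutive bytes match in a high proportion of cases. If encoding that includes steps (A) and (B) is executed when this matching proportion satisfies the predetermined condition, encoding suited to font data can be performed.
[0027]
In the method according to claim 22, step (B) includes a step of obtaining encoded data by combining a prefix, which indicates that the data was encoded in the encoding mode whose compression ratio is 1 or less, with the lower 7 bits of the leading byte. The occurrence rate in the input data of a first byte data group whose hexadecimal values are 7F or less is compared with that of a second byte data group whose hexadecimal values are 80 or more; a 1-bit first prefix is used to encode each byte of the group with the relatively high occurrence rate, and a multi-bit second prefix is used to encode each byte of the group with the relatively low occurrence rate.
[0028]
Of the byte data group of 7F or less and the byte data group of 80 or more, each byte of the group with the relatively high occurrence rate can be encoded in 8 bits, so the compression ratio during encoding does not become excessively low.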
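[0029]
[Embodiment]
A. Apparatus configuration:
Fig. 1 is a block diagram showing the hardware configuration of an information processing apparatus to which an embodiment of the present invention is applied. This information processing apparatus is configured as a personal computer system and, as illustrated, includes the following units connected to one another by a bus centered on a CPU 101.
[0030]
ROM 104: read-only memory that stores a monitor program and the like
RAM 105: readable and writable memory that constitutes the main storage
PIC 112: interrupt controller that assigns priorities to and controls the various interrupts
Mouse interface 115: interface that handles the exchange of data and the like with a two-button mouse 114
Keyboard interface 118: interface that handles key input from a keyboard 117
FDC 121: flexible disk controller that controls a flexible disk drive (FDD) 120
HDC 125: hard disk controller that controls a hard disk drive (HDD) 124
CRTC 129: CRT controller that controls signal output to a CRT 128, which displays necessary data and the like
Printer interface 131: interface that controls the output of data to a printer 130.
[0031]
Fig. 2 is a functional block diagram showing the configuration of a compression/decompression device driver 200 that compresses and decompresses data by lossless encoding. The compression/decompression device driver 200 is realized by the CPU 101 executing a program stored in the RAM 105.
[0032]
The compression/decompression device driver 200 includes a cluster buffer 202 that stores one cluster of data to be compressed, a compression unit 204 that compresses data by lossless encoding, a compressed data buffer 206 that stores the compressed data, and a write control unit 208 that writes the compressed data to a hard disk 123. One cluster is a predetermined data size, for example 8 Kbytes.
[0033]
The compression unit 204 includes an encoding type determination unit 210 that determines the encoding type described later, an LZ encoding unit 212 that performs LZ (Lempel-Ziv) encoding, a font encoding unit 214 that performs font-type encoding, and a non-compression encoding unit 216 that performs non-compression-type encoding. The three types of encoding performed by the LZ encoding unit 212, the font encoding unit 214, and the non-compression encoding unit 216 are described first. In this embodiment, "encoding type" and "encoding mode" have the same meaning.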
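[0034]
Fig. 3 is an explanatory diagram showing the LZ-type encoding method. The LZ-type encoding used in this embodiment is Lempel-Ziv encoding, specifically the variant called the sliding dictionary method. In the example shown in Fig. 3(A), the data to be compressed contains the byte string "71h 3Ah 3Bh ...". In this specification, the suffix "h" indicates hexadecimal notation.
[0035]
LZ-type encoding checks whether a byte string identical to a consecutive byte string of two or more bytes has appeared earlier and, if so, finds the longest matching byte string. The current byte string is then encoded by the distance from the current byte string back to the earlier one (the offset) and the number of bytes in the longest matching byte string (the match length). In the example of Fig. 3(A), the two-byte string "3Ah 3Bh" with the solid underline is identical to the two-byte string with the broken underline. The compressed data for the solid-underlined string "3Ah 3Bh" is therefore expressed as a combination of three codewords: a prefix indicating LZ-type encoded data, a codeword OFFSET(3) representing the distance between the matching byte strings, and a codeword LENGTH(2) representing the match length. The number in parentheses in OFFSET(3) is the distance in bytes, and the number in parentheses in LENGTH(2) is the match length in bytes. The format of the compressed data for each encoding type is described later.
[0036]
In the example of Fig. 3(B), the offset is 3 bytes and the match length is 3 bytes, so the solid-underlined three-byte string is encoded as a combination of the prefix, the codeword OFFSET(3), and the codeword LENGTH(3).
[0037]
In the example of Fig. 3(C), the five bytes from the second byte onward are identical. In this case, the solid-underlined four bytes from the third byte onward match the four bytes from the second byte onward. The offset is therefore 1 byte and the match length is 4 bytes.
[0038]
As described above, when a matching byte string exists earlier in the data, the LZ-type encoding method compresses the data by encoding the longest matching byte string, so it is suited to compressing data in which identical byte strings appear repeatedly.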
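[0039]
Fig. 4 is an explanatory diagram showing the font-type encoding method. Font data has the characteristic that, in many cases, both ends of three consecutive bytes are equal to each other and the middle byte is less than 80h (that is, 7Fh or less). This characteristic is especially pronounced in TrueType fonts (TrueType is a trademark of Apple Computer, Inc.).
[0040]
In Fig. 4, the leading byte "71h" and the underlined two bytes "3Ah 71h" have the above characteristic, so the underlined two bytes are compressed by font-type encoding. The byte "71h" immediately preceding the two bytes "3Ah 71h" that are subject to font-type encoding may have been encoded by non-compression-type encoding or by LZ-type encoding.
[0041]
Font-type compressed data consists of a prefix indicating the font type and the lower 7 bits of the leading byte "3Ah" of the two bytes being encoded. Only the lower 7 bits of "3Ah" are needed because font-type encoding requires the leading byte to be less than 80h, so the most significant bit (MSB) of the leading byte is always 0. Furthermore, when the prefix indicating the font type is present, the decoder knows that the trailing byte of the two font-type-encoded bytes is identical to the byte immediately preceding them, so no information representing the trailing byte is needed.
[0042]
In the LZ-type encoding of this embodiment, a consecutive byte string of two or more bytes is encoded when it is identical to an earlier byte string. In font-type encoding, likewise, two consecutive bytes are encoded when they satisfy the characteristic described above. Therefore, to decide which encoding type to use, it suffices to examine two consecutive bytes and select one of the three encoding types.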
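[0043]
Fig. 5 is an explanatory diagram showing the format of the compressed data for each encoding type used in this embodiment. Compressed data produced by non-compression-type encoding comes in type 1 (Fig. 5(A)) and type 2 (Fig. 5(B)). Non-compression type 1 is the compressed data for byte data whose most significant bit (MSB) before compression is 0 (00h to 7Fh). Non-compression type 2 is the compressed data for byte data whose MSB before compression is 1 (80h to FFh). As shown in Figs. 5(A) and 5(B), the prefix of the non-compression-type compressed data changes according to the MSB of the byte before compression, so only the lower 7 bits of the original byte need to follow the prefix. In non-compression type 1, one byte before compression is encoded in 8 bits including the prefix, so the compression ratio of non-compression type 1 is 1.0. The prefix of non-compression type 2 is 4 bits, so its compression ratio is about 0.7 (8 bits encoded in 11 bits).
[0044]
In an ordinary encoding method, the encoded data for one byte encoded without compression consists of a prefix of 1 bit or more indicating non-compression plus the original 8 bits. That encoded data therefore has 9 bits or more, and its compression ratio is less than 1. In contrast, non-compression type 1 described above has a compression ratio of 1.0, which is higher than that of ordinary uncompressed encoded data. Accordingly, applying the encoding method of this embodiment to data that contains at least a certain proportion of bytes below 80h can raise the compression ratio compared with an ordinary encoding method.
[0045]
Compressed data produced by LZ-type encoding also comes in type 1 (Fig. 5(C)) and type 2 (Fig. 5(D)). LZ type 1 is used when the offset is 255 (FFh) or less, and LZ type 2 is used when the offset is 256 (100h) or more. LZ type 1 and LZ type 2 differ not only in their prefixes but also in the codeword OFFSET that represents the offset. For example, the offset codeword of LZ type 1 is expressed in 8 bits, whereas the offset codeword of LZ type 2 is expressed in a predetermined number of bits that is 9 or more. The match-length codeword LENGTH of LZ types 1 and 2 is encoded according to an encoding table such as Huffman coding or Wyle coding.
[0046]
As shown in Fig. 5(E), font-type compressed data consists of the prefix "1110", which indicates the font type, and the lower 7 bits of the leading byte.
[0047]
The encoding types shown in Fig. 5 have mutually different prefixes, and the prefixes are codewords that are uniquely and instantaneously decodable. When decoding the compressed data, the encoding type can therefore be determined immediately just by examining the prefix, and decoding can proceed according to that encoding type.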
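[0048]
Fig. 6 is a flowchart showing the encoding procedure. In step S1, two consecutive bytes are extracted from the data to be compressed in the cluster buffer 202 as the target for determining the encoding type. Two bytes are examined because, as described above, both LZ-type and font-type encoding encode consecutive byte strings of two or more bytes.
[0049]
Step S2 determines whether the LZ-type encoding condition is satisfied. The LZ-type encoding condition C1 is as follows.
Encoding condition C1: a byte string identical to the two bytes under examination exists earlier in the data.
[0050]
When condition C1 holds, LZ-type encoding is executed in step S4. In LZ-type encoding, as explained with reference to Fig. 3, a consecutive byte string of two or more bytes is encoded. When encoding condition C1 does not hold, step S3 determines whether the font-type encoding condition is satisfied. The font-type encoding condition C2 is as follows.
Encoding condition C2: the trailing byte of the two bytes under examination equals the one byte immediately preceding those two bytes, and the leading byte of the two bytes is less than 80h.
[0051]
When encoding condition C2 holds, font-type encoding is executed in step S5. Font-type encoding encodes two bytes (Fig. 4). When encoding condition C2 does not hold, the leading byte of the two bytes under examination is encoded with the non-compression type in step S6. Non-compression-type encoding encodes only one byte.
[0052]
Step S7 determines whether all of the data to be compressed in the cluster buffer 202 has been encoded; if not, the procedure returns to step S1. Steps S1 to S3 and S7 are executed by the encoding type determination unit 210 (Fig. 2), and steps S4, S5, and S6 are executed by the LZ encoding unit 212, the font encoding unit 214, and the non-compression encoding unit 216, respectively.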
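The loop of Fig. 6 (steps S1 to S7) can be sketched as follows. This is only an illustrative reading of the flowchart, not the implementation of the embodiment: the function names, the token tuples, and the brute-force match search are assumptions made for the example, and the actual bit-level codewords are left out.

```python
def find_match(data: bytes, pos: int):
    """Return (offset, length) of the longest earlier match of 2+ bytes, or None.

    Brute-force search, assumed here for clarity; ties go to the nearest match.
    """
    best = None
    for start in range(pos):
        length = 0
        while pos + length < len(data) and data[start + length] == data[pos + length]:
            length += 1
        if length >= 2 and (best is None or length >= best[1]):
            best = (pos - start, length)
    return best


def encode_tokens(data: bytes) -> list:
    """Choose an encoding type at each position, following steps S1 to S7."""
    tokens = []
    pos = 0
    while pos < len(data):                       # S7: loop until all data is encoded
        match = find_match(data, pos)            # S2: condition C1
        has_pair = pos >= 1 and pos + 1 < len(data)
        if match is not None:
            offset, length = match
            tokens.append(("LZ", offset, length))    # S4: LZ type
            pos += length
        elif has_pair and data[pos + 1] == data[pos - 1] and data[pos] < 0x80:
            tokens.append(("FONT", data[pos]))       # S3 -> S5: condition C2, font type
            pos += 2
        else:
            tokens.append(("LIT", data[pos]))        # S6: non-compression type, one byte
            pos += 1
    return tokens
```

For instance, `encode_tokens(bytes([0x71, 0x3A, 0x71]))` yields one literal token for `71h` followed by one font-type token for `3Ah`, which corresponds to the situation of Fig. 4.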
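[0053]
The procedure of Fig. 6 can also be restated as follows. LZ-type encoding has the highest compression ratio of the three encoding types, so it is used whenever the LZ-type encoding condition C1 holds. Font-type encoding has an intermediate compression ratio among the three, so it is used when condition C1 does not hold but the font-type encoding condition C2 does. Non-compression-type encoding has the lowest compression ratio, so it is used only when neither the LZ type nor the font type can be applied. The LZ and font types have compression ratios exceeding 1, whereas the non-compression type has a compression ratio of 1 or less.
[0054]
Figs. 7 and 8 are explanatory diagrams showing an example of encoding input data with this embodiment. Fig. 7(A) shows the byte string of the data to be compressed. The first byte of the string can be compressed by neither the LZ type nor the font type, so it is encoded with the non-compression type as shown in Fig. 7(B). The first byte, "85h", is 80h or more and is therefore compressed with non-compression type 2.
[0055]
The second and third bytes, "3Ah 85h", satisfy the font-type encoding condition C2 and are therefore encoded with the font type, as shown in Fig. 7(C).
[0056]
The fourth and fifth bytes, "3Bh E0h", satisfy neither the LZ-type encoding condition C1 nor the font-type encoding condition C2. Therefore, as shown in Fig. 7(D), only the fourth byte, "3Bh", is encoded, with non-compression type 1.
[0057]
The fifth and sixth bytes, "E0h 3Bh", also satisfy neither condition C1 nor C2, so only the fifth byte, "E0h", is encoded, with non-compression type 2, as shown in Fig. 8(A). The sixth and seventh bytes, "3Bh 3Ah", likewise satisfy neither condition, so only the sixth byte, "3Bh", is encoded, with non-compression type 1, as shown in Fig. 8(B).
[0058]
The seventh and eighth bytes, "3Ah 85h", are the same as the second and third bytes and are therefore encoded with the LZ type. In LZ-type encoding, the longest matching byte string is determined. In the example of Fig. 8(C), the longest matching byte string is 3 bytes, so the three bytes from the seventh to the ninth are encoded with the LZ type. The offset in Fig. 8(C) is 5 bytes, which is 255 bytes or less, so LZ type 1 encoding is used.
[0059]
The tenth byte, "FFh", satisfies neither condition C1 nor C2 and is therefore encoded with non-compression type 2, as shown in Fig. 8(D). When encoding of the data to be compressed is complete, a predetermined end code (not shown) is appended to the end of the compressed data.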
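As a cross-check of the worked example in Figs. 7 and 8, the sequence of encoding decisions described above can be decoded back into the ten input bytes. The sketch below is an assumption-laden illustration: the token tuples and the function name `decode_tokens` are hypothetical, bit-level codewords and the end code are omitted, and the ninth byte is not stated in the text but follows from the 3-byte match at offset 5 (it must equal the fourth byte, `3Bh`).

```python
def decode_tokens(tokens) -> bytes:
    """Rebuild the original bytes from (hypothetical) encoding-type tokens."""
    out = bytearray()
    for tok in tokens:
        kind = tok[0]
        if kind == "LIT":                 # non-compression type: one byte
            out.append(tok[1])
        elif kind == "FONT":              # font type: leading byte, then a copy of
            out.append(tok[1])            # the byte that preceded the pair
            out.append(out[-2])
        elif kind == "LZ":                # LZ type: copy `length` bytes from `offset` back
            _, offset, length = tok
            for _ in range(length):       # byte-by-byte so overlapping copies work
                out.append(out[-offset])
        else:
            raise ValueError(f"unknown token {tok!r}")
    return bytes(out)


# Decisions of Figs. 7(B)-(D) and 8(A)-(D), in order.
EXAMPLE_TOKENS = [
    ("LIT", 0x85),      # byte 1, non-compression type 2
    ("FONT", 0x3A),     # bytes 2-3, font type
    ("LIT", 0x3B),      # byte 4, non-compression type 1
    ("LIT", 0xE0),      # byte 5, non-compression type 2
    ("LIT", 0x3B),      # byte 6, non-compression type 1
    ("LZ", 5, 3),       # bytes 7-9, LZ type 1 (offset 5, match length 3)
    ("LIT", 0xFF),      # byte 10, non-compression type 2
]

print(decode_tokens(EXAMPLE_TOKENS).hex(" "))
```

Seven codewords cover the ten input bytes; the savings come from the font-type token (two bytes) and the LZ token (three bytes).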
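[0060]
In the embodiment above, the encoding type is determined according to whether two consecutive bytes of the data to be compressed satisfy condition C1 or C2. An appropriate encoding type can therefore be set according to the local structure of the data, with the advantage that the compression ratio is improved.
[0061]
The embodiment above describes encoding the data to be compressed with the lossless encoding method for font data, but it is also possible to decide in advance whether given data should be compressed with the lossless encoding method for font data or with another lossless encoding method.
[0062]
For example, the file name extension of font data is often limited to a few kinds, such as "TTF". When compressing data, the extension of the file to be compressed may therefore be examined, and the lossless encoding method for font data described above may be applied when the extension matches a pre-registered extension such as "TTF".
[0063]
Non-compression type 1 encoding described above needs only 8 bits of encoded data, which gives the non-compression-type encoded data a higher compression ratio than an ordinary encoding method would. Accordingly, for input data in which byte data below 80h occurs at or above a certain rate, applying the font data encoding method of the embodiment can raise the compression ratio of the encoded data compared with an ordinary encoding method. Various ways of deciding, on this basis, whether to apply the font data encoding method are conceivable. For example, a fixed number of bytes from the start of the data to be compressed (that is, at least part of the input data) may be examined, and the font data encoding method applied when the number of bytes below 80h is at or above a predetermined value. Alternatively, a fixed number of bytes may be examined while sampling the data at intervals, and the font data encoding method applied when the number of bytes below 80h is at or above a predetermined value.
[0064]
When byte data of 80h or more is relatively frequent, the prefix for data of 80h or more may be made 1 bit (for example "0") and a prefix of 2 bits or more (for example "1111") used for data below 80h. In other words, at least part of the input data may be examined to count the bytes (or find the occurrence rates) of byte data below 80h and byte data of 80h or more, with the 1-bit prefix assigned to the group with the relatively high occurrence rate and the prefix of 2 bits or more assigned to the group with the relatively low occurrence rate.
[0065]
Font data has the characteristic that the bytes at both ends of three consecutive bytes are identical. Using this characteristic, the font data encoding method may also be applied when the proportion of cases in which two bytes one position apart match is at or above a certain level. In this case, that proportion is examined for at least part of the input data, and the encoding method to apply is selected according to the result.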
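The pre-checks described above (share of bytes below 80h, choice of which group receives the 1-bit prefix, and share of positions where bytes one position apart match) can be sketched as simple statistics over a sample of the input. The function names, sample handling, and threshold values below are hypothetical; the specification only speaks of "a predetermined value" and "a certain level".

```python
def low_byte_rate(sample: bytes) -> float:
    """Fraction of bytes in the sample that are below 80h."""
    if not sample:
        return 0.0
    return sum(b < 0x80 for b in sample) / len(sample)


def alternate_match_rate(sample: bytes) -> float:
    """Fraction of 3-byte windows whose first and third bytes are equal."""
    windows = len(sample) - 2
    if windows <= 0:
        return 0.0
    return sum(sample[i] == sample[i + 2] for i in range(windows)) / windows


def enough_low_bytes(sample: bytes, threshold: float = 0.5) -> bool:
    """Assumed criterion: apply the font method if low bytes are frequent enough."""
    return low_byte_rate(sample) >= threshold


def enough_alternate_matches(sample: bytes, threshold: float = 0.1) -> bool:
    """Assumed criterion: apply the font method if end bytes of triples match often."""
    return alternate_match_rate(sample) >= threshold


def one_bit_prefix_group(sample: bytes) -> str:
    """Pick which byte group gets the 1-bit prefix (the more frequent one)."""
    return "below_80h" if low_byte_rate(sample) >= 0.5 else "80h_or_above"
```

A caller would typically pass a prefix of the cluster, for example `data[:1024]`, or a strided sample such as `data[::16]`, matching the two sampling options mentioned in the text.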
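[0066]
The present invention is not limited to the embodiment above and can be carried out in various forms without departing from its gist; for example, the following modifications are possible.
[0067]
(1) Fig. 9 adds steps S10 and S11 to the procedure of Fig. 6. In step S10, when the offset for LZ-type encoding is 256 or more and the match length is equal to 2, the procedure moves to step S11, which determines whether the two bytes to be encoded satisfy the font-type condition C2. When condition C2 holds, the two bytes are encoded with the font type; otherwise they are encoded with the LZ type. This is done because, when the conditions of steps S10 and S11 are both satisfied, font-type encoding gives a higher compression ratio for the encoded data than LZ-type encoding.
[0068]
In this way, even a two-byte string that satisfies the LZ-type encoding condition C1 may be encoded with the font type when doing so gives a higher compression ratio for the encoded data.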
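The modified decision of Fig. 9 can be written as a small selection function. This is a sketch under assumptions: the name `choose_mode`, its arguments, and the returned labels are hypothetical, and the match search that produces `match` is taken as given.

```python
def choose_mode(match, prev, first, second):
    """Select the encoding type for the pair (first, second).

    match  -- (offset, length) of the longest earlier match, or None (condition C1)
    prev   -- byte immediately before the pair, or None at the start of the data
    second -- trailing byte of the pair, or None if only one byte remains
    """
    c2 = (prev is not None and second is not None
          and second == prev and first < 0x80)           # font-type condition C2
    if match is not None:
        offset, length = match
        if offset >= 256 and length == 2 and c2:         # steps S10 and S11
            return "FONT"                                # font type beats a far, short match
        return "LZ"
    return "FONT" if c2 else "LIT"
```

The only change from the basic procedure is the first branch: a match of length 2 at an offset of 256 or more no longer wins automatically when the same two bytes also satisfy C2.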
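[0069]
(2) In place of the LZ encoding method of the embodiment above, various other lossless encoding methods such as Huffman coding or arithmetic coding may be used.
[0070]
[Effects of the invention]
As described above, according to the inventions of claims 1 and 11, input data that satisfies the predetermined encoding condition is encoded in a predetermined encoding mode suited to font data, so font data can be compressed efficiently.
[0071]
According to the inventions of claims 2, 5, 12, and 15, the compression ratio can be made 1 or more.
[0072]
According to the inventions of claims 3, 6, 13, and 16, two bytes can be encoded in 15 bits or fewer.
[0073]
According to the inventions of claims 4 and 14, an encoding mode appropriate to the content of the two bytes of data can be selected, and compression is efficient in particular when the two bytes satisfy the second encoding condition characteristic of font data.
[0074]
According to the inventions of claims 7 and 17, the longest matching byte string is encoded in the first encoding mode, so the compression ratio can be made high.
[0075]
According to the inventions of claims 8 and 18, in the third encoding mode the leading byte of the two bytes is encoded in 8 bits when its hexadecimal value is 7F or less, so the compression ratio does not become excessively low.
[0076]
According to the inventions of claims 9 and 19, encoding that includes steps (A) and (B) is executed when the occurrence rate of byte data whose hexadecimal value is 7F or less satisfies the predetermined condition, so the codewords can be set so that the compression ratio for byte data of 7F or less does not become excessively low.
[0077]
According to the inventions of claims 10 and 20, byte data of 7F or less can be encoded in 8 bits, so the compression ratio during encoding does not become excessively low.
[0078]
According to the invention of claim 21, encoding that includes steps (A) and (B) is executed when the proportion of cases in which the bytes at both ends of three bytes match satisfies the predetermined condition, so encoding suited to font data can be performed.
[0079]
According to the invention of claim 22, of the byte data group of 7F or less and the byte data group of 80 or more, each byte of the group with the relatively high occurrence rate can be encoded in 8 bits, so the compression ratio during encoding does not become excessively low.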
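[Brief description of the drawings]
[Fig. 1] Block diagram showing the hardware configuration of an information processing apparatus to which an embodiment of the present invention is applied.
[Fig. 2] Functional block diagram showing the configuration of the compression/decompression device driver 200, which compresses and decompresses data by lossless encoding.
[Fig. 3] Explanatory diagram showing the LZ-type encoding method.
[Fig. 4] Explanatory diagram showing the font-type encoding method.
[Fig. 5] Explanatory diagram showing the format of the compressed data for each encoding type.
[Fig. 6] Flowchart showing the encoding procedure.
[Fig. 7] Explanatory diagram showing an example of encoding input data with the embodiment.
[Fig. 8] Explanatory diagram showing an example of encoding input data with the embodiment.
[Fig. 9] Flowchart showing a modification of the encoding procedure.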
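[Explanation of reference numerals]
101 ... CPU
104 ... ROM
105 ... RAM
112 ... PIC
115 ... Mouse interface
117 ... Keyboard
118 ... Keyboard interface
121 ... FDC
123 ... Hard disk
124 ... Hard disk drive
125 ... HDC
128 ... CRT
129 ... CRTC
130 ... Printer
131 ... Printer interface
200 ... Compression/decompression device driver
202 ... Cluster buffer
204 ... Compression unit
206 ... Compressed data buffer
208 ... Write control unit
210 ... Encoding type determination unit
212 ... LZ encoding unit
214 ... Font encoding unit
216 ... Non-compression encoding unit

[0001]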
[Industrial applications]
The present invention relates to a method and an apparatus for losslessly encoding data.
[0002]
[Prior art]
In recent years, data compression techniques have come into wide use to cope with the rapid growth in the amount of data processed by computers. Data compression techniques comprise lossless encoding, in which the data before compression is restored exactly, and lossy encoding, in which it is not. Lossy encoding is used mainly for compressing image data. Lossless encoding, on the other hand, is used for compressing various kinds of data such as program data, text data, and dictionary data.
[0003]
Font data is one type of data to be subjected to lossless encoding. Font data is data representing the shape of characters included in one set of fonts. In recent years, in many cases, several types of font data are loaded on a personal computer so that a user can select and use a desired font.
[0004]
[Problems to be solved by the invention]
However, when font data is compressed with the same lossless encoding method as text data or program data, the compression ratio is sometimes not very high. This is thought to be because font data has a characteristic data arrangement that differs from that of text data and program data.
[0005]
The present invention has been made to solve the above problem of the prior art, and its object is to provide a lossless encoding method and apparatus capable of compressing font data efficiently.
[0006]
[Means for solving the problems, and operation]
To solve the above problem, the lossless data encoding method according to claim 1 of the present invention comprises: (A) a step of examining two consecutive bytes of input data and determining whether they satisfy an encoding condition that the trailing byte of the two bytes and the one byte immediately preceding the two bytes are equal to each other and that the leading byte of the two bytes is equal to or less than a predetermined value; and (B) a step of encoding the two bytes in a predetermined encoding mode when the encoding condition is satisfied.
[0007]
In font data, the trailing byte of two bytes and the one byte immediately before the two bytes are equal to each other, and the leading byte of the two bytes is often equal to or smaller than a predetermined value. Therefore, when such an encoding condition is satisfied, font data can be efficiently compressed by encoding in a predetermined encoding mode suitable for the font data.
[0008]
In the data lossless encoding method according to claim 2, the step (B) includes a step (C) of encoding the two bytes with a code word of less than 16 bits.
[0009]
In this case, the compression ratio can be set to 1 or more.
[0010]
In the lossless data encoding method according to claim 3, step (C) includes a step of creating encoded data by combining a prefix of 8 bits or fewer, which indicates that the data was encoded in the predetermined encoding mode, with the lower 7 bits of the leading byte.
[0011]
In this way, 2 bytes can be encoded with 15 bits or less.
[0012]
The lossless data encoding method according to claim 4 comprises: (A) a step of determining whether two consecutive bytes of input data satisfy a first encoding condition for encoding in a predetermined first encoding mode; (B) a step of determining, when the first encoding condition is not satisfied, whether a second encoding condition is satisfied, namely that the trailing byte of the two bytes and the one byte immediately preceding the two bytes are equal to each other and that the leading byte of the two bytes is equal to or less than a predetermined value; and (C) a step of encoding at least the two bytes in the first encoding mode, which has the highest compression ratio, when the first determination condition is satisfied, encoding the two bytes in a second encoding mode, which has an intermediate compression ratio, when the second encoding condition is satisfied, and encoding at least the leading byte in a third encoding mode, which has the lowest compression ratio, when neither the first nor the second encoding condition is satisfied.
[0013]
If the encoding mode is selected according to the encoding conditions, an encoding mode appropriate to the content of the two bytes of data can be selected. In particular, when the two bytes satisfy the second encoding condition, which is characteristic of font data, the font data can be compressed efficiently by encoding in the second encoding mode suited to font data.
[0014]
In the lossless data encoding method according to claim 7, the encoding step in the first encoding mode includes: a step of searching the input data for the longest second byte string that matches a first byte string containing at least the two bytes; and a step of creating encoded data by combining a first codeword indicating that the data was encoded in the first encoding mode, a second codeword indicating the distance between the first and second byte strings, and a third codeword indicating the length of the first byte string.
[0015]
In the first encoding mode, since the longest matching byte string is encoded, the compression ratio can be increased.
[0016]
In the lossless data encoding method according to claim 8, the third encoding mode includes a step of obtaining encoded data by combining a prefix indicating that the data was encoded in the third encoding mode with the lower 7 bits of the leading byte, and the prefix in the third encoding mode is selected from either a 1-bit first prefix used when the hexadecimal value of the leading byte is 7F or less, or a multi-bit second prefix used when the hexadecimal value of the leading byte is 80 or more.
[0017]
In the third encoding mode, when the hexadecimal notation of the first byte in two bytes is 7F or less, encoding is performed with 8 bits, so that the compression ratio does not become excessively low.
[0018]
The lossless data encoding method according to claim 9 determines whether the occurrence rate, in the input data, of byte data of a first type whose hexadecimal value is 7F or less satisfies a predetermined condition and, when it does, executes: (A) a step of determining whether two consecutive bytes in the input data satisfy an encoding condition for encoding in an encoding mode whose compression ratio exceeds 1; and (B) a step of encoding at least the leading byte of the two bytes in an encoding mode whose compression ratio is 1 or less when the encoding condition is not satisfied.
[0019]
If encoding that includes steps (A) and (B) is executed when the occurrence rate of byte data whose hexadecimal value is 7F or less satisfies the predetermined condition, the codewords can be set so that the compression ratio for byte data of 7F or less does not become excessively low.
[0020]
In the lossless data encoding method according to claim 10, step (B) includes a step of obtaining encoded data by combining a prefix, which indicates that the data was encoded in the encoding mode whose compression ratio is 1 or less, with the lower 7 bits of the leading byte, and the prefix is selected from either a 1-bit first prefix used when the hexadecimal value of the leading byte is 7F or less, or a multi-bit second prefix used when the hexadecimal value of the leading byte is 80 or more.
[0021]
Since byte data of 7F or less can be encoded with 8 bits, the compression ratio at the time of encoding does not become excessively low.
[0022]
The lossless data encoding apparatus according to claim 11 comprises: determination means for examining two consecutive bytes of input data and determining whether they satisfy an encoding condition that the trailing byte of the two bytes and the one byte immediately preceding the two bytes are equal to each other and that the leading byte of the two bytes is equal to or less than a predetermined value; and encoding means for encoding the two bytes in a predetermined encoding mode when the encoding condition is satisfied.
[0023]
The lossless data encoding apparatus according to claim 14 comprises: first determination means for determining whether two consecutive bytes of input data satisfy a first encoding condition for encoding in a predetermined first encoding mode; second means for determining, when the first encoding condition is not satisfied, whether a second encoding condition is satisfied, namely that the trailing byte of the two bytes and the one byte immediately preceding the two bytes are equal to each other and that the leading byte of the two bytes is equal to or less than a predetermined value; and encoding means for encoding at least the two bytes in the first encoding mode when the first determination condition is satisfied, encoding the two bytes in a second encoding mode when the second encoding condition is satisfied, and encoding at least the leading byte in a third encoding mode when neither the first nor the second encoding condition is satisfied.
[0024]
The lossless data encoding apparatus according to claim 19 comprises: determination means for determining whether the occurrence rate, in the input data, of byte data of a first type whose hexadecimal value is 7F or less satisfies a predetermined condition; and encoding means for performing encoding when that occurrence rate satisfies the predetermined condition, the encoding means including first means for determining whether two consecutive bytes in the input data satisfy an encoding condition for encoding in an encoding mode whose compression ratio exceeds 1, and second means for encoding at least the leading byte of the two bytes in an encoding mode whose compression ratio is 1 or less when the encoding condition is not satisfied.
[0025]
The lossless data encoding method according to claim 21 examines whether the proportion of cases, in the input data, in which the bytes at both ends of three consecutive bytes match each other satisfies a predetermined condition and, when it does, executes: (A) a step of determining whether two consecutive bytes in the input data satisfy an encoding condition for encoding in an encoding mode whose compression ratio exceeds 1; and (B) a step of encoding at least the leading byte of the two bytes in an encoding mode whose compression ratio is 1 or less when the encoding condition is not satisfied.
[0026]
In font data, the bytes at both ends of three consecutive bytes match in a high proportion of cases. If encoding that includes steps (A) and (B) is executed when this matching proportion satisfies the predetermined condition, encoding suited to font data can be performed.
[0027]
In the method according to claim 22, step (B) includes a step of obtaining encoded data by combining a prefix, which indicates that the data was encoded in the encoding mode whose compression ratio is 1 or less, with the lower 7 bits of the leading byte. The occurrence rate in the input data of a first byte data group whose hexadecimal values are 7F or less is compared with that of a second byte data group whose hexadecimal values are 80 or more; a 1-bit first prefix is used to encode each byte of the group with the relatively high occurrence rate, and a multi-bit second prefix is used to encode each byte of the group with the relatively low occurrence rate.
[0028]
Of the byte data group of 7F or less and the byte data group of 80 or more, each byte of the group with the relatively high occurrence rate can be encoded in 8 bits, so the compression ratio during encoding does not become excessively low.
[0029]
[Embodiment]
A. Apparatus configuration:
FIG. 1 is a block diagram showing a hardware configuration of an information processing apparatus to which an embodiment of the present invention is applied. This information processing apparatus is configured as a personal computer system, and includes the following units mutually connected by a bus around a CPU 101 as shown in the figure.
[0030]
ROM 104: read-only memory for storing monitor programs and the like
RAM 105: read / write memory constituting main memory
PIC112: Interrupt controller that controls priorities of various interrupts
Mouse interface 115: an interface that controls data exchange with the two-button mouse 114
Keyboard interface 118: an interface for controlling key input from the keyboard 117
FDC 121: Flexible disk controller for controlling flexible disk drive (FDD) 120
HDC 125: a hard disk controller that controls a hard disk drive (HDD) 124
CRTC 129: CRT controller for controlling signal output to CRT 128 for displaying necessary data and the like
Printer interface 131: an interface that controls output of data to the printer 130.
[0031]
FIG. 2 is a functional block diagram showing the configuration of a compression / decompression device driver 200 that performs compression and decompression of data by lossless encoding. The compression / decompression device driver 200 is realized by the CPU 101 executing a program stored in the RAM 105.
[0032]
The compression/decompression device driver 200 includes a cluster buffer 202 that stores one cluster of data to be compressed, a compression unit 204 that compresses data by lossless encoding, a compressed data buffer 206 that stores the compressed data, and a write control unit 208 that writes the compressed data to a hard disk 123. One cluster is a predetermined data size, for example 8 Kbytes.
[0033]
The compression unit 204 includes an encoding type determination unit 210 that determines the encoding type described later, an LZ encoding unit 212 that performs LZ (Lempel-Ziv) encoding, a font encoding unit 214 that performs font-type encoding, and a non-compression encoding unit 216 that performs non-compression-type encoding. The three types of encoding performed by the LZ encoding unit 212, the font encoding unit 214, and the non-compression encoding unit 216 are described first. In this embodiment, "encoding type" and "encoding mode" have the same meaning.
[0034]
FIG. 3 is an explanatory diagram showing the LZ-type encoding method. The LZ-type encoding used in this embodiment is Lempel-Ziv encoding, specifically the variant called the sliding dictionary method. In the example shown in FIG. 3A, the data to be compressed contains the byte string "71h 3Ah 3Bh ...". In this specification, the suffix "h" indicates hexadecimal notation.
[0035]
LZ-type encoding checks whether a byte string identical to a consecutive byte string of two or more bytes has appeared earlier and, if so, finds the longest matching byte string. The current byte string is then encoded by the distance from the current byte string back to the earlier one (the offset) and the number of bytes in the longest matching byte string (the match length). In the example of FIG. 3A, the two-byte string "3Ah 3Bh" with the solid underline is identical to the two-byte string with the broken underline. The compressed data for the solid-underlined string "3Ah 3Bh" is therefore expressed as a combination of three codewords: a prefix indicating LZ-type encoded data, a codeword OFFSET(3) representing the distance between the matching byte strings, and a codeword LENGTH(2) representing the match length. The number in parentheses in OFFSET(3) is the distance in bytes, and the number in parentheses in LENGTH(2) is the match length in bytes. The format of the compressed data for each encoding type is described later.
[0036]
In the example of FIG. 3B, the offset is 3 bytes and the match length is 3 bytes, so the solid-underlined three-byte string is encoded as a combination of the prefix, the codeword OFFSET(3), and the codeword LENGTH(3).
[0037]
In the example of FIG. 3C, the five bytes from the second byte onward are identical. In this case, the solid-underlined four bytes from the third byte onward match the four bytes from the second byte onward. The offset is therefore 1 byte and the match length is 4 bytes.
[0038]
As described above, when a matching byte string exists earlier in the data, the LZ-type encoding method compresses the data by encoding the longest matching byte string, so it is suited to compressing data in which identical byte strings appear repeatedly.
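The search for the longest earlier match, including the overlapping case of FIG. 3C, can be illustrated with a brute-force sketch. The function name `longest_match` and the sample byte strings are assumptions made for the example (the figures themselves are not reproduced in the text), and a practical sliding-dictionary coder would bound the search window and use a faster index.

```python
def longest_match(data: bytes, pos: int, min_len: int = 2):
    """Return (offset, length) of the longest match for data[pos:] that starts
    earlier in `data`, or None if no match of at least `min_len` bytes exists.

    The earlier string may overlap the current one, as in FIG. 3C.
    """
    best = None
    for start in range(pos):
        length = 0
        while pos + length < len(data) and data[start + length] == data[pos + length]:
            length += 1
        # ">=" keeps the nearest candidate among equally long matches
        if length >= min_len and (best is None or length >= best[1]):
            best = (pos - start, length)
    return best


# Assumed data in the spirit of FIG. 3A: "3Ah 3Bh" recurs three bytes later.
print(longest_match(bytes([0x3A, 0x3B, 0x71, 0x3A, 0x3B]), 3))

# FIG. 3C situation: five identical bytes from the second byte onward.
print(longest_match(bytes([0x71] + [0xAA] * 5), 2))
```

The first call corresponds to OFFSET(3) with LENGTH(2); the second shows that a run of identical bytes is captured as offset 1 with match length 4, because the comparison is allowed to run into the string being encoded.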
[0039]
FIG. 4 is an explanatory diagram showing a font type encoding method. The font data is characterized in that both ends of three consecutive bytes are equal to each other, and the central byte is often less than 80h (that is, 7F or less). This feature is particularly remarkable in TrueType fonts (TrueType is a trademark of Apple Computer).
[0040]
In FIG. 4, the leading byte “71h” and the two underlined bytes “3Ah 71h” have the characteristic described above, so the two underlined bytes are compressed by font-type encoding. The byte “71h” immediately preceding the two bytes “3Ah 71h” may itself have been encoded either by non-compression-type encoding or by LZ-type encoding.
[0041]
The font-type compressed data consists of a prefix indicating the font type and the lower 7 bits of the first byte “3Ah” of the two bytes being encoded. Only the lower 7 bits of “3Ah” are needed because a first byte encoded by the font type is always less than 80h in hexadecimal notation, so its most significant bit (MSB) is always 0. The presence of the font-type prefix also indicates that the last of the two encoded bytes is identical to the byte immediately preceding them. Therefore, information representing the trailing byte is unnecessary.
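As a minimal sketch of the font-type codeword, the following packs the prefix and the lower 7 bits into a bit string and restores the two bytes from it. The prefix value "1110" is the one given for FIG. 5E in this description; representing the codeword as a Python string of "0"/"1" characters and the function names are assumptions made for illustration.

```python
FONT_PREFIX = "1110"  # font-type prefix as stated for FIG. 5(E)


def font_encode(prev_byte: int, first: int, last: int) -> str:
    """Encode two bytes (first, last) that follow prev_byte as an 11-bit string."""
    if last != prev_byte or first >= 0x80:
        raise ValueError("font-type encoding condition is not satisfied")
    # Only the lower 7 bits of the first byte are stored; the MSB is known to be 0.
    return FONT_PREFIX + format(first, "07b")


def font_decode(code: str, prev_byte: int) -> bytes:
    """Restore the two bytes; the trailing byte is a copy of the preceding byte."""
    if not code.startswith(FONT_PREFIX):
        raise ValueError("not a font-type codeword")
    first = int(code[len(FONT_PREFIX):len(FONT_PREFIX) + 7], 2)
    return bytes([first, prev_byte])


code = font_encode(0x71, 0x3A, 0x71)    # the FIG. 4 example: 71h, then "3Ah 71h"
print(code, len(code))                  # 11 bits for 16 bits of input
print(font_decode(code, 0x71).hex())
```

Because two bytes (16 bits) become 11 bits, the compression ratio of this mode is 16/11, which is greater than 1.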
[0042]
In the LZ type encoding in this embodiment, encoding is performed when a continuous byte sequence of two or more bytes is the same as a previous byte sequence. Also, in the case of font-type encoding, encoding is performed when two consecutive bytes satisfy the above-mentioned predetermined characteristics. Therefore, when determining which encoding type is used for encoding, it is only necessary to examine two consecutive bytes and select one of the three encoding types.
[0043]
FIG. 5 is an explanatory diagram showing the format of the compressed data of each encoding type used in this embodiment. Compressed data produced by non-compression-type encoding comes in type 1 (FIG. 5 (A)) and type 2 (FIG. 5 (B)). Non-compression type 1 is the compressed data for byte data (00h to 7Fh) whose most significant bit (MSB) before compression is 0. Non-compression type 2 is the compressed data for byte data (80h to FFh) whose MSB before compression is 1. As shown in FIGS. 5A and 5B, the prefix of the non-compression-type compressed data is changed according to the value of the MSB of the byte before compression, so only the lower 7 bits of the byte before compression need to follow the prefix. In non-compression type 1, one byte before compression is encoded in 8 bits including the prefix. Therefore, the compression ratio of non-compression type 1 is 1.0. Since the prefix of non-compression type 2 is 4 bits, its compressed data is 11 bits long and its compression ratio is about 0.7 (8/11).
[0044]
In a typical encoding method, the encoded data obtained by encoding one byte without compression consists of a prefix of 1 bit or more indicating non-compression, followed by the original 8 bits. The encoded data is therefore 9 bits or more, and the compression ratio is less than 1. In non-compression type 1 encoding described above, by contrast, the compression ratio is 1.0, which is higher than that of ordinary uncompressed encoded data. Therefore, applying the encoding method of this embodiment to data that contains at least a certain proportion of bytes less than 80h can raise the compression ratio compared with a typical encoding method.
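The two non-compression formats and their compression ratios can be checked with a short sketch. The description states only the prefix lengths (1 bit for type 1, 4 bits for type 2); the concrete bit patterns "0" and "1111" used here are assumptions, as are the function names.

```python
TYPE1_PREFIX = "0"     # assumed pattern; the text states only that it is 1 bit
TYPE2_PREFIX = "1111"  # assumed pattern; the text states only that it is 4 bits


def encode_literal(byte: int) -> str:
    """Encode one byte in the non-compression type as prefix + lower 7 bits."""
    prefix = TYPE1_PREFIX if byte < 0x80 else TYPE2_PREFIX
    return prefix + format(byte & 0x7F, "07b")


def compression_ratio(original_bits: int, encoded_bits: int) -> float:
    """Ratio as used in this description: original size divided by encoded size."""
    return original_bits / encoded_bits


for value in (0x3B, 0x85):
    code = encode_literal(value)
    print(f"{value:02X}h -> {code} ({len(code)} bits), "
          f"ratio {compression_ratio(8, len(code)):.2f}")
```

A byte below 80h costs 8 bits (ratio 1.0) and a byte of 80h or more costs 11 bits (ratio about 0.73), whereas a conventional literal with at least a 1-bit flag would cost 9 bits or more for every byte.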
[0045]
Compressed data produced by LZ-type encoding also comes in type 1 (FIG. 5 (C)) and type 2 (FIG. 5 (D)). LZ type 1 is the compressed data used when the offset value is 255 (FFh in hexadecimal notation) or less, and LZ type 2 is the compressed data used when the offset value is 256 (100h in hexadecimal notation) or more. LZ type 1 and LZ type 2 differ not only in their prefixes but also in the codeword OFFSET representing the offset. For example, the LZ type 1 offset codeword is represented by 8 bits, while the LZ type 2 offset codeword is represented by a predetermined number of bits equal to 9 or more. The match-length codeword LENGTH of LZ types 1 and 2 is encoded according to a code table such as Huffman coding or Wyle coding.
[0046]
As shown in FIG. 5E, the font-type compressed data includes a prefix “1110” indicating the font type and the lower 7 bits of the first byte.
[0047]
Each encoding type shown in FIG. 5 has a different prefix, and the prefix is a codeword that can be uniquely and instantaneously decoded. Therefore, when decoding the compressed data, each coding type can be instantaneously determined only by examining the prefix, and decoding can be performed according to each coding type.
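The instantaneous decoding of the prefix can be illustrated with a small sketch. Only the font-type prefix "1110" and the prefix lengths of the two non-compression types are given in this description; every other bit pattern in the table below, in particular the LZ prefixes "10" and "110", is a hypothetical choice made so that the set is prefix-free.

```python
# Hypothetical prefix table; only "1110" (font type) is taken from the text.
PREFIXES = {
    "0": "non-compression type 1",
    "10": "LZ type 1",
    "110": "LZ type 2",
    "1110": "font type",
    "1111": "non-compression type 2",
}


def is_prefix_free(codes) -> bool:
    """True if no code is a proper prefix of another (instantaneously decodable)."""
    codes = list(codes)
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


def read_type(bits: str, pos: int = 0):
    """Return (encoding type, index just past the prefix) for the codeword at pos."""
    for end in range(pos + 1, len(bits) + 1):
        name = PREFIXES.get(bits[pos:end])
        if name is not None:
            return name, end
    raise ValueError("no known prefix at this position")


print(is_prefix_free(PREFIXES))
print(read_type("11100111010"))  # an 11-bit font-type codeword
```

Because no prefix is the beginning of another, the decoder can identify the encoding type as soon as the last prefix bit is read, without looking ahead.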
[0048]
FIG. 6 is a flowchart showing the procedure of the encoding process. In step S1, two consecutive bytes are taken from the data to be compressed stored in the cluster buffer 202 as the target of the encoding type determination. Two bytes are examined because, as described above, both LZ-type and font-type encoding encode a sequence of two or more consecutive bytes.
[0049]
In step S2, it is determined whether or not the LZ type encoding condition is satisfied. The LZ type encoding condition C1 is as follows.
Encoding condition C1: A byte string identical to the two bytes under examination has appeared earlier in the data.
[0050]
If the above condition C1 is satisfied, LZ type encoding is performed in step S4. In the LZ type encoding, a continuous byte string of 2 bytes or more is encoded as described with reference to FIG. 3. If the encoding condition C1 is not satisfied, it is determined in step S3 whether the encoding condition of the font type is satisfied. The encoding condition C2 of the font type is as follows.
Encoding condition C2: The last byte of the two bytes under examination is equal to the one byte immediately before the two bytes, and the first byte of the two bytes is less than 80h.
[0051]
If the above-described encoding condition C2 is satisfied, font-type encoding is performed in step S5. In the font type encoding, two bytes are encoded (FIG. 4). If the encoding condition C2 is not satisfied, in step S6, the first byte of the two bytes to be determined is encoded in the non-compression type. In non-compression type coding, only one byte is coded.
[0052]
In step S7, it is determined whether all of the data to be compressed stored in the cluster buffer 202 has been encoded. If not, the process returns to step S1. Steps S1 to S3 and S7 are executed by the encoding type determination unit 210 (FIG. 2), and steps S4, S5, and S6 are executed by the LZ encoding unit 212, the font encoding unit 214, and the non-compression encoding unit 216, respectively.
[0053]
The procedure of FIG. 6 can be restated as follows. LZ type encoding has the highest compression ratio of the three encoding types, so it is performed whenever the LZ type encoding condition C1 is satisfied. Font type encoding has the intermediate compression ratio of the three, so it is performed when the LZ type encoding condition C1 is not satisfied but the font type encoding condition C2 is satisfied. Non-compression type encoding has the lowest compression ratio, so it is used only when encoding cannot be performed by either the LZ type or the font type. Note that the compression ratio of the LZ type and the font type is greater than 1, whereas the compression ratio of the non-compression type is 1 or less.
[0054]
FIGS. 7 and 8 are explanatory diagrams showing an example of encoding input data according to the embodiment. FIG. 7A shows the byte sequence of the data to be compressed. Since the first byte of the byte string cannot be compressed by either the LZ type or the font type, it is encoded by the non-compression type as shown in FIG. 7. Since the first byte “85h” is 80h or more, it is encoded by the non-compression type 2.
[0055]
Since the second and third bytes “3Ah 85h” satisfy the above-described font type encoding condition C2, they are encoded in the font type as shown in FIG. 7C.
[0056]
The fourth and fifth bytes “3Bh E0h” do not satisfy the LZ type encoding condition C1 and do not satisfy the font type encoding condition C2. Therefore, as shown in FIG. 7 (D), only the fourth byte “3Bh” is encoded by the non-compression type 1.
[0057]
Since the fifth and sixth bytes “E0h 3Bh” satisfy neither of the two encoding conditions C1 and C2, only the fifth byte “E0h” is encoded by the non-compression type 2. Likewise, since the sixth and seventh bytes “3Bh 3Ah” satisfy neither of the two encoding conditions C1 and C2, only the sixth byte “3Bh” is encoded by the non-compression type 1. These results are shown in the corresponding panels of FIGS. 7 and 8.
[0058]
Since the seventh and eighth bytes “3Ah 85h” are the same as the second and third bytes, they are encoded in the LZ type. In LZ type encoding, the longest matching byte sequence is found. In the example of FIG. 8C, the longest matching byte string is 3 bytes long, so the three bytes from the seventh to the ninth are encoded in the LZ type. Note that the offset in the case of FIG. 8C is 5 bytes, which is 255 bytes or less, so LZ type 1 encoding is performed.
[0059]
Since the tenth byte “FFh” satisfies neither of the two encoding conditions C1 and C2, it is encoded by the non-compression type 2 as shown in FIG. 8. When the encoding of the data to be compressed is completed, a predetermined end code (not shown) is added to the end of the compressed data.
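The walkthrough of FIGS. 7 and 8 can be reproduced with a sketch of the type-selection loop of FIG. 6. This is a hedged illustration rather than the patented implementation: the names `longest_match` and `select_types` and the labels "RAW1", "RAW2", "FONT", "LZ1", and "LZ2" are hypothetical, a lone final byte is assumed to fall through to the non-compression type, and the ninth input byte is taken to be 3Bh, which is inferred from the stated match length of 3 and offset of 5.

```python
def longest_match(data: bytes, pos: int, min_len: int = 2):
    """Return (offset, length) of the longest earlier match, or None."""
    best = None
    for start in range(pos):
        length = 0
        while pos + length < len(data) and data[start + length] == data[pos + length]:
            length += 1
        if length >= min_len and (best is None or length >= best[1]):
            best = (pos - start, length)
    return best


def select_types(data: bytes):
    """Return a list of (type label, start index, bytes consumed) decisions."""
    steps = []
    pos = 0
    while pos < len(data):                      # S7: repeat until all data is encoded
        have_pair = pos + 1 < len(data)         # S1: two consecutive bytes available?
        match = longest_match(data, pos) if have_pair else None
        if match is not None:                   # S2: condition C1 -> S4 (LZ type)
            offset, length = match
            steps.append(("LZ1" if offset <= 255 else "LZ2", pos, length))
            pos += length
        elif (have_pair and pos >= 1
              and data[pos + 1] == data[pos - 1]
              and data[pos] < 0x80):            # S3: condition C2 -> S5 (font type)
            steps.append(("FONT", pos, 2))
            pos += 2
        else:                                   # S6: non-compression type, one byte
            steps.append(("RAW1" if data[pos] < 0x80 else "RAW2", pos, 1))
            pos += 1
    return steps


# Byte string of FIG. 7(A); the ninth byte (3Bh) is inferred, see the note above.
data = bytes.fromhex("853A853BE03B3A853BFF")
for label, start, count in select_types(data):
    print(label, data[start:start + count].hex())
```

With this input the sketch selects non-compression type 2, font type, non-compression type 1, non-compression type 2, non-compression type 1, LZ type 1 (three bytes), and non-compression type 2, in that order, which agrees with the description of FIGS. 7 and 8.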
[0060]
In the above embodiment, the encoding type is determined according to whether two consecutive bytes of the data to be compressed satisfy the determination condition C1 or C2. An appropriate encoding type can therefore be selected according to the local data structure of the data to be compressed, which has the advantage of improving the compression ratio.
[0061]
The above embodiment describes the case where the data to be compressed is encoded by the lossless encoding method for font data. However, it is also possible to determine in advance whether given data should be compressed by the lossless encoding method for font data or by another lossless encoding method.
[0062]
For example, the file-name extension of font data is usually limited to a few types such as “TTF”. Therefore, when data is to be compressed, the extension of the file to be compressed may be checked, and the lossless encoding method for font data described above may be applied if the extension matches a previously registered extension such as “TTF”.
[0063]
Note that the non-compression type 1 encoding described above requires only 8 bits of encoded data, and thus has the advantage that the compression ratio of the non-compression type encoded data is higher than with a typical encoding method. Therefore, when the font data encoding method of the above embodiment is applied to input data in which the appearance rate of bytes less than 80h is at or above a certain level, the compression ratio of the encoded data can be made higher than with a typical encoding method. Various methods can be used to decide, on this basis, whether to apply the font data encoding method. For example, a fixed number of bytes from the head of the data to be compressed (that is, at least a part of the input data) may be examined, and the font data encoding method may be applied when the number of bytes less than 80h is equal to or greater than a predetermined value. Alternatively, a fixed number of bytes may be examined while thinning out the data to be compressed, and the font data encoding method may be applied when the number of bytes less than 80h is equal to or greater than a predetermined value.
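The sampling test described above might look like the following sketch. The function names, the `step` parameter used for thinning, and the threshold values in the usage lines are all hypothetical; the description specifies only that a fixed number of bytes is examined and that the count of bytes below 80h is compared with a predetermined value.

```python
def low_byte_count(data: bytes, sample_size: int, step: int = 1) -> int:
    """Count bytes below 80h among the first sample_size bytes taken every step bytes."""
    sample = data[::step][:sample_size]   # step > 1 thins out the data
    return sum(1 for b in sample if b < 0x80)


def use_font_method(data: bytes, sample_size: int, min_count: int, step: int = 1) -> bool:
    """Apply the font data encoding method when enough sampled bytes are below 80h."""
    return low_byte_count(data, sample_size, step) >= min_count


data = bytes([0x3A, 0x85, 0x3B, 0xE0, 0x10, 0xFF])
print(low_byte_count(data, 4))           # examines 3Ah 85h 3Bh E0h
print(low_byte_count(data, 4, step=2))   # examines 3Ah 3Bh 10h
print(use_font_method(data, 4, 2))
```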
[0064]
When bytes of 80h or more are relatively numerous, a 1-bit prefix (for example, “0”) may be used for data of 80h or more, and a prefix of 2 or more bits (for example, “1111”) may be used for data of less than 80h. In other words, at least a part of the input data may be examined to obtain the number (or appearance rate) of bytes less than 80h and the number (or appearance rate) of bytes of 80h or more. A 1-bit prefix is then used for the group with the relatively high appearance rate, and a prefix of two or more bits is used for the group with the relatively low appearance rate.
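A minimal sketch of this prefix assignment is given below. The bit patterns "0" and "1111" are the examples mentioned in the text; the function name, the dictionary keys, and the handling of a tie (the group below 80h keeps the short prefix) are assumptions.

```python
def choose_literal_prefixes(sample: bytes, short: str = "0", long: str = "1111") -> dict:
    """Give the 1-bit prefix to whichever byte group appears more often in the sample."""
    low = sum(1 for b in sample if b < 0x80)   # bytes of 7Fh or less
    high = len(sample) - low                   # bytes of 80h or more
    if low >= high:                            # assumed tie-break: favour the low group
        return {"low": short, "high": long}
    return {"low": long, "high": short}


print(choose_literal_prefixes(bytes([0x85, 0x90, 0x3A])))  # mostly 80h or more
print(choose_literal_prefixes(bytes([0x01, 0x02, 0x85])))  # mostly below 80h
```

The decoder would have to know which assignment was chosen, for example through a flag stored with the compressed data; the description does not state how this is signalled, so that part is left out of the sketch.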
[0065]
Note that font data has the feature that the bytes at both ends of three consecutive bytes are often the same. Using this feature, the font data encoding method may also be applied when the rate at which two bytes separated by one byte match each other is equal to or greater than a certain value. In this case, the rate at which bytes separated by one byte match may be checked for at least a part of the input data, and the encoding method to be applied may be selected according to the result.
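The match rate mentioned here can be computed as in the following sketch. The function name and the choice to count every window of three consecutive bytes are assumptions; the description leaves the threshold and the sampled range open.

```python
def ends_match_rate(data: bytes) -> float:
    """Fraction of 3-byte windows whose first and third bytes are equal."""
    windows = len(data) - 2
    if windows <= 0:
        return 0.0
    hits = sum(1 for i in range(windows) if data[i] == data[i + 2])
    return hits / windows


sample = bytes([0x71, 0x3A, 0x71, 0x3B, 0x71])
print(ends_match_rate(sample))   # two of the three windows match
```

The font data encoding method would then be selected when this rate is at or above a chosen value.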
[0066]
The present invention is not limited to the above-described embodiment, and can be implemented in various modes without departing from the gist of the present invention. For example, the following modifications are possible.
[0067]
(1) FIG. 9 is obtained by adding steps S10 and S11 to the processing procedure of FIG. 6. In step S10, if the offset for LZ type encoding is 256 or more and the match length is equal to 2, the process proceeds to step S11, where it is determined whether the two bytes to be encoded satisfy the font type determination condition C2. If the condition C2 is satisfied, the two bytes are encoded with the font type; otherwise, they are encoded with the LZ type. The reason is that, when the conditions of both steps S10 and S11 are satisfied, encoding with the font type gives a higher compression ratio of the encoded data than encoding with the LZ type.
[0068]
In this way, even a two-byte string that satisfies the LZ type determination condition C1 may be encoded with the font type when doing so gives a higher compression ratio of the encoded data.
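The additional decision of steps S10 and S11 can be sketched as a small function that is called once condition C1 has already been found to hold. The function name, the argument list, and the returned labels are hypothetical.

```python
def choose_type_fig9(offset: int, match_length: int,
                     prev_byte, first: int, last: int) -> str:
    """Pick the encoding type for two bytes that already satisfy condition C1."""
    # S11: font-type condition C2 (prev_byte is None at the start of the data).
    c2 = prev_byte is not None and last == prev_byte and first < 0x80
    # S10: a far, two-byte match would need an LZ type 2 codeword.
    if offset >= 256 and match_length == 2 and c2:
        return "FONT"
    return "LZ"


print(choose_type_fig9(300, 2, 0x71, 0x3A, 0x71))  # far two-byte match, C2 holds
print(choose_type_fig9(300, 3, 0x71, 0x3A, 0x71))  # longer match stays LZ
print(choose_type_fig9(5, 2, 0x71, 0x3A, 0x71))    # near match stays LZ
```

The font type is preferred in the first case because its codeword is 11 bits, while an LZ type 2 codeword carries a prefix, an offset of 9 bits or more, and a length codeword.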
[0069]
(2) Various lossless encoding methods such as Huffman encoding and arithmetic encoding can be used in place of the LZ encoding method in the above embodiment.
[0070]
[Effects of the invention]
As described above, according to the first and eleventh aspects of the present invention, input data that satisfies a predetermined encoding condition is encoded in a predetermined encoding mode suited to font data, so font data can be compressed efficiently.
[0071]
According to the second, fifth, twelfth, and fifteenth aspects, the compression ratio can be made 1 or more.
[0072]
According to the third, sixth, thirteenth, and sixteenth aspects, there is an effect that two bytes can be encoded with 15 bits or less.
[0073]
According to the fourth and fourteenth aspects of the present invention, an appropriate encoding mode can be selected in accordance with the content of the two-byte data. In particular, compression can be performed efficiently when the two bytes satisfy the second encoding condition, which is characteristic of font data.
[0074]
According to the seventh and seventeenth aspects of the present invention, in the first encoding mode, the longest matching byte string is encoded, so that the compression ratio can be increased.
[0075]
According to the eighth and eighteenth aspects of the present invention, in the third encoding mode, when the hexadecimal notation of the first byte of the two bytes is 7F or less, the byte is encoded with 8 bits, so that the compression ratio does not become excessively low.
[0076]
According to the ninth and nineteenth aspects of the present invention, the encoding including the steps (A) and (B) is executed when the appearance rate of byte data whose hexadecimal notation is 7F or less satisfies a predetermined condition, so codewords can be set such that the compression ratio does not become excessively low.
[0077]
According to the tenth and twentieth aspects of the present invention, since byte data of 7F or less can be encoded with 8 bits, the compression rate at the time of encoding does not become excessively low.
[0078]
According to the twenty-first aspect of the present invention, the encoding including the steps (A) and (B) is executed when the rate at which the byte data at both ends of three consecutive bytes match satisfies a predetermined condition, which makes it possible to perform encoding suited to such data.
[0079]
According to the invention described in claim 22, each byte of whichever group has the relatively high appearance rate, the group of byte data of 7F or less or the group of byte data of 80 or more, can be encoded with 8 bits, so that the compression ratio at the time of encoding does not become excessively low.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a hardware configuration of an information processing apparatus to which an embodiment of the present invention is applied.
FIG. 2 is a functional block diagram showing the configuration of a compression / decompression device driver 200 that performs compression and decompression of data by lossless encoding.
FIG. 3 is an explanatory diagram showing an LZ type encoding method.
FIG. 4 is an explanatory diagram showing a font type encoding method.
FIG. 5 is an explanatory diagram showing a format of compressed data of each encoding type.
FIG. 6 is a flowchart showing a procedure of an encoding process.
FIG. 7 is an explanatory diagram showing an example of encoding input data by applying the embodiment.
FIG. 8 is an explanatory diagram showing an example of encoding input data by applying the embodiment.
FIG. 9 is a flowchart illustrating a modification of the procedure of the encoding process.
[Explanation of symbols]
101 ... CPU
104 ... ROM
105 ... RAM
112 ... PIC
115 ... Mouse interface
117 ... Keyboard
118 ... Keyboard interface
121 ... FDC
123 ... Hard disk
124 ... Hard disk drive
125 ... HDC
128 ... CRT
129 ... CRTC
130 ... Printer
131 ... Printer interface
200 ... Device driver for compression and decompression
202 ... Cluster buffer
204 ... Compression unit
206 ... Compressed data buffer
208 ... Write control unit
210 ... Encoding type determination unit
212 ... LZ encoding unit
214 ... Font encoding unit
216 ... Non-compression encoding unit

Claims

1. A method for lossless encoding of data, comprising the steps of:
(A) examining two consecutive bytes of input data and determining whether an encoding condition is satisfied, the encoding condition being that the last byte of the two bytes is equal to the one byte immediately preceding the two bytes and that the first byte of the two bytes is equal to or less than a predetermined value; and
(B) encoding the two bytes in a predetermined encoding mode when the encoding condition is satisfied.

2. The method for lossless encoding of data according to claim 1, wherein
the step (B) includes
(C) encoding the two bytes with a codeword of less than 16 bits.

3. The method for lossless encoding of data according to claim 2, wherein
the step (C) includes
creating encoded data by combining a prefix of 8 bits or less, indicating that encoding has been performed in the predetermined encoding mode, with the lower 7 bits of the first byte.

4. A method for lossless encoding of data, comprising the steps of:
(A) determining whether two consecutive bytes of input data satisfy a first encoding condition for encoding in a predetermined first encoding mode;
(B) when the first encoding condition is not satisfied, determining whether a second encoding condition is satisfied, the second encoding condition being that the last byte of the two bytes is equal to the one byte immediately preceding the two bytes and that the first byte of the two bytes is equal to or less than a predetermined value; and
(C) encoding at least the two bytes in the first encoding mode, which has the highest compression ratio, when the first determination condition is satisfied; encoding the two bytes in a second encoding mode, which has an intermediate compression ratio, when the second encoding condition is satisfied; and encoding at least the first byte in a third encoding mode, which has the lowest compression ratio, when neither the first nor the second encoding condition is satisfied.

5. The method for lossless encoding of data according to claim 4, wherein
the step of encoding in the second encoding mode includes
encoding the two bytes with a codeword of less than 16 bits.

6. The method for lossless encoding of data according to claim 5, wherein
the step of encoding in the second encoding mode includes creating encoded data by combining a prefix of 8 bits or less, indicating that encoding has been performed in the second encoding mode, with the lower 7 bits of the first byte.

7. The method for lossless encoding of data according to any one of claims 4 to 6, wherein
the step of encoding in the first encoding mode includes:
searching the input data for the longest second byte sequence that matches a first byte sequence including at least the two bytes; and
creating encoded data by combining a first codeword indicating that encoding has been performed in the first encoding mode, a second codeword indicating the distance between the first and second byte sequences, and a third codeword indicating the length of the match.

8. The method for lossless encoding of data according to any one of claims 4 to 7, wherein
the encoding in the third encoding mode includes obtaining encoded data by combining a prefix, indicating that encoding has been performed in the third encoding mode, with the lower 7 bits of the first byte, and
the prefix in the third encoding mode is selected from a 1-bit first prefix, used when the hexadecimal notation of the first byte is 7F or less, and a multi-bit second prefix, used when the hexadecimal notation of the first byte is 80 or more.

9. A method for lossless encoding of data, comprising:
determining whether the appearance rate, in input data, of first-type byte data whose hexadecimal notation is 7F or less satisfies a predetermined condition; and,
when the appearance rate of the first-type byte data satisfies the predetermined condition,
(A) determining whether two consecutive bytes in the input data satisfy an encoding condition for encoding in an encoding mode whose compression ratio exceeds 1; and
(B) encoding at least the first byte of the two bytes in an encoding mode whose compression ratio is 1 or less when the encoding condition is not satisfied.

10. The method for lossless encoding of data according to claim 9, wherein
the step (B) includes obtaining encoded data by combining a prefix, indicating that encoding has been performed in the encoding mode whose compression ratio is 1 or less, with the lower 7 bits of the first byte, and
the prefix is selected from a 1-bit first prefix, used when the hexadecimal notation of the first byte is 7F or less, and a multi-bit second prefix, used when the hexadecimal notation of the first byte is 80 or more.

11. A device for lossless encoding of data, comprising:
determination means for examining two consecutive bytes of input data and determining whether an encoding condition is satisfied, the encoding condition being that the last byte of the two bytes is equal to the one byte immediately preceding the two bytes and that the first byte of the two bytes is equal to or less than a predetermined value; and
encoding means for encoding the two bytes in a predetermined encoding mode when the encoding condition is satisfied.

12. The device for lossless encoding of data according to claim 11, wherein
the encoding means includes
first means for encoding the two bytes with a codeword of less than 16 bits.

13. The device for lossless encoding of data according to claim 12, wherein
the first means includes
means for creating encoded data by combining a prefix of 8 bits or less, indicating that encoding has been performed in the predetermined encoding mode, with the lower 7 bits of the first byte.

14. A device for lossless encoding of data, comprising:
first determination means for determining whether two consecutive bytes of input data satisfy a first encoding condition for encoding in a predetermined first encoding mode;
second determination means for determining, when the first encoding condition is not satisfied, whether a second encoding condition is satisfied, the second encoding condition being that the last byte of the two bytes is equal to the one byte immediately preceding the two bytes and that the first byte of the two bytes is equal to or less than a predetermined value; and
encoding means for encoding at least the two bytes in the first encoding mode, which has the highest compression ratio, when the first determination condition is satisfied, encoding the two bytes in a second encoding mode, which has an intermediate compression ratio, when the second encoding condition is satisfied, and encoding at least the first byte in a third encoding mode, which has the lowest compression ratio, when neither the first nor the second encoding condition is satisfied.

15. The device for lossless encoding of data according to claim 14, wherein
the encoding means includes
first means for encoding the two bytes with a codeword of less than 16 bits when encoding in the second encoding mode.

16. The device for lossless encoding of data according to claim 15, wherein
the first means includes
means for creating encoded data by combining a prefix of 8 bits or less, indicating that encoding has been performed in the second encoding mode, with the lower 7 bits of the first byte.

17. The device for lossless encoding of data according to any one of claims 14 to 16, wherein
the encoding means includes:
search means for searching the input data, when encoding in the first encoding mode, for the longest second byte sequence that matches a first byte sequence including at least the two bytes; and
encoded data creation means for creating encoded data by combining a first codeword indicating that encoding has been performed in the first encoding mode, a second codeword indicating the distance between the first and second byte sequences, and a third codeword indicating the length of the match.

18. The device for lossless encoding of data according to any one of claims 14 to 17, wherein
the encoding means includes:
means for creating encoded data, when encoding in the third encoding mode, by combining a prefix, indicating that encoding has been performed in the third encoding mode, with the lower 7 bits of the first byte; and
means for selecting, as the prefix in the third encoding mode, either a 1-bit first prefix, used when the hexadecimal notation of the first byte is 7F or less, or a multi-bit second prefix, used when the hexadecimal notation of the first byte is 80 or more.

19. A device for lossless encoding of data, comprising:
determination means for determining whether the appearance rate, in input data, of first-type byte data whose hexadecimal notation is 7F or less satisfies a predetermined condition; and
encoding means for performing encoding when the appearance rate of the first-type byte data satisfies the predetermined condition,
wherein the encoding means includes:
first means for determining whether two consecutive bytes in the input data satisfy an encoding condition for encoding in an encoding mode whose compression ratio exceeds 1; and
second means for encoding at least the first byte of the two bytes in an encoding mode whose compression ratio is 1 or less when the encoding condition is not satisfied.

20. The device for lossless encoding of data according to claim 19, wherein
the second means includes:
means for creating encoded data by combining a prefix, indicating that encoding has been performed in the encoding mode whose compression ratio is 1 or less, with the lower 7 bits of the first byte; and
means for selecting, as the prefix, either a 1-bit first prefix, used when the hexadecimal notation of the first byte is 7F or less, or a multi-bit second prefix, used when the hexadecimal notation of the first byte is 80 or more.

21. A method for lossless encoding of data, comprising:
checking whether the rate at which the byte data at both ends of three consecutive bytes in input data match each other satisfies a predetermined condition; and,
when the matching rate satisfies the predetermined condition,
(A) determining whether two consecutive bytes in the input data satisfy an encoding condition for encoding in an encoding mode whose compression ratio exceeds 1; and
(B) encoding at least the first byte of the two bytes in an encoding mode whose compression ratio is 1 or less when the encoding condition is not satisfied.

22. The method for lossless encoding of data according to claim 9 or 21, wherein
the step (B) includes:
obtaining encoded data by combining a prefix, indicating that encoding has been performed in the encoding mode whose compression ratio is 1 or less, with the lower 7 bits of the first byte; and
comparing the appearance rate, in the input data, of a first byte data group whose hexadecimal notation is 7F or less with the appearance rate of a second byte data group whose hexadecimal notation is 80 or more, using a 1-bit first prefix when encoding each byte of the byte data group with the relatively high appearance rate, and using a multi-bit second prefix when encoding each byte of the byte data group with the relatively low appearance rate.