JP2823917B2

JP2823917B2 - Data compression method

Info

Publication number: JP2823917B2
Application number: JP507890A
Authority: JP
Inventors: 泰彦中野; 茂吉田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-01-12
Filing date: 1990-01-12
Publication date: 1998-11-11
Anticipated expiration: 2013-11-11
Also published as: JPH03209922A

Description

DETAILED DESCRIPTION OF THE INVENTION [Summary] The present invention relates to a data compression method that compresses and encodes an input data sequence, such as characters, as a copy of an already-encoded sequence registered in a reference text, and its object is to speed up updating and searching of the reference text. The reference text is divided into a plurality of regions, and encoded data sequences are registered in them sequentially. The reference text is searched from the most recently registered divided region toward the oldest, and when the reference text becomes full, the oldest registered divided region is updated.

[Industrial field of application] The present invention relates to a data compression method that compresses and encodes an input data sequence, such as characters, as a copy of an already-encoded sequence registered in a reference text.

When code-string information such as characters is transmitted or stored, it is compressed and encoded to reduce the amount of data, thereby shortening transmission time and reducing storage requirements. One such compression encoding is universal coding, which extracts from an arbitrary position in a reference text, in which past data sequences are registered, the longest substring that matches the input code information, and encodes it as a copy of the past sequence. Such coding needs to achieve a high compression ratio together with high-speed operation.

[Prior art] Generally, when a large volume of data is to be stored or transmitted, a commonly used way to make effective use of the capacity of communication lines and storage devices is to compress the data string before transmission or storage and to restore the original data string when the data is used again.

The Ziv-Lempel code (hereinafter "ZL code") is a known method for efficiently compressing character codes (see, for example, Seiji Munakata, "Ziv-Lempel data compression method", Information Processing, vol. 26, no. 1, pp. 2-6, 1985).

Two algorithms have been proposed for the ZL code: the universal type and the incremental parsing type. Of the two, the universal type achieves the better compression ratio, while the incremental parsing type is faster.

Of these two algorithms, the universal ZL code, which has the higher compression ratio, searches the previously input, already-encoded character string for the longest string that matches the newly input character string, and encodes the input as a copy of that longest match.

Data compression is not limited to character codes and applies to general data as well. Following the terminology used in information theory, however, the description below calls one word of data a character and a sequence of such words a character string.

FIG. 5 shows the principle of the universal ZL encoder.

In FIG. 5, the P buffer 12 holds the character string that has already been encoded, and the Q buffer 10 holds the character string about to be encoded. The input character string in the Q buffer 10 is compared against every character string (substring) registered in the P buffer 12 to find the longest matching substring in the P buffer 12. To designate that longest-match substring in the P buffer 12, the following set of information is then encoded.

Next, the encoded character string in the Q buffer 10 is moved to the P buffer 12, and as many new characters as were just encoded are registered. The same operation is then repeated, so that the input data is decomposed into substrings and encoded one after another.
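The longest-match search described for FIG. 5 can be illustrated with a minimal Python sketch. It is a brute-force scan, not the patent's implementation; the function name `longest_match` and the use of byte strings for the P and Q buffers are assumptions made for illustration.

```python
def longest_match(p_buffer: bytes, q_buffer: bytes) -> tuple[int, int]:
    """Return (start, length) of the longest prefix of q_buffer found in p_buffer.

    Hypothetical helper: scans every start position in the P buffer from the
    left and returns (0, 0) when not even one character matches.
    """
    best_start, best_len = 0, 0
    for start in range(len(p_buffer)):
        length = 0
        while (length < len(q_buffer)
               and start + length < len(p_buffer)
               and p_buffer[start + length] == q_buffer[length]):
            length += 1
        if length > best_len:          # keep the first longest match found
            best_start, best_len = start, length
    return best_start, best_len


print(longest_match(b"abcabcd", b"abcdx"))  # (3, 4)
```

Because every start position is compared, the cost grows with the size of the P buffer, which is the search-time problem the description goes on to address.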

FIG. 6 shows an example of the conventional method. When a codeword is represented by 2 bytes, the P buffer 12 is represented with, for example, 12 bits and the Q buffer 10 with 4 bits. The search of the P buffer 12 proceeds from its left end, and if no matching character string is found, the input data sequence is newly registered at the position of the INPUT pointer.

[Problems to be solved by the invention] To improve the compression ratio of such a universal ZL coding scheme, the P buffer serving as the reference text must hold as much registered text as possible, and ideally the match length, whose range is set by the bit width of the Q buffer, could be expressed without limit.

In practice, however, when encoding and decoding are done in software, simply enlarging the P buffer 12 and the Q buffer 10 enlarges the codeword data, whose size is determined by the buffer addresses, and the compression ratio deteriorates as a result. In addition, because the reference text grows, the match search takes longer and processing speed drops.

The present invention has been made in view of these conventional problems, and its object is to provide a data compression method that speeds up the match search and the update when the reference text is enlarged.

[Means for solving the problems] FIG. 1 is a diagram explaining the principle of the present invention.

The present invention is directed to a data compression method in which a data sequence is input to a first buffer (Q buffer) 10; a second buffer (P buffer) 12, serving as a reference text in which already-encoded data sequences are registered, is searched for the longest substring of the encoded data sequence that matches the input data sequence; and the pair of the start position and match length of that longest match is output as a codeword for compression encoding.

In such a data compression method, the present invention divides the second buffer 12 into a plurality of regions 12-1 to 12-n and registers encoded data in them sequentially; searches for a match with the input data sequence from the most recently registered of the divided regions 12-1 to 12-n toward the oldest; and, when all of the divided regions 12-1 to 12-n are filled with registered data sequences, updates the oldest registered divided region.

Further, to improve the compression ratio by keeping the codeword small while enlarging the second buffer 12 serving as the reference text, the present invention splits the start position of the longest match found into a region number among the divided regions 12-1 to 12-n of the second buffer 12 and a position within that region. The in-region position is encoded as the start position of the longest match in the codeword, whereas the region number is held as identification data of the second buffer 12, separately from the codeword. The bit width of the search index of the second buffer 12 is thereby reduced to a width suited to the match search, and the bits saved are allocated to the first buffer 10 to extend the allowable match length.

[Operation] With the data compression method of the present invention configured as above, when all the divided regions 12-1 to 12-n of the second buffer 12 become full during encoding, the oldest registered divided region is erased and new registrations are made there. This greatly reduces the cumbersome work of updating the second buffer 12, which was conventionally performed on the buffer as a whole, and allows faster processing.

In addition, because the search starts from the most recently registered divided region of the second buffer 12, the most recently registered information is examined first, so the search takes less time.

[Embodiment] FIG. 2 is a block diagram of an embodiment of the present invention, taking an encoder as an example. It uses the LZSS coding scheme, known as a practical form of universal ZL coding (see T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., vol. 34, no. 12, 1986), and takes the case where codeword data is represented by 2 bytes.

In FIG. 2, reference numeral 14 denotes a data compression device comprising a Q buffer 10 as the first buffer and a P buffer 12 as the second buffer. In this embodiment, the P buffer 12 is divided into three divided regions 12-1, 12-2, and 12-3.

The figure shows the state in which divided regions 12-1 and 12-2 of the P buffer 12 have already been filled and registration into the remaining region 12-3 is in progress.

Reference numeral 16 denotes a file/transmission device, which stores or transmits the output of the data compression device 14: either a codeword whose data structure is the pair of the start position and match length of the longest match, or raw data (the input character string itself).

The processing of the present invention is described below in comparison with the conventional method of FIG. 6.

First, in the conventional method of FIG. 6, when a codeword is represented by 2 bytes, the size of the P buffer 12 is expressed with, for example, 12 bits and that of the Q buffer 10 with 4 bits. The search of the P buffer 12 proceeds from its left end, and if no matching character string is found, the input is newly registered at the INPUT pointer shown in the figure. With this method, character comparison starts from the oldest registered characters, which are likely to have little relation to the current contents of the Q buffer 10, so the search takes time. Two methods were used to update the P buffer 12 once it became full: shifting the P buffer 12 to the left by the amount just encoded in the Q buffer 10 (the partial-shift method), or clearing the whole contiguous P buffer 12 and starting registration over from the beginning (the all-clear method).

However, with the partial-shift method, once the P buffer 12 has become full an update is required every time, which makes the processing cumbersome. With the all-clear method, everything learned in the P buffer 12 so far is lost at once, which makes it more likely that unencoded raw data will be output and reduces efficiency.

In contrast, the present invention divides the P buffer 12, conventionally one contiguous buffer, into a plurality of regions, for example three regions 12-1 to 12-3, and records which divided region of the P buffer 12 is used in the identification code within the buffer.

To finish the search quickly, the search of the P buffer 12 starts from the most recently registered divided region 12-3. When all the divided regions 12-1 to 12-3 of the P buffer 12 become full, only the oldest registered divided region 12-1, whose contents are considered least relevant, is cleared, and new registrations are made in the cleared region 12-1. This keeps the search efficient while the other two divided regions 12-2 and 12-3 remain usable as they are.

Further, in the embodiment of FIG. 2, the P buffer 12 and the Q buffer 10, which had 12 bits and 4 bits respectively in the conventional method of FIG. 6, are given 11 bits and 5 bits. Reducing the P buffer 12 by one bit halves the amount of reference text per region, but because the text is held in three divided regions 12-1 to 12-3, the total amount of reference text is effectively 3/2 of the original. Since the Q buffer gains one bit, a longer match length can be represented, which also improves the compression ratio.
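The 3/2 figure follows directly from the bit widths. The short Python calculation below reproduces it; the variable names are illustrative only, and sizes are counted as the number of positions an index of the given width can address.

```python
# Positions addressable by an index of the given bit width.
conventional_p = 2 ** 12          # one contiguous P buffer, 12-bit index
region_size = 2 ** 11             # one divided region, 11-bit in-region index
divided_p = 3 * region_size       # three divided regions in total

ratio = divided_p / conventional_p

# Number of distinct values the match-length field can take.
conventional_len_values = 2 ** 4  # 4-bit Q buffer field
divided_len_values = 2 ** 5       # 5-bit Q buffer field

print(conventional_p, divided_p, ratio)              # 4096 6144 1.5
print(conventional_len_values, divided_len_values)   # 16 32
```

In both cases the position and length fields together still occupy 16 bits, so the codeword stays at 2 bytes.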

In more detail, in the conventional method of FIG. 6 the P buffer 12 is represented with 12 bits and the Q buffer 10 with 4 bits, and identification data is stored for every eight data items to indicate whether each is codeword data or raw data. That is, each bit of the identification data indicates whether the corresponding one of the eight data items that follow is codeword data or raw data.
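The one-bit-per-item identification data can be sketched as a flag byte covering eight data items. The text does not state the bit order or the polarity, so the choices here (least significant bit first, 1 meaning codeword data) and the function names `pack_flags` and `unpack_flags` are assumptions.

```python
def pack_flags(is_codeword: list[bool]) -> int:
    """Pack eight codeword/raw indicators into one identification byte.

    Assumed layout: bit k describes the k-th of the eight items that follow,
    with 1 meaning codeword data and 0 meaning raw data.
    """
    if len(is_codeword) != 8:
        raise ValueError("exactly eight items are described by one flag byte")
    flags = 0
    for bit, flag in enumerate(is_codeword):
        if flag:
            flags |= 1 << bit
    return flags


def unpack_flags(flags: int) -> list[bool]:
    """Recover the eight indicators from an identification byte."""
    return [bool((flags >> bit) & 1) for bit in range(8)]


items = [True, False, False, True, False, False, False, False]
print(pack_flags(items))  # 9
```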

Suppose that, to raise the compression ratio, the bit widths of the P buffer 12 and the Q buffer 10 were each increased by one bit. The codeword data, the pair of start position and match length, would then no longer be a multiple of 8 bits long, and cumbersome bit packing would be needed when transferring data. If instead the bit widths of the P buffer 12 and the Q buffer 10 were increased to, for example, 18 bits and 6 bits so that the total remains a multiple of 8, the codeword consisting of the start position and match length of the longest match would grow to 3 bytes. A match of 2 or 3 bytes would then gain nothing from being expressed as a copy, and coding efficiency would suffer.

In contrast, the present invention divides the contiguous P buffer 12 into, for example, three divided regions 12-1 to 12-3 as shown in FIG. 2, and indicates which divided region is used by giving the identification data of the P buffer 12 two bits per data item. The compression ratio can thus be raised while the length of one codeword remains 2 bytes.
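A 2-byte codeword with an 11-bit in-region position and a 5-bit match length could be packed as in the following sketch. The field order (position in the high bits), the big-endian byte order, and the names `pack_codeword` and `unpack_codeword` are assumptions; the text fixes only the bit widths. The region number is deliberately not part of the codeword, since the description carries it in the separate 2-bit identification data.

```python
POS_BITS = 11   # in-region start position, per the embodiment
LEN_BITS = 5    # match length, per the embodiment


def pack_codeword(position: int, length: int) -> bytes:
    """Pack an in-region position and a match length into 2 bytes.

    Assumed layout: position in the upper 11 bits, length in the lower 5.
    """
    if not 0 <= position < (1 << POS_BITS):
        raise ValueError("position does not fit in 11 bits")
    if not 0 <= length < (1 << LEN_BITS):
        raise ValueError("length does not fit in 5 bits")
    return ((position << LEN_BITS) | length).to_bytes(2, "big")


def unpack_codeword(raw: bytes) -> tuple[int, int]:
    """Inverse of pack_codeword: return (position, length)."""
    value = int.from_bytes(raw, "big")
    return value >> LEN_BITS, value & ((1 << LEN_BITS) - 1)


print(unpack_codeword(pack_codeword(1234, 17)))  # (1234, 17)
```

With three regions, a 2-bit identifier has four values, which would be enough to distinguish raw data from a codeword in each of the three regions; that mapping is an inference, not something the text spells out.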

Next, the processing operation of the present invention is described with reference to the operation flowchart of FIG. 3.

First, in step S1 (the word "step" is omitted below), the input character string is read into the Q buffer. Then, in S2, if the end of the input character string has not been reached, processing proceeds to S3; if it has, processing ends.

In S3, i is set to New, where New is the index of the most recently registered divided region of the P buffer.

Next, in S4, the divided region P[i] of the P buffer 12 is scanned for matching characters. If a match is found, processing proceeds to S5, where the match start position within P[i] and the match length are registered or updated in a register or the like, and then to S6. If P[i] contains no matching character, processing proceeds directly to S6.

In S6, i is updated according to the function f(i). Under f(i), i is initially updated in the order i = 1, 2, 3. Once the regions are full and the oldest region, i = 1, has been cleared and updated, i is updated in the order i = 2, 3, 1; after the third update the order is i = 3, 1, 2, and so on.
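The text does not define f(i) precisely. One reading is that the listed sequences give the regions from oldest to newest after each clearing, and that the search walks that order backwards, starting at New. The sketch below follows that reading only; `age_order` and `search_order` are hypothetical names.

```python
def age_order(oldest: int, n: int = 3) -> list[int]:
    """Regions (numbered 1..n) from oldest to newest, given the oldest one.

    Assumes regions are filled cyclically, so ages rotate by one each time
    the oldest region is cleared and reused.
    """
    return [(oldest - 1 + k) % n + 1 for k in range(n)]


def search_order(oldest: int, n: int = 3) -> list[int]:
    """Visit order for the match search: newest region first."""
    return list(reversed(age_order(oldest, n)))


for oldest in (1, 2, 3):
    print(age_order(oldest), search_order(oldest))
# [1, 2, 3] [3, 2, 1]
# [2, 3, 1] [1, 3, 2]
# [3, 1, 2] [2, 1, 3]
```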

Next, in S7, it is determined whether New = i, that is, whether all the divided regions have been searched. If so, processing proceeds to S8; if not, it returns to S3.

When all the divided regions of the P buffer have been searched and S8 finds that no matching character exists in the P buffer, the raw data itself is output as the codeword data string in S9, and processing proceeds to the P buffer registration in S11. If S8 determines that a matching character exists in the P buffer, codeword data consisting of the pair of the match start position registered in S5 (the position within the particular divided region of the P buffer) and the match length is output, and processing proceeds to S11. The same processing is repeated until the end of the character string is detected in S2.
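The flow of FIG. 3 can be illustrated end to end with a small Python sketch. It is not the patent's implementation: the token tuples, the helper names, the tiny default sizes, and the simplified registration (a chunk that does not fit opens a new region) are all assumptions, and matches are assumed not to cross region boundaries. Regions are kept in a list ordered oldest first, and `slot` is simply the list index at the time of encoding. `q_size` must not exceed `region_size`. A mirror-image decoder is included only to check that the tokens are sufficient to rebuild the input.

```python
def find_in_region(region, q):
    """Longest prefix of q found inside one region, as (position, length)."""
    best_pos, best_len = 0, 0
    for pos in range(len(region)):
        n = 0
        while n < len(q) and pos + n < len(region) and region[pos + n] == q[n]:
            n += 1
        if n > best_len:
            best_pos, best_len = pos, n
    return best_pos, best_len


def register(regions, chunk, n_regions, region_size):
    """S11 (simplified): append to the newest region, recycling the oldest."""
    if len(regions[-1]) + len(chunk) > region_size:
        if len(regions) == n_regions:
            regions.pop(0)               # drop only the oldest region
        regions.append(bytearray())      # open a fresh newest region
    regions[-1].extend(chunk)


def encode(data, n_regions=3, region_size=8, q_size=4):
    """Return a list of ('raw', byte) and ('copy', slot, position, length)."""
    regions = [bytearray()]              # ordered oldest first, newest last
    tokens, i = [], 0
    while i < len(data):                 # S2: stop at the end of the input
        q = data[i:i + q_size]           # S1: fill the Q buffer
        best = None
        for slot in reversed(range(len(regions))):   # S3-S7: newest to oldest
            pos, n = find_in_region(regions[slot], q)
            if n > 0 and (best is None or n > best[2]):
                best = (slot, pos, n)    # S5: remember the longest match so far
        if best is None:
            tokens.append(("raw", data[i]))          # S9: no match, raw data
            consumed = 1
        else:
            tokens.append(("copy", *best))           # start position and length
            consumed = best[2]
        register(regions, data[i:i + consumed], n_regions, region_size)
        i += consumed
    return tokens


def decode(tokens, n_regions=3, region_size=8):
    """Rebuild the input by mirroring the encoder's region bookkeeping."""
    regions, out = [bytearray()], bytearray()
    for token in tokens:
        if token[0] == "raw":
            chunk = bytes([token[1]])
        else:
            _, slot, pos, n = token
            chunk = bytes(regions[slot][pos:pos + n])
        out += chunk
        register(regions, chunk, n_regions, region_size)
    return bytes(out)
```

Because the comparison `n > best[2]` is strict, a match found in a newer region is kept when an older region offers one of equal length, which matches the newest-first preference in the description.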

FIG. 4 is an operation flowchart showing the P buffer registration of S11 in FIG. 3 as a subroutine.

In FIG. 4, S1 first checks whether the most recently registered divided region P[New] is full. If it is full, processing proceeds to S2; if there is still room for registration, processing proceeds to S5.

In S2, the index Old of the oldest divided region of the P buffer is calculated; in S3, the oldest registered divided region P[Old] is cleared; and in S4, after the clearing, New is replaced with Old and processing proceeds to S5. In S5, the raw data is registered in the divided region P[New], which is in the cleared state when reached from S4.
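The registration subroutine of FIG. 4 can be sketched with fixed region slots and a New index. The class name `PBuffer`, the per-character registration, and the rule that the oldest region is the one cyclically following New are assumptions; the text says only that Old is calculated.

```python
class PBuffer:
    """Divided reference buffer with a fixed number of region slots."""

    def __init__(self, n_regions: int = 3, region_size: int = 4):
        self.regions = [bytearray() for _ in range(n_regions)]
        self.region_size = region_size
        self.new = 0  # index of the most recently registered region

    def register(self, data: bytes) -> None:
        for byte in data:
            if len(self.regions[self.new]) >= self.region_size:  # S1: P[New] full?
                old = (self.new + 1) % len(self.regions)         # S2: compute Old
                self.regions[old].clear()                        # S3: clear P[Old] only
                self.new = old                                   # S4: New = Old
            self.regions[self.new].append(byte)                  # S5: register in P[New]


buf = PBuffer(n_regions=3, region_size=4)
buf.register(b"aaaabbbbcccc")   # fills all three regions in order
buf.register(b"dd")             # region 0 is the oldest, so only it is recycled
print([bytes(r) for r in buf.regions], buf.new)
# [b'dd', b'bbbb', b'cccc'] 0
```

Only one region is discarded per overflow, so the other regions keep their learned contents, in contrast to the all-clear update of the conventional method.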

Although the above embodiment takes as an example the case where the P buffer 12 is divided into three regions, it can be divided into any number of regions as needed.

[Effects of the invention] As described above, according to the present invention, the reference text can be searched and updated at high speed even when the reference text and the allowable match length are enlarged.

[Brief description of the drawings]

FIG. 1 is a diagram explaining the principle of the present invention; FIG. 2 is a block diagram of an embodiment of the present invention; FIG. 3 is an operation flowchart of the present invention; FIG. 4 is an operation flowchart of the P buffer registration processing of the present invention; FIG. 5 is a diagram explaining the principle of universal ZL coding; and FIG. 6 is a diagram explaining the conventional method. In the figures, 10: first buffer (Q buffer); 12: second buffer (P buffer); 12-1 to 12-n: divided regions; 14: search device; 16: encoder.

Continuation of front page (58) Fields searched (Int. Cl.⁶, DB name) H03M 7/30

Claims

(57) [Claims]

1. A data compression method in which a data sequence such as characters is input to a first buffer (10), a second buffer (12) in which already-encoded past data sequences are registered is searched to find the maximum-length portion of the encoded data sequence that matches the data sequence in the first buffer (10), and the pair of the start position and the match length of the maximum-length matching portion is output as a codeword for compression encoding, characterized in that the second buffer (12) is divided into a plurality of regions (12-1 to 12-n) in which encoded data sequences are registered sequentially; the match search is performed from the most recently registered of the divided regions (12-1 to 12-n) toward the oldest; and, when all the divided regions are filled with registered data sequences, the oldest registered divided region is updated.

2. The data compression method according to claim 1, wherein the start position of the maximum-length matching portion is divided into a region number and an in-region position of the divided regions (12-1 to 12-n) of the second buffer (12); the in-region position is encoded as the start position of the maximum-length matching portion in the codeword; and the region number is held as identification data of the second buffer (12), separately from the codeword, whereby the bit width of the search index of the second buffer (12) is reduced to a bit width suited to the match search, and the bits saved are allocated to the first buffer (10) to extend the allowable length of the match search.