JP3100206B2 - Data compression method

Info

Publication number: JP3100206B2
Application number: JP31204191A
Authority: JP
Inventors: 泰彦中野; 茂吉田; 佳之岡田; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-11-27
Filing date: 1991-11-27
Publication date: 2000-10-16
Anticipated expiration: 2015-10-16
Also published as: JPH05150939A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a data compression method, and more particularly to a data compression method that finds, among character strings that have already been input and encoded, a character string matching the input character string, and encodes the input character string by the number of the matching character string.

[0002] In recent years, office automation (OA) has advanced, and computers increasingly handle images as black-and-white gradation image information. The data volume of such image information is very large, reaching several megabytes per page. Therefore, to handle image information efficiently in storage, transmission, and the like, it is essential to reduce the data volume by applying efficient data compression.

[0003]

[Prior Art] Compression by universal codes has been put to practical use as a method of compressing data efficiently. A universal code is an information-preserving (lossless) data compression method that does not assume the statistical properties of the information source in advance, so it can be applied to various types of data (character codes, object code, and so on). Document images show similarity in character outlines and character spacing, and halftone images show similarity in halftone-dot periodicity and dot shape. The universal code reduces the redundancy inherent in this similarity and thereby achieves effective compression. In the following, adopting the terminology of information theory, one word of data is called a character, and any number of consecutive words is called a character string.

[0004] The Ziv-Lempel code is a representative universal code; see, for example, Munakata, "Ziv-Lempel Data Compression Method," Information Processing, Vol. 26, No. 1, 1985. Two algorithms have been proposed for the Ziv-Lempel code: a universal type and an incremental parsing type. A practical method based on the universal-type algorithm is the LZSS code (T. C. Bell, "Better OPM/L Text Compression," IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986), and a practical method based on the incremental-parsing algorithm is the LZW (Lempel-Ziv-Welch) code (T. A. Welch, "A Technique for High-Performance Data Compression," Computer, June 1984).

[0005] LZSS encoding
In LZSS encoding, a practical method based on the universal-type algorithm, input data that has already appeared and been encoded is stored in a storage unit (P buffer). Among the partial data strings starting at arbitrary positions in the encoded data, the one that matches the input data string over the greatest length is found, and the input data string is encoded by information indicating the match length and the address in the storage unit (P buffer) of the first character of that matching string. LZSS encoding requires a large amount of computation but achieves a high compression ratio.

[0006] FIG. 7 is an explanatory diagram of LZSS encoding, in which 1 is a Q buffer and 2 is a P buffer. The Q buffer 1 has, for example, 4-bit index information (addresses) and holds a character string of 16 (= 2⁴) characters that are to be encoded next. The P buffer 2 has, for example, 12-bit index information (addresses) and holds a character string of the 4096 (= 2¹²) most recently encoded characters.

[0007] A universal encoding unit (not shown) compares the character string starting at the head of the Q buffer 1 with the character strings starting at arbitrary positions in the P buffer 2 to find the longest matching partial string 3. It then encodes the partial string 3′ in the Q buffer using the match start position p₁ of that partial string in the P buffer and the match length q₁, and stores the result. Next, the universal encoding unit moves the encoded string 3′ from the Q buffer 1 into the P buffer 2, discards the same number of the oldest encoded characters from the P buffer 2, and reads into the Q buffer 1 as many new characters as were in the encoded string 3′. The encoding process then continues in the same way. When the longest match length is 1, no encoding is performed and the first character of the Q buffer 1 is stored as is (this is called raw data), because encoded data requires 2 bytes whereas raw data requires only 1 byte.

[0008] Once eight items of encoded data or raw data have been stored, identification data consisting of eight flag bits that distinguish encoded data from raw data ("0" for encoded data, "1" for raw data) is added at the head, as shown in FIG. 7(b), and each such set of data is output in sequence.
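The LZSS procedure described above can be sketched roughly as follows. This is a minimal Python illustration, not the encoder of FIG. 7: the names `lzss_tokens` and `flag_string`, the tuple token format, and the choice to express the match address relative to the start of the P buffer are all assumptions made here. The buffer sizes default to the example values in the text (4096 and 16), and matches are restricted to the P buffer.

```python
def lzss_tokens(data, p_size=4096, q_size=16):
    """Split data into raw characters and (address, length) matches."""
    tokens = []
    pos = 0  # head of the Q buffer
    while pos < len(data):
        p_start = max(0, pos - p_size)  # oldest character still in the P buffer
        best_len, best_addr = 0, 0
        for cand in range(p_start, pos):
            length = 0
            # Extend the match while it stays inside the P buffer and the Q buffer.
            while (length < q_size
                   and pos + length < len(data)
                   and cand + length < pos
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_addr = length, cand - p_start
        if best_len >= 2:
            tokens.append(("match", best_addr, best_len))
            pos += best_len
        else:
            # A match of length 1 is not worth a 2-byte code, so emit raw data.
            tokens.append(("raw", data[pos]))
            pos += 1
    return tokens


def flag_string(group):
    """Flag bits for a group of up to eight tokens: '0' = encoded, '1' = raw."""
    return "".join("1" if tok[0] == "raw" else "0" for tok in group)


tokens = lzss_tokens("abcabcabc")
print(tokens)
print(flag_string(tokens[:8]))
```

The flags are returned as a string rather than packed into a byte because the text does not state the bit order within the identification data.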

[0009] LZW encoding
In LZW encoding, a practical method based on the incremental-parsing algorithm, a rewritable dictionary is provided. The input character string is divided into mutually distinct character strings, which are numbered in order of appearance and registered in the dictionary, and the character string currently being input is encoded using only the dictionary number of the longest matching string registered in the dictionary. The compression ratio of LZW encoding is lower than that of LZSS encoding, but it is simple, easy to compute, and fast, so it has come to be used for file compression in storage devices, data transmission, and the like.

[0010] FIG. 8 is an explanatory diagram of LZW encoding, FIG. 9 is an explanatory diagram of the dictionary structure, and FIG. 10 is a flowchart of the LZW encoding process. For simplicity, assume that a character string consisting of the three characters a, b, and c is compressed by LZW encoding. First, each single-character string (a, b, c) is given a registration number and initially registered in the dictionary, and the number of dictionary entries N is set to the number of character types M (M → N) (step 101).

[0011] In this state, the first character K is input, the registration number of that character is taken as the reference number ω, and this becomes the prefix string (step 102). Next, the next character K of the input data is read (step 103), and the current dictionary is searched for the character string (ωK) formed by appending the character K read in step 103 to the prefix string ω obtained in step 102 (step 104).

[0012] If the character string (ωK) is in the dictionary, (ωK) replaces ω (step 105), and it is then determined whether the input data has ended (step 106). If the data has not ended, the process returns to step 103 and repeats, continuing the search for the longest matching string until (ωK) can no longer be found in the dictionary. If, on the other hand, the input data has ended in step 106, the reference number ω is output as the codeword code(ω) (step 107) and the encoding process ends. Here the codeword code(ω) (= ω) is expressed in ⌈log₂ N⌉ bits (the smallest integer greater than log₂ N).

[0013] If the search for the longest matching string continues and the character string (ωK) is found in step 104 not to be in the dictionary, the reference number ω is output as the codeword code(ω), expressed in ⌈log₂ N⌉ bits. The character string (ωK) is given a new registration number N and registered in the dictionary, the registration number of the character K read in step 103 replaces the reference number ω, and the dictionary address N is incremented by 1 (step 108). It is then determined in step 106 whether the input data has ended, and the subsequent processing is repeated according to the result.

[0014] LZW encoding is described concretely below with reference to FIGS. 8 and 9. The input data of FIG. 8 is read one character at a time from left to right. When the first character a is read, the dictionary contains no matching string other than a, so the registration number "1" of a (reference number ω = 1) is output as the codeword (code(ω)). The extended string ab is then given registration number 4 and registered in the dictionary; the entry is actually stored in the form "1b". Next, the second character b becomes the head of the input string. The dictionary contains no matching string other than b, so the registration number (reference number) 2 of b is output as the codeword, and the extended string ba is given registration number 5 and registered in the dictionary, actually in the form "2a".

[0015] The third character a now becomes the head of the input string. Since the first character a is in the dictionary, the dictionary is checked for the string "1b", formed by appending the next character b to the registration number 1 of a. Since "1b" exists, the dictionary is checked for the string "4c", formed by appending the next character c to the registration number 4 of "1b". Since "4c" does not exist, the registration number "4" of the longest matching string "1b" is output as the codeword, and the extended string "4c" is given registration number 6 and registered in the dictionary. Encoding and dictionary registration are repeated in the same way until all input characters have been LZW-encoded.
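The encoding flow of steps 101 to 108 can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name `lzw_encode` is hypothetical, the dictionary is keyed by whole strings rather than by (ω, K) pairs such as "1b", and the input "ababcbababa" is an illustrative string chosen so that the first codes agree with the worked example (1, 2, 4); the actual input of FIG. 8 is not reproduced in the text.

```python
def lzw_encode(data, alphabet="abc"):
    """Return the list of LZW registration numbers for data."""
    # Step 101: register every single character, numbered from 1.
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
    next_number = len(dictionary) + 1
    codes = []
    if not data:
        return codes
    omega = data[0]                      # step 102: first character is the prefix
    for k in data[1:]:                   # step 103: read the next character K
        if omega + k in dictionary:      # step 104: is (omega K) registered?
            omega = omega + k            # step 105: extend the prefix
        else:
            # Step 108: output code(omega), register (omega K), restart from K.
            codes.append(dictionary[omega])
            dictionary[omega + k] = next_number
            next_number += 1
            omega = k
    codes.append(dictionary[omega])      # step 107: flush the last prefix
    return codes


print(lzw_encode("ababcbababa"))
```

Storing whole strings as keys keeps the sketch short; a memory-conscious implementation would store each entry as a (prefix number, character) pair, as the text describes.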

[0016] FIG. 11 is a flowchart of the LZW decoding process, which performs the reverse of the encoding operation. In decoding, as in encoding, each single-character string (a, b, c) is first given a registration number and initially registered in the dictionary, and the number of dictionary entries N is set to the number of character types M (M → N) (step 201). Next, the first code CODE is read and set as OLDcode. Since the first code always corresponds to the registration number of one of the single characters already registered in the dictionary, the character K indicated by the input code CODE (= registration number) is output. The output character K is also set as char for the exception handling described later (step 202).

[0017] The next code CODE is then read and set as NEWcode (step 203), and it is checked whether the code CODE (= registration number) is defined (registered) in the dictionary (step 204). Normally the input code CODE has been registered in the dictionary by the preceding processing, so step 204 yields "NO". It is then determined whether the dictionary entry indicated by the code CODE has the form (ωK), that is, whether it is a concatenation of a reference number ω and a character K (step 205).

[0018] If the entry is a concatenation of a reference number ω and a character K, the character K is temporarily pushed onto a stack, the codeword code(ω) of the reference number ω (in practice code(ω) = ω) becomes the new CODE, the character count C is incremented by 1 (step 206), and the process returns to step 205. Steps 205 and 206 are repeated recursively until the entry indicated by CODE is a single character.

[0019] If, in step 205, the string indicated by CODE is a single character, that is, if the dictionary entry indicated by the code CODE is (K), then K is output, after which the C stacked characters are popped in LIFO (last in, first out) order and output. In addition, the string (OLDcode, K), formed by appending the first character K of the string just decoded to the code OLDcode used in the previous decoding, is registered in the dictionary with registration number N, and N is incremented by 1 (N + 1 → N). Furthermore, the first character K of the decoded string is set as char, and NEWcode is set as OLDcode (step 207).

[0020] It is then determined whether code input has ended (step 208); if not, the process returns to step 203, the next code is read, and decoding is repeated. In the encoding process, a string is encoded and, at the same time, the string formed by appending the next first character to it is registered in the dictionary, so the codeword registered for the immediately preceding string can be used in the next encoding step. In the decoding process, however, the string registered in the dictionary is the previously decoded string with the first character of the currently decoded string appended, so dictionary registration lags one step behind the encoder. Consequently, if the encoder uses the codeword registered for the immediately preceding string, the decoder may encounter a codeword that is not yet registered (defined). This is the case in which CODE is undefined in step 204 and the result is "YES".

[0021] For example, as shown in FIG. 12, suppose that during encoding OLDcode is output for the string "a...z" and the string "a...za" is registered in the dictionary as NEWcode, and that the next string "a...za" is then output as NEWcode and the string "a...zab" is registered in the dictionary. When the decoder reads the codeword NEWcode, that codeword has not yet been registered in the decoder's dictionary, so it cannot be decoded directly. Comparing NEWcode with OLDcode, however, gives the relation

string of NEWcode = string of OLDcode + first character (char) of the string of OLDcode.

Therefore, when step 204 finds CODE undefined ("YES"), the stored char is pushed onto the stack, OLDcode is treated as CODE, the string formed by appending char to OLDcode is taken as NEWcode (step 209), and the processing from step 205 onward is carried out using CODE.

[0022] The decoding process is described concretely below with reference to FIG. 13. The first input code is "1". The single characters a, b, and c are already registered in the dictionary under registration numbers 1, 2, and 3 (as in FIG. 9), so the dictionary is consulted and code "1" is replaced by the string a with the matching registration number, which is output. The next code "2" is likewise replaced by the character b and output. At this point, "1b", the combination of the previously processed code "1" and the first character b of the string just decoded, is given the new registration number 4 and registered in the dictionary.

[0023] For the third code "4", a dictionary lookup gives "1b", which is expanded to "ab", and the string "ab" is output. At the same time, the string "2a" (= "ba"), the combination of the previously processed code "2" and the first character a of the string just decoded, is given the new registration number 5 and registered in the dictionary. Decoding is repeated in the same way thereafter. The exception handling of step 209 in FIG. 11 occurs when the sixth input code "8" is decoded. Code "8" is not yet defined in the dictionary at the time of decoding and cannot be decoded directly. In this case, the string "5b" is formed by appending the first character b of the previously decoded string "ba" to the previously processed code "5"; this is expanded to "2ab" and then to "bab", which is output. The string "5b", formed by appending the first character b of the string just decoded to the previous codeword "5", is then given registration number "8" and registered in the dictionary.
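The decoding flow, including the exception case of step 209, can be sketched as follows. This is a minimal Python sketch under stated assumptions: the name `lzw_decode` is hypothetical, the dictionary stores whole strings, so the stack-based expansion of steps 205 to 207 reduces to a direct lookup, and the code sequence is an illustrative one that agrees with the worked example (codes 1, 2, 4, and an undefined sixth code 8 following code 5).

```python
def lzw_decode(codes, alphabet="abc"):
    """Rebuild the original string from a list of LZW registration numbers."""
    # Step 201: register every single character, numbered from 1.
    dictionary = {i: ch for i, ch in enumerate(alphabet, start=1)}
    next_number = len(dictionary) + 1
    if not codes:
        return ""
    old = dictionary[codes[0]]           # step 202: first code is a single character
    out = [old]
    for code in codes[1:]:               # step 203: read the next code
        if code in dictionary:           # step 204: code already defined
            entry = dictionary[code]
        elif code == next_number:
            # Step 209: undefined code; it must be the previous string
            # followed by the previous string's own first character.
            entry = old + old[0]
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        # Step 207: register previous string + first character of this string.
        dictionary[next_number] = old + entry[0]
        next_number += 1
        old = entry
    return "".join(out)


print(lzw_decode([1, 2, 4, 3, 5, 8, 1]))
```

When the sixth code 8 arrives, the dictionary only holds entries 1 to 7, so the sketch forms "ba" + "b" = "bab", which matches the expansion "5b" → "2ab" → "bab" described in the text.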

[0024]

[Problems to Be Solved by the Invention] As described above, a universal code is a compression method that encodes while learning the properties of the data being encoded, even when those properties are unknown. It is a simple scheme: data strings that have already appeared are registered in a dictionary, and when the same data string appears again, its registration position or registration number in the dictionary is sent as the encoded data (codeword). That is, conventional universal encoding searches the dictionary for the longest string that matches from the head of the input string (the longest matching string) and encodes the input string by that string's registration position or registration number.

[0025] Consequently, when the various strings shown in the right column of FIG. 14 are registered in the dictionary and the input string "ABCDEFGHIJK" shown in the left column is to be encoded, the conventional universal encoding method finds the longest matching strings "AB", "CDE", "F", "GHI", and "JK" and encodes them in turn. If, however, the input is cut after the first character "A", the longest matching string "BCDEFGHIJK" can be found next, so encoding completes with two codewords and the compression efficiency is higher than in the former case. In other words, the conventional universal-encoding compression method sometimes fails to compress sufficiently, and there is room for improvement.
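The difference between the two parsings can be reproduced with a small sketch. The dictionary contents below are hypothetical: FIG. 14 is not reproduced in the text, so the set is simply chosen to be consistent with the strings named above, and single characters are assumed to be always registered. The function names `longest_match` and `greedy_parse` are likewise illustrative.

```python
def longest_match(text, pos, dictionary):
    """Longest dictionary entry that matches text starting at pos."""
    best = text[pos]  # assumption: every single character is always registered
    for entry in dictionary:
        if len(entry) > len(best) and text.startswith(entry, pos):
            best = entry
    return best


def greedy_parse(text, dictionary):
    """Conventional parsing: always take the longest match at the current head."""
    phrases, pos = [], 0
    while pos < len(text):
        phrase = longest_match(text, pos, dictionary)
        phrases.append(phrase)
        pos += len(phrase)
    return phrases


# Hypothetical dictionary consistent with the example in the text.
DICTIONARY = {"AB", "CDE", "F", "GHI", "JK", "BCDEFGHIJK"}

# Conventional greedy parsing needs five codewords.
print(greedy_parse("ABCDEFGHIJK", DICTIONARY))
# Cutting after the first character "A" needs only two.
print(["A", longest_match("ABCDEFGHIJK", 1, DICTIONARY)])
```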

[0026] One conceivable approach is as follows: at each point during a match search, find the longest matching string for the following string both for the case where the search is cut off at the current matching string and for the case where the search continues, and decide where the match search may be cut off by considering the total encoding efficiency of the first and following strings in each case. With this approach, however, the above processing must be performed at every point of the match search, so the amount of computation increases and high-speed encoding becomes impossible.

[0027] In view of the above, an object of the present invention is to provide a data compression method that improves the data compression ratio with only a small increase in computation, in other words, while keeping the loss of processing speed to a minimum. Another object of the present invention is to provide a data compression method that improves data compression efficiency as follows: for both the case where the input is cut after its first character and the case where the search for the longest matching string continues without cutting, the longest matching string of the following string is found, and whether to encode with or without cutting after the first character is decided by considering the total encoding efficiency of the first and following strings in each case.

[0028] Yet another object of the present invention is to provide a data compression method that calculates the compression ratio accurately, so that the input string is divided and encoded in the way that correctly gives the higher compression ratio. Another object of the present invention is to provide a data compression method that further improves the compression ratio, when the input is cut after its first character, by representing that first character with a codeword shorter than the codeword of a longest matching string. A further object of the present invention is to provide a data compression method that improves the compression ratio by dividing the data into blocks of a predetermined size and encoding each block with the encoding method best suited to it.

[0029]

[Means for Solving the Problems] FIG. 1 is a diagram explaining the principle of the present invention. CTR is the input character string. CTR₁₁ is a first string consisting only of the first character of the input string; CTR₁₂ is a second string, following that first character, that is the longest match with an encoded string registered in the dictionary; CDW₁₁ is the codeword of the first string CTR₁₁, and CDW₁₂ is the codeword of the second string CTR₁₂. CTR₂₁ is a first string that is the longest match, starting from the first character of the input string, with an encoded string registered in the dictionary; CTR₂₂ is a second string, following the first string CTR₂₁, that is the longest match with an encoded string registered in the dictionary; CDW₂₁ is the codeword of the first string CTR₂₁, and CDW₂₂ is the codeword of the second string CTR₂₂. CPR₁ and CPR₂ are compression-ratio calculation units, COMP is a comparison unit that compares the compression ratios, and CDOT is a codeword output unit.

[0030]

[Operation] In universal encoding, for the case where the input string CTR is cut after its first character C₁, that first character is taken as the first string CTR₁₁, and the encoded string C₂C₃...Cₙ that is the longest match with the input following the first string is found and taken as the second string CTR₁₂. In addition, the encoded string C₁C₂C₃...Cᵢ that is the longest match starting from the first character C₁ of the input string CTR is found and taken as the first string CTR₂₁, and the encoded string Cᵢ₊₁...Cₘ that is the longest match with the input following the first string CTR₂₁ is found and taken as the second string CTR₂₂. Next, the longest matching strings CTR₁₂, CTR₂₁, and CTR₂₂ are encoded as the codewords CDW₁₂, CDW₂₁, and CDW₂₂ using their dictionary numbers (registration numbers), and the single-character first string CTR₁₁ is represented by a codeword CDW₁₁ that is shorter than the codeword of a longest matching string. The compression-ratio calculation units CPR₁ and CPR₂ respectively compute the compression ratio obtained when the first and second strings CTR₁₁ and CTR₁₂ are encoded and the compression ratio obtained when the first and second strings CTR₂₁ and CTR₂₂ are encoded. The comparison unit COMP compares the two compression ratios, and the codeword output unit CDOT outputs the codeword of whichever first string, CTR₂₁ or CTR₁₁, gives the higher compression ratio. Encoding then continues with the character following that first string in the input string CTR as the new first character.
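The comparison carried out by CPR₁, CPR₂, COMP, and CDOT can be sketched as follows. This is only an illustration under stated assumptions: the codeword widths (`code_bits=12` for a dictionary codeword, `short_bits=9` for the cut first character) are hypothetical values, the dictionary is a fixed hypothetical set with single characters always registered, dictionary updating is omitted, and ties are resolved in favor of the uncut string. The function names are not taken from the source.

```python
def longest_match(text, pos, dictionary):
    """Longest dictionary entry that matches text starting at pos."""
    best = text[pos]  # assumption: every single character is always registered
    for entry in dictionary:
        if len(entry) > len(best) and text.startswith(entry, pos):
            best = entry
    return best


def choose_first_string(text, pos, dictionary, code_bits=12, short_bits=9):
    """Return CTR11 or CTR21, whichever division gives the higher ratio."""
    # Division 1: cut after the first character (CTR11), then longest match (CTR12).
    ctr11 = text[pos]
    ctr12 = longest_match(text, pos + 1, dictionary) if pos + 1 < len(text) else ""
    ratio1 = (len(ctr11) + len(ctr12)) / (short_bits + (code_bits if ctr12 else 0))
    # Division 2: longest match from the head (CTR21), then longest match (CTR22).
    ctr21 = longest_match(text, pos, dictionary)
    nxt = pos + len(ctr21)
    ctr22 = longest_match(text, nxt, dictionary) if nxt < len(text) else ""
    ratio2 = (len(ctr21) + len(ctr22)) / (code_bits + (code_bits if ctr22 else 0))
    # Only the chosen first string is output; the second is re-examined next time.
    return ctr11 if ratio1 > ratio2 else ctr21


def parse(text, dictionary):
    """Divide text by repeatedly choosing the better first string."""
    phrases, pos = [], 0
    while pos < len(text):
        first = choose_first_string(text, pos, dictionary)
        phrases.append(first)
        pos += len(first)
    return phrases


# Hypothetical dictionary; its contents are not taken from any figure.
DICTIONARY = {"AB", "CDE", "F", "GHI", "JK", "BCDEFGHIJK"}
print(parse("ABCDEFGHIJK", DICTIONARY))
```

Because only two candidate divisions are evaluated at each head position, the extra work per output codeword is bounded by a few additional longest-match searches, in contrast to evaluating every possible cut point.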

[0031] In this way, for both the case where the input is cut after its first character and the case where the search for the longest matching string continues without cutting, the longest matching string of the following string is found, and the decision whether to encode with or without cutting after the first character is made by considering the total encoding efficiency of the first and following strings in each case. Efficient data compression is therefore possible with only a small increase in computation.

[0032] Further, when encoding with a split at the first character, that first character is represented by a code word shorter than the code word of a longest matching character string, so the compression ratio can be improved further. In addition, the compression ratio is obtained by dividing the total number of characters in the first and second character strings by the total number of bits in their code words. The compression ratio can thus be computed accurately, the input character string can be split and encoded in the way that actually gives the higher compression ratio, and compression improves.

[0033] Further, encoding is provisionally executed on a fixed amount of input character strings and the number of times the code word of the first character string CTR₁₁ is output is counted. If the count is at or above a fixed value, this encoding process is carried out for real and its code words are output; if it is at or below the fixed value, conventional encoding is performed. An encoding scheme suited to the nature of the character string can thus be adopted, and the data compression ratio can be improved.

[0034]

[Embodiments]

(a) First embodiment

Configuration

FIG. 2 shows the configuration of an encoder that implements the data compression method according to the present invention, together with a schematic illustration of the data compression. Reference numeral 11 denotes a universal encoding unit that encodes and outputs the input character string CTR, and 12 denotes a dictionary unit that stores character strings that have already been input and encoded, assigning them sequential numbers (called registration numbers or reference numbers).

[0035] Outline of data compression

As shown in FIG. 2(b), the universal encoding unit 11 first treats the input character string CTR as split after its first character C₁. It takes that first character as the first character string CTR₁₁, searches the dictionary unit 12 for the encoded character string C₂C₃...C_n that is the longest match for the input following the first character string, and takes that longest matching character string as the second character string CTR₁₂. It then encodes the first and second character strings CTR₁₁ and CTR₁₂ into the code words CDW₁₁ and CDW₁₂, respectively.

[0036] Assuming that one character consists of m bits (for example, 8 bits), the code word CDW₁₁ of the one-character first character string CTR₁₁ is expressed by the m bits of raw data that represent the character itself plus a flag bit indicating raw data, for a total of 9 bits when m = 8. The code word CDW₁₂ of the second character string CTR₁₂, which is a longest matching character string, is expressed in a total of (⌈log₂N⌉ + 1) bits, where N is the number of character strings registered in the dictionary unit 12: ⌈log₂N⌉ bits (the smallest integer greater than log₂N) representing the registration number of CTR₁₂, plus a flag bit indicating a longest matching character string. The above is referred to as the first compression method.
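The two code word lengths described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the index width is assumed to be just enough bits to number N dictionary entries.

```python
def raw_codeword_bits(m: int = 8) -> int:
    """Length of a one-character code word: m raw bits plus one flag bit."""
    return m + 1


def match_codeword_bits(n_entries: int) -> int:
    """Length of a longest-match code word: index bits plus one flag bit.

    Assumption: the index needs just enough bits to number n_entries
    strings; (n - 1).bit_length() computes that with integer arithmetic.
    """
    index_bits = (n_entries - 1).bit_length()
    return index_bits + 1


if __name__ == "__main__":
    for n in (257, 1024, 4096):
        print(n, raw_codeword_bits(), match_codeword_bits(n))
```

Under this assumption the raw code word stays at 9 bits while the match code word grows with the dictionary, which is why sending a lone first character as raw data becomes cheaper as N increases.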

[0037] Next, the compression ratio r₁ is calculated by dividing the total number of characters in the first and second character strings CTR₁₁ and CTR₁₂ by the total number of bits in the code words CDW₁₁ and CDW₁₂. Once r₁ for the first compression method has been calculated, the universal encoding unit 11, as shown in FIG. 2(c), obtains from the dictionary unit 12 the encoded character string C₁C₂C₃...C_i that is the longest match starting from the first character C₁ of the input character string CTR, and takes it as the first character string CTR₂₁. It also obtains from the dictionary unit 12 the encoded character string C_{i+1}...C_m that is the longest match for the input following the first character string, and takes it as the second character string CTR₂₂. It then encodes the first and second character strings CTR₂₁ and CTR₂₂ into the code words CDW₂₁ and CDW₂₂, respectively.

[0038] The code words CDW₂₁ and CDW₂₂ of the first and second character strings CTR₂₁ and CTR₂₂, both of which are longest matching character strings, are each expressed in a total of (⌈log₂N⌉ + 1) bits, where N is the number of character strings registered in the dictionary unit 12: ⌈log₂N⌉ bits (the smallest integer greater than log₂N) representing the registration number of the string, plus a flag bit indicating a longest matching character string. The above is referred to as the second compression method.

[0039] Next, the compression ratio r₂ obtained when the first and second character strings CTR₂₁ and CTR₂₂ are encoded as the code words CDW₂₁ and CDW₂₂ is calculated. Once the compression ratios of both compression methods are available, the universal encoding unit 11 compares r₁ and r₂ and outputs the code word of the first character string (CTR₁₁ or CTR₂₁) belonging to the method with the higher compression ratio. Encoding then continues with the character that follows that first character string (C₂ or C_{i+1}) as the new first character.

[0040] In this way, both for the case of splitting after the first character and for the case of continuing the longest-match search without splitting, the longest matching character string for the following string is also obtained, and the choice between splitting at the first character and encoding without splitting takes into account the total compression ratio of the first and following character strings in each case. This enables efficient data compression.

[0041] Details of data compression

FIGS. 3 and 4 are flowcharts of the data compression process of the present invention. First, the variable LEN, which is used to check the character string match length, is initialized to 0 (step 301). Each of the 8-bit characters (256 = 2⁸ in all) is given a registration number and initially registered in the dictionary, and the dictionary registration count N is set to the number of character kinds M (= 256) (M → N). In this state the first character K is input, its registration number is taken as the reference number ω, and this is taken as the prefix string (step 302).
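The initial state set up in steps 301 and 302 can be sketched as below. The function name and the use of a Python dict keyed by byte strings are assumptions made for illustration only; the patent does not prescribe a dictionary data structure here.

```python
M = 256  # number of distinct 8-bit characters


def init_state(first_char: int):
    """Return (dictionary, N, LEN, omega) as set up in steps 301 and 302."""
    # Every single character is registered under its own value (0..255).
    dictionary = {bytes([c]): c for c in range(M)}
    n = M                 # dictionary registration count N (M -> N)
    match_len = 0         # LEN, the match-length variable
    omega = dictionary[bytes([first_char])]  # reference number of the first character
    return dictionary, n, match_len, omega


if __name__ == "__main__":
    d, n, match_len, omega = init_state(ord("a"))
    print(len(d), n, match_len, omega)
```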

[0042] Next, the next character K of the input character string is read and checked for existence (steps 303, 304). If the input data has ended, the character K does not exist, so the code word code(ω), which expresses the reference number ω in ⌈log₂N⌉ bits with a flag bit indicating a longest matching character string added, for a total of (⌈log₂N⌉ + 1) bits, is output (step 305), and the encoding process ends.

[0043] If, on the other hand, the next character K exists, the current dictionary is searched for the character string (ωK) formed by appending K to the reference number ω (step 305). If (ωK) is in the dictionary, it is checked whether LEN = 0, that is, whether K is the second character (step 306). If LEN = 0 and K is the second character, the second character string search routine of FIG. 4 is executed to search for the second character string CTR₁₂ of the first compression method (see FIG. 2) (step 307). Specifically, ω is retained and the registration number of the second character K is set as ω′ (step 401). The next character of the input (the third character), K′, is then read, and it is determined whether K′ exists, that is, whether the input character string has ended (steps 402, 403).

[0044] If the character K′ exists, the dictionary is searched for the character string (ω′K′) formed by appending K′ to the reference number ω′ (step 404). If (ω′K′) is in the dictionary, it becomes the new ω′ (step 405). Processing then returns to step 402 and repeats, continuing the search for the longest matching character string (the second character string CTR₁₂) until (ω′K′) can no longer be found in the dictionary or no next character exists.

[0045] When the character K′ no longer exists at step 403, or the character string (ω′K′) cannot be found in the dictionary at step 404, the search for the second character string ends. The number of characters in the second character string CTR₁₂ found by this routine is denoted n₁₂, and the number of bits needed to represent its reference number ω′ is denoted log₂(ω′).
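The second character string search routine of FIG. 4 (steps 401 to 405) can be sketched as a greedy longest-match loop. This is an illustrative sketch only: `longest_match` is a hypothetical name, the dictionary is assumed to be a Python dict mapping byte strings to registration numbers, and every single character is assumed to be registered already.

```python
def longest_match(data: bytes, start: int, dictionary: dict) -> tuple:
    """Return (reference number, length) of the longest dictionary string at data[start:]."""
    omega = dictionary[data[start:start + 1]]          # step 401: number of the first character
    length = 1
    while start + length < len(data):                  # steps 402-403: does a next character K' exist?
        candidate = data[start:start + length + 1]     # the string (omega' K')
        if candidate not in dictionary:                # step 404: not registered, search ends
            break
        omega = dictionary[candidate]                  # step 405: (omega' K') becomes omega'
        length += 1
    return omega, length


if __name__ == "__main__":
    dictionary = {bytes([c]): c for c in range(256)}
    dictionary[b"ab"] = 256
    dictionary[b"abc"] = 257
    print(longest_match(b"abcd", 0, dictionary))  # (257, 3)
```

The returned length plays the role of n₁₂ (or n₂₂ when the routine is reused for the second compression method).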

[0046] Next, the universal encoding unit 11 calculates the compression ratio r₁ of the first compression method by the following equation (step 308):

r₁ = (1 + n₁₂) / {9 + log₂(ω′) + 1}    (1)

In the denominator, 9 is the number of bits used to represent the first character string CTR₁₁, which consists of the single first character, and (log₂(ω′) + 1) is the number of bits used to represent the second character string CTR₁₂, which is a longest matching character string.
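As a worked illustration of equation (1), the sketch below evaluates r₁ for assumed values. The function name is hypothetical, and `ref_bits` stands for the quantity the text writes as log₂(ω′), the number of bits needed for the reference number.

```python
def ratio_first_method(n12: int, ref_bits: int) -> float:
    """Equation (1): characters covered per bit when the first character is sent raw."""
    raw_bits = 9                 # 8 data bits + 1 flag bit for CTR11
    match_bits = ref_bits + 1    # reference number + 1 flag bit for CTR12
    return (1 + n12) / (raw_bits + match_bits)


if __name__ == "__main__":
    # Assumed example: the second string covers 4 characters, 12-bit reference number.
    print(ratio_first_method(n12=4, ref_bits=12))
```

With these assumed values, 5 characters are covered by 22 bits. A larger ratio means more input characters per output bit.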

[0047] Once the compression ratio r₁ of the first compression method has been calculated, the reference number (registration number) ω of the first character, that is, of the first character string CTR₁₁, is saved in a buffer (ω → BUFω) and the match length LEN is set to 1 (steps 309, 310). The reference number is saved because, if the first compression method turns out to have a higher compression ratio than the second, the code word code(ω) for the reference number of CTR₁₁ must be output in later processing.

[0048] After this, the search for the first and second character strings CTR₂₁ and CTR₂₂ of the second compression method is carried out. First, the character string (ωK) consisting of the first and second characters of the input character string is set as ω (step 311).

[0049] Processing then returns to step 303, where the next character K is read and checked for existence (steps 303, 304). If the input data has ended, the character K no longer exists, so the code word code(ω), which expresses the reference number ω in ⌈log₂N⌉ bits with a flag bit indicating a longest matching character string added, for a total of (⌈log₂N⌉ + 1) bits, is output (step 305), and the encoding process ends.

[0050] If, on the other hand, the next character K exists, the current dictionary is searched for the character string (ωK) formed by appending K to the reference number ω (step 305), and if (ωK) is in the dictionary, it is checked whether LEN = 0 (step 306). Because LEN was already set to 1 at step 310, the result is NO; (ωK) is set as ω and LEN is incremented by 1 (steps 312, 313). Processing then returns to step 303, and the search for the first character string CTR₂₁ of the second compression method continues.

[0051] When this search has been repeated until the character string (ωK) can no longer be found in the dictionary, the result at step 305 is NO and the search for the first character string CTR₂₁ of the second compression method is complete. The number of characters in the first character string CTR₂₁ found by this search is denoted n₂₁, and the number of bits needed to represent its reference number ω is denoted log₂(ω).

[0052] Once the search for the first character string CTR₂₁ of the second compression method is complete, the second character string search routine shown in FIG. 4 is executed to search for the second character string CTR₂₂ of the second compression method (step 314).

[0053] Specifically, the current ω (the reference number of the first character string CTR₂₁) is retained, and the registration number of the character K that follows CTR₂₁ (the character read at step 304) is set as ω′ (step 401). The next character K′ of the input character string is then read, and it is determined whether K′ exists, that is, whether the input character string has ended (steps 402, 403).

[0054] If the character K′ exists, the dictionary is searched for the character string (ω′K′) formed by appending K′ to the reference number ω′ (step 404). If (ω′K′) is in the dictionary, it becomes the new ω′ (step 405). Processing then returns to step 402 and repeats, continuing the search for the longest matching character string (the second character string CTR₂₂) until (ω′K′) can no longer be found in the dictionary or no next character exists.

[0055] When the character K′ no longer exists at step 403, or the character string (ω′K′) cannot be found in the dictionary at step 404, the search for the second character string ends. The number of characters in the second character string CTR₂₂ found by this routine is denoted n₂₂, and the number of bits needed to represent its reference number ω′ is denoted log₂(ω′).

[0056] Next, the universal encoding unit 11 calculates the compression ratio r₂ of the second compression method by the following equation (step 315):

r₂ = (n₂₁ + n₂₂) / [{log₂(ω) + 1} + {log₂(ω′) + 1}]    (2)

In the denominator, {log₂(ω) + 1} is the number of bits used to represent the first character string CTR₂₁, which is a longest matching character string, and {log₂(ω′) + 1} is the number of bits used to represent the second character string CTR₂₂, which is also a longest matching character string.
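Equation (2) can be evaluated the same way. In this sketch the function name is hypothetical, and `ref_bits_1` and `ref_bits_2` stand for the quantities written as log₂(ω) and log₂(ω′) in the text.

```python
def ratio_second_method(n21: int, n22: int, ref_bits_1: int, ref_bits_2: int) -> float:
    """Equation (2): characters covered per bit when both strings are longest matches."""
    first_bits = ref_bits_1 + 1    # reference number + flag bit for CTR21
    second_bits = ref_bits_2 + 1   # reference number + flag bit for CTR22
    return (n21 + n22) / (first_bits + second_bits)


if __name__ == "__main__":
    # Assumed example: matches of 2 and 1 characters, 12-bit reference numbers.
    print(ratio_second_method(n21=2, n22=1, ref_bits_1=12, ref_bits_2=12))
```

With these assumed values, 3 characters are covered by 26 bits.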

[0057] Once the compression ratio r₂ of the second compression method has been calculated, the match length LEN is reset to 0 (step 316) and the compression ratios r₁ and r₂ of the two methods are compared (step 317). If r₁ ≥ r₂, the reference number of the first character of the input character string stored in the buffer, that is, the reference number of the first character string CTR₁₁ of the first compression method, is restored as ω (BUFω → ω, step 318), and the code word code(ω), which expresses this reference number in a total of 9 bits (8 bits of raw data plus a 1-bit flag), is output (step 319).

[0058] Thereafter, the character following the first character string is taken as the new first character, its registration number is taken as the new reference number ω, and the processing from step 303 onward is repeated to encode the input character string. If, on the other hand, r₁ < r₂ at step 317, the code word code(ω) is output, which expresses the reference number ω of the first character string CTR₂₁ of the second compression method (retained at step 401) in a total of (⌈log₂N⌉ + 1) bits: ⌈log₂N⌉ bits (the smallest integer greater than log₂N) plus a flag bit indicating a longest matching character string. In addition, the character string (ωK) formed by appending the next character K to the first character string is registered in the dictionary under registration number N, the registration number of K is set as the reference number ω, and the dictionary address N is incremented by 1 (step 320). The processing from step 303 onward is then repeated to encode the input character string.
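The decision made in steps 308 to 319 can be sketched end to end on a toy dictionary. This is a simplified illustration under stated assumptions, not the flowchart itself: all names are hypothetical, the reference-number width is taken from the dictionary size rather than from each reference number, a missing second string contributes no characters and no bits, and the dictionary update of step 320 is left out.

```python
def longest_match(data, start, dictionary):
    """Longest dictionary string beginning at data[start]; returns (number, length)."""
    length = 1
    while (start + length < len(data)
           and data[start:start + length + 1] in dictionary):
        length += 1
    return dictionary[data[start:start + length]], length


def choose_first_codeword(data, pos, dictionary):
    """Compare both parses at data[pos:]; return (kind, value, characters consumed)."""
    bits = (len(dictionary) - 1).bit_length()   # assumed index width for N entries

    # First compression method: raw first character, then a longest match.
    n12 = 0
    if pos + 1 < len(data):
        _, n12 = longest_match(data, pos + 1, dictionary)
    r1 = (1 + n12) / (9 + (bits + 1 if n12 else 0))

    # Second compression method: longest match, then the next longest match.
    ref21, n21 = longest_match(data, pos, dictionary)
    n22 = 0
    if pos + n21 < len(data):
        _, n22 = longest_match(data, pos + n21, dictionary)
    r2 = (n21 + n22) / ((bits + 1) + (bits + 1 if n22 else 0))

    if r1 >= r2:
        return ("raw", data[pos], 1)      # 9-bit raw code word for the first character
    return ("match", ref21, n21)          # flagged reference to the longest match


if __name__ == "__main__":
    d = {bytes([c]): c for c in range(256)}
    d[b"ab"], d[b"bc"], d[b"bcd"] = 256, 257, 258
    print(choose_first_codeword(b"abcdx", 0, d))
    print(choose_first_codeword(b"abab", 0, d))
```

In the first call, splitting off "a" lets the three-character entry "bcd" match next, so the raw code word wins; in the second, matching "ab" twice covers more characters per bit, so the longest match wins.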

[0059] (b) Second embodiment

With the encoding process of FIG. 3 described above, a single character that has been split off can be encoded in 9 bits, which is shorter than encoding a longest matching character string in (⌈log₂N⌉ + 1) bits. However, as the first character string of the second compression method (the longest matching character string) comes to be output more frequently, the flag bits accumulate, and encoding efficiency can become worse than with the conventional encoding scheme that uses no flag (FIG. 10). Therefore, for a fixed section of a file (a fixed amount of input character strings), the encoding process of FIG. 3 is executed provisionally and the number of times the code word of the first character string CTR₁₁ of the first compression method is output is monitored. If that count is smaller than a set value, compression is performed with the conventional encoding scheme; if it is larger, the encoding process of the present invention according to FIG. 3 is performed. Compression can thereby be matched to the nature of the input character string.
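The flag overhead described above can be illustrated with a rough bit count. The sketch assumes, purely for illustration, that the flagged and conventional coders emit the same number of code words for a block, which real parses need not do; both function names are hypothetical.

```python
def flagged_bits(raw_count: int, match_count: int, index_bits: int) -> int:
    """Total bits with flags: 9 per raw character, index + 1 flag per match."""
    return raw_count * 9 + match_count * (index_bits + 1)


def conventional_bits(codeword_count: int, index_bits: int) -> int:
    """Total bits for a flagless coder that spends index_bits on every code word."""
    return codeword_count * index_bits


if __name__ == "__main__":
    index_bits = 12
    for raw, match in ((100, 900), (250, 750), (500, 500)):
        print(raw, match,
              flagged_bits(raw, match, index_bits),
              conventional_bits(raw + match, index_bits))
```

With a 12-bit index each raw code word saves 3 bits and each match costs 1 extra flag bit, so under this simplification the flagged scheme only pays off when raw code words make up more than a quarter of the output.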

[0060] FIG. 5 shows the configuration of an embodiment of an encoder that implements this compression method. Reference numeral 11 denotes a universal encoding unit that encodes the input character string CTR according to FIG. 3 or FIG. 10 and outputs the result. Reference numeral 12 denotes a dictionary unit that stores character strings that have already been input and encoded, assigning them sequential numbers (registration numbers or reference numbers). Reference numeral 21 denotes a temporary compression unit that provisionally executes the encoding process of FIG. 3 on a fixed amount (for example, 1 Kbyte) of the input character string CTR, monitors the number of times the code word of the first character string CTR₁₁ of the first compression method is output, and decides from that count whether to carry out the encoding process of the present invention according to FIG. 3 for real or to perform the conventional encoding that uses no flag. Reference numeral 22 denotes a storage unit that stores the fixed amount of the input character string CTR. The storage unit 22 may be omitted, with the input character string instead being input to the universal encoding unit 11 after the fixed amount of input has been provisionally compressed. The temporary compression unit 21 and the universal encoding unit 11 may also be implemented as a single shared unit that serves for both provisional and final compression.

[0061] FIG. 6 is a flowchart of the data compression process of the present invention. A fixed amount of the input character string CTR is input in sequence to the temporary compression unit 21 and the storage unit 22. The temporary compression unit 21 provisionally executes the encoding process according to FIG. 3, and the storage unit 22 stores the fixed amount of the input character string CTR (steps 501, 502). While encoding the input character string CTR, the temporary compression unit 21 monitors the number of times P(i) that the code word of the first character string CTR₁₁ of the first compression method is output (the raw data output count) (step 503).

[0062] When provisional compression of the fixed amount is complete, the raw data output count P(i) is compared with a preset constant C (step 504). If P(i) is at or below C, MODE = 0 (conventional encoding) is set; if it is at or above C, MODE = 1 (the encoding process of the present invention according to FIG. 3) is set (steps 505, 506).

[0063] Next, it is checked whether the mode has changed, and if so, mode change data is output (steps 507, 508). The universal encoding unit 11 determines from the mode change data whether MODE = 1 has been specified (step 509). If MODE = 0, it executes the conventional encoding that uses no flag (step 510); if MODE = 1, it executes the encoding process of the present invention according to FIG. 3 (step 511). It then outputs the compressed data (step 512), and the processing from step 501 onward is repeated for the next fixed amount of input character strings.
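The per-block mode selection and mode-change signalling of steps 504 to 509 can be sketched as follows. The function name, the starting mode, and the handling of a count exactly equal to C (treated here as MODE = 0) are assumptions; the text leaves these points open.

```python
def plan_blocks(raw_counts, threshold, initial_mode=0):
    """For each block, return (emit_mode_change, mode).

    raw_counts holds the raw data output count P(i) measured by the
    provisional pass over each block; threshold is the preset constant C.
    """
    mode = initial_mode                 # assumed starting mode
    plan = []
    for count in raw_counts:
        new_mode = 1 if count > threshold else 0     # steps 504-506
        plan.append((new_mode != mode, new_mode))    # steps 507-508: signal only on change
        mode = new_mode
    return plan


if __name__ == "__main__":
    print(plan_blocks([3, 20, 25, 1], threshold=10))
```

Signalling only on a change keeps the overhead low when consecutive blocks behave alike.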

[0064] The mode change data is an 8-bit code, chosen in advance, that is not used as a character; this code signals a mode change. Alternatively, a 1-bit mode designation flag is output for each fixed amount of input (one block) to specify the mode. The present invention has been described above by way of embodiments, but various modifications are possible within the gist of the invention as set out in the claims, and the present invention does not exclude them.

[0065]

[Effects of the Invention] As described above, according to the present invention, both for the case where the input is split after its first character and for the case where the longest-match search is continued without splitting, the longest matching character string for the following string is also obtained, and the decision whether to split at the first character or to encode without splitting is made on the total encoding efficiency of the first and following character strings in each case. Efficient data compression is therefore possible with only a small increase in the amount of computation, in other words while keeping the loss of processing speed to a minimum.

[0066] Further, according to the present invention, when encoding with a split at the first character, that first character is represented by a code word shorter than the code word of a longest matching character string, so the compression ratio can be improved further.

[0067] Further, according to the present invention, the compression ratio is obtained by dividing the total number of characters in the first and second character strings by the total number of bits in their code words. The compression ratio can therefore be computed accurately, the input character string can be split and encoded in the way that actually gives the higher compression ratio, and compression improves.

[0068] Further, according to the present invention, encoding is provisionally executed on a fixed amount of input character strings and the number of times raw data is output is counted. If the count is at or above a fixed value, the encoding process of the present invention is carried out for real and its code words are output; if it is at or below the fixed value, conventional encoding is performed. An encoding scheme suited to the nature of the character string can thus be adopted, and the data compression ratio can be improved.

[Brief description of the drawings]

FIG. 1 is a diagram explaining the principle of the present invention.

FIG. 2 shows the configuration of an encoder that implements the data compression method of the present invention, together with an outline of the data compression.

FIG. 3 is a first flowchart of the data compression process of the present invention.

FIG. 4 is a second flowchart of the data compression process of the present invention.

FIG. 5 is a configuration diagram of another embodiment of the present invention.

FIG. 6 is a flowchart of another compression process of the present invention.

FIG. 7 is an explanatory diagram of LZSS encoding.

FIG. 8 is an explanatory diagram of LZW encoding.

FIG. 9 is an explanatory diagram of the dictionary structure.

FIG. 10 is a flowchart of LZW encoding.

FIG. 11 is a flowchart of LZW decoding.

FIG. 12 is an explanatory diagram of the exception case in LZW decoding.

FIG. 13 is an explanatory diagram of LZW decoding.

FIG. 14 is an explanatory diagram of a problem with conventional data compression.

[Explanation of symbols]

CTR: input character string
CTR₁₁: first character string consisting only of the first character of the input character string
CTR₁₂: second character string following the first character
CDW₁₁: code word of the first character string
CDW₁₂: code word of the second character string
CTR₂₁: first character string that is the longest match with an encoded character string
CTR₂₂: second character string that is the longest match with an encoded character string
CDW₂₁: code word of the first character string
CDW₂₂: code word of the second character string
CPR₁, CPR₂: compression ratio calculation units
COMP: comparison unit that compares the compression ratios
CDOT: code word output unit
11: universal encoding unit
12: dictionary unit
21: temporary compression unit

Continuation of the front page

(72) Inventor: Hirotaka Chiba, c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa

(56) References: JP-A-2-34038 (JP, A); JP-A-3-78322 (JP, A); JP-A-4-86126 (JP, A); JP-A-5-11973 (JP, A)

(58) Fields searched (Int. Cl.⁷, DB name): G06F 5/00; H03M 7/30 - 7/46

Claims

(57) [Claims]

1. A data compression method that finds, among already-input encoded character strings, a character string matching an input character string and encodes the input character string using the number of the matched character string, the method comprising: a first step of obtaining, as a first character string (CTR₂₁), the encoded character string that is the longest match starting from the leading character of the input character string (CTR), and obtaining, as a second character string (CTR₂₂), the encoded character string that is the longest match with the input following the first character string; a second step of, when the input character string (CTR) is split after its leading character, taking the leading character as a first character string (CTR₁₁) and obtaining, as a second character string (CTR₁₂), the encoded character string that is the longest match with the input following that first character string; a third step of representing the one-character first character string (CTR₁₁) obtained in the second step by a code word shorter than the code word of a longest-match character string; a fourth step of comparing the compression ratio obtained when the first and second character strings (CTR₂₁, CTR₂₂) of the first step are encoded with the compression ratio obtained when the first and second character strings (CTR₁₁, CTR₁₂) of the second step are encoded; a fifth step of outputting the code word of the first character string of whichever pair gives the higher compression ratio; and a sixth step of continuing the encoding with the character that follows that first character string in the input character string as the new leading character.
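The six steps of claim 1 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names, the static dictionary held as a list of strings (no dictionary update during parsing), the default values `m=8` and `dict_size=4096`, and the tie-breaking rule that prefers the longest match are all hypothetical choices made for the example. The code word lengths follow claims 3 and 4, and the ratio follows claim 2.

```python
from math import ceil, log2

def longest_match(dictionary, text, start):
    """Longest dictionary entry that is a prefix of text[start:], or '' if none."""
    best = ""
    for entry in dictionary:
        if len(entry) > len(best) and text.startswith(entry, start):
            best = entry
    return best

def ratio(parts):
    """Characters divided by code word bits (claim 2); empty strings cost nothing."""
    chars = sum(len(s) for s, _ in parts)
    bits = sum(b for s, b in parts if s)
    return chars / bits if bits else 0.0

def parse(text, dictionary, m=8, dict_size=4096):
    """Return the list of ('raw', char) / ('match', string) decisions.

    Assumptions for this sketch: the dictionary is static, and dict_size
    is the nominal number of entries used to size the index code word.
    """
    raw_bits = m + 1                          # claim 3: m raw bits + 1 bit
    index_bits = ceil(log2(dict_size)) + 1    # claim 4: index + 1 flag bit
    out, pos = [], 0
    while pos < len(text):
        # First step: longest match, then longest match after it.
        a1 = longest_match(dictionary, text, pos)
        a2 = longest_match(dictionary, text, pos + len(a1))
        # Second step: split after the leading character.
        b1 = text[pos]
        b2 = longest_match(dictionary, text, pos + 1)
        # Fourth step: compare the two compression ratios.
        ratio_a = ratio([(a1, index_bits), (a2, index_bits)])
        ratio_b = ratio([(b1, raw_bits), (b2, index_bits)])
        # Fifth and sixth steps: emit the better first string and advance.
        if a1 and ratio_a >= ratio_b:         # tie goes to the longest match (assumed)
            out.append(("match", a1))
            pos += len(a1)
        else:
            out.append(("raw", b1))
            pos += 1
    return out

print(parse("abcabc", ["a", "b", "c", "ab", "bcabc"]))
```

For the input `"abcabc"`, a purely greedy parse would take `"ab"` first and then be limited to `"c"`; the comparison instead emits the leading `"a"` as raw data so that the five-character entry `"bcabc"` can be used next.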

2. The data compression method according to claim 1, wherein the compression ratio in the fourth step is obtained by dividing the total number of characters in the first and second character strings by the total number of bits in the code words of the first and second character strings.
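As a worked illustration of this ratio (the symbols and numbers below are assumptions introduced for the example, not taken from the claim text): let ℓ₁, ℓ₂ be the character counts of the first and second character strings and b₁, b₂ the bit counts of their code words. Assuming 8-bit characters and 4096 encoded character strings, claims 3 and 4 give 9 bits for a raw character and 13 bits for a longest-match code word.

```latex
R = \frac{\ell_1 + \ell_2}{b_1 + b_2}

% Hypothetical case A (first step): strings of 2 and 1 characters, both longest-match code words
R_A = \frac{2 + 1}{13 + 13} = \frac{3}{26} \approx 0.115

% Hypothetical case B (second step): 1 raw character, then a 5-character match
R_B = \frac{1 + 5}{9 + 13} = \frac{6}{22} \approx 0.273
```

Since R_B > R_A in this hypothetical case, the fifth step would output the raw code word of the leading character.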

3. The data compression method according to claim 1, wherein, when one character consists of m bits, the one-character first character string (CTR₁₁) of the second step is encoded with (m + 1) bits, consisting of m bits of raw data representing the character itself and one flag bit identifying the code word as raw data.

4. The data compression method according to claim 1, wherein, when the number of encoded character strings is n, each longest-match character string is encoded with (⌈log₂ n⌉ + 1) bits, consisting of ⌈log₂ n⌉ bits (where ⌈log₂ n⌉ is the smallest integer greater than or equal to log₂ n) and one flag bit indicating a longest-match character string.
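The two code word formats of claims 3 and 4 can be illustrated with a short Python sketch. The claims fix only the lengths, (m + 1) bits and (⌈log₂ n⌉ + 1) bits; reading the extra bit of the raw code word as a flag, placing the flag first, and the flag polarity (`0` for raw data, `1` for a longest match) are assumptions made here for illustration, as are the function names.

```python
from math import ceil, log2

RAW_FLAG = "0"    # assumed polarity: 0 marks raw data
MATCH_FLAG = "1"  # assumed polarity: 1 marks a longest-match index

def raw_codeword(ch, m=8):
    """(m + 1)-bit code word: flag bit followed by the m-bit character code."""
    code = ord(ch)
    if code >= 1 << m:
        raise ValueError("character does not fit in m bits")
    return RAW_FLAG + format(code, f"0{m}b")

def match_codeword(index, n):
    """(ceil(log2 n) + 1)-bit code word: flag bit followed by the entry number."""
    if n < 2 or not 0 <= index < n:
        raise ValueError("need n >= 2 and 0 <= index < n")
    width = ceil(log2(n))
    return MATCH_FLAG + format(index, f"0{width}b")

print(raw_codeword("A"))          # 9 bits for m = 8
print(match_codeword(5, 4096))    # 13 bits for n = 4096
```

With m = 8 and n = 4096 the raw code word is 4 bits shorter than a longest-match code word, which is the property the third step of claim 1 relies on.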

5. The data compression method according to claim 1, further comprising: a step of provisionally performing the encoding process on a fixed number of input character strings and counting the number of times the code word of the first character string of the second step is output; and a step of formally performing the encoding process and outputting code words.