JPH05128101A

JPH05128101A - Data compression and restoration system

Info

Publication number: JPH05128101A
Application number: JP3287451A
Authority: JP
Inventors: Yoshiyuki Okada; 佳之岡田; Shigeru Yoshida; 茂吉田; Yasuhiko Nakano; 泰彦中野; Hirotaka Chiba; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-11-01
Filing date: 1991-11-01
Publication date: 1993-05-25
Anticipated expiration: 2018-03-24
Also published as: JP3388768B2

Abstract

PURPOSE: To make effective use of the dictionary capacity as a whole by increasing the capacity of each split dictionary according to the number of entries registered during encoding or decoding. CONSTITUTION: At the time of encoding or decoding, initial setting software 12 sets a minimum size for each of the split dictionaries 10-0 to 10-255 (split into 256, for example) and initially registers all 256 character types, assigning an index to each single character. When encoding an input character string, dictionary search software 14 designates a specific split dictionary 10-(i) by a history based on the last character group of the character string encoded immediately before (for example, the character code of the last character), and searches the already encoded substrings registered in the designated split dictionary 10-(i) for the longest-matching substring. Dictionary capacity increasing software 18 increases the capacity of the split dictionary 10-(i) when its index exceeds the predetermined dictionary size, and leaves the dictionary size unchanged when it does not.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a data compression and restoration system using the LZW code, known as an improvement on the incremental parsing type of universal code. In recent years, computers have come to handle many kinds of data, such as character codes, vector information, and images, and the volume of data handled is growing rapidly. When handling large volumes of data, removing the redundant parts of the data to compress it reduces the required storage capacity and allows faster transmission.

[0002] Universal coding has been proposed as a method that can compress many kinds of data with a single scheme. The field of the present invention is not limited to compressing character codes and extends to many kinds of data. In the following description, however, we follow the terminology of information theory and call one word of data a character and a sequence of any number of words a character string.

[0003] A representative universal coding method is the Ziv-Lempel code (for details see, for example, Munakata, "Ziv-Lempel Data Compression Method", Information Processing, Vol. 26, No. 1, 1985). Two algorithms have been proposed for the Ziv-Lempel code: the universal type and the incremental parsing type.

[0004] An improvement of the universal type algorithm is the LZSS code (see T.C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986). An improvement of the incremental parsing type algorithm is the LZW (Lempel-Ziv-Welch) code (see T.A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984).

[0005] Of these codes, the LZW code has come to be used for applications such as file compression in storage devices, because it allows high-speed processing and its algorithm is simple.

[0006]

[Prior Art] A detailed algorithm for conventional LZW encoding is shown in FIG. 9, and a detailed algorithm for decoding is shown in FIG. 10. LZW encoding uses a rewritable dictionary. It divides the data, which consists of input character codes, into mutually distinct character strings, numbers these strings in order of appearance and registers them in the dictionary, and encodes the character string currently being input as the number of the longest matching string registered in the dictionary.

[0007] In the LZW encoding process of FIG. 9, step S1 first registers a one-character string for every character as initial values, and then encoding begins. In step S2, the first input character K is taken as the reference number ω for the dictionary search, and this becomes the prefix string. Next, step S3 reads the next character K of the input data, and step S4 checks whether the string (ωK), formed by appending the character K read in step S3 to the prefix string ω obtained in step S2, is in the current dictionary.

[0008] If the string (ωK) is in the dictionary in step S4, step S5 replaces the reference number ω with the string (ωK), step S6 determines whether the input data has ended, and the process then returns to step S3, continuing the longest-match search until the string (ωK) can no longer be found in the dictionary. If the string (ωK) is not in the dictionary in step S4, the process proceeds to step S7, which outputs the current reference number ω as the code word code(ω), registers the string (ωK) in the dictionary under a new reference number, replaces the reference number ω with the input character K, and increments the dictionary address N. After the check in step S6, the process returns to step S3 and reads the next character K.
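The encoding loop of steps S1 to S7 can be sketched in Python as follows. This is a minimal illustration, not the implementation shown in FIG. 9: the function name `lzw_encode`, the `(ω, K)` tuple keys, and the sample input `ababcbabab` are assumptions made for this example. The three-character alphabet with reference numbers 1 to 3 follows the simplified a, b, c case used in the description, and the end-of-input check of step S6 is expressed implicitly by the `for` loop.

```python
def lzw_encode(data, alphabet):
    """Conventional LZW encoding; returns (code words, registered entries)."""
    # S1: register every single character under reference numbers 1..len(alphabet).
    single = {ch: ref for ref, ch in enumerate(alphabet, start=1)}
    entries = {}                      # (omega, K) -> reference number
    next_ref = len(single) + 1
    codes = []
    if not data:
        return codes, entries
    omega = single[data[0]]           # S2: the first character becomes the prefix
    for k in data[1:]:                # S3: read the next character K
        key = (omega, k)
        if key in entries:            # S4: is (omega K) already registered?
            omega = entries[key]      # S5: extend the prefix and keep matching
        else:
            codes.append(omega)       # S7: output code(omega),
            entries[key] = next_ref   #     register (omega K) under a new number,
            next_ref += 1
            omega = single[k]         #     and restart the prefix from K
    codes.append(omega)               # end of input: flush the last prefix
    return codes, entries


codes, entries = lzw_encode("ababcbabab", "abc")
print(codes)    # [1, 2, 4, 3, 5, 8]
print(entries)  # {(1, 'b'): 4, (2, 'a'): 5, (4, 'c'): 6, (3, 'b'): 7, (5, 'b'): 8}
```

Storing each entry as a pair of a prefix reference number and one character keeps every dictionary record the same size, whatever the length of the string it stands for.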

[0009] Encoding is now described concretely with reference to FIGS. 11 and 12. To keep the explanation simple, FIGS. 11 and 12 take the case of compressing data made up of combinations of the three characters a, b, and c. The input data of FIG. 11 is read from left to right. When the first character a is input, the dictionary contains no matching string other than a, so the output code (reference number ω) is output as the code word. The extended string ab is then assigned reference number 4 and registered in the dictionary. The actual registration takes the form (1b).

[0010] Next, the second character b becomes the start of a string. The dictionary contains no matching string other than b, so reference number 2 is output as the code word, and the extended string ba is registered in the dictionary under reference number 5, actually in the form 2a. The third character a becomes the start of the next string. Processing continues in the same way. The decoding process of FIG. 10 performs the reverse of the encoding process of FIG. 9.

[0011] In the decoding of FIG. 10, step S1 registers a one-character string for every character in the dictionary as initial values, as in encoding, and then decoding begins. Step S2 reads the first code (reference number) and sets the current CODE as OLDcode. Because the first code always corresponds to the reference number of one of the single characters already registered in the dictionary, the character code(K) matching the input code CODE is found and the character K is output. The output character K is also set in char for the exception processing described later.

[0012] Next, step S3 reads the next code and sets it in CODE as NEWcode. Step S4 then checks whether the code CODE input in step S3 is defined (registered) in the dictionary. Normally the input code word has been registered in the dictionary by earlier processing, so the process proceeds to step S5 and reads the string code(ωK) corresponding to the code CODE from the dictionary. Step S6 temporarily pushes the character K onto a stack and returns to step S5 with the reference number code(ω) as the new CODE. Steps S5 and S6 are repeated recursively until the reference number ω reaches a single character, and finally step S7 pops the characters stacked in step S6 in LIFO (last in, first out) order and outputs them.

[0013] Also in step S7, the string expressed as the pair (ω, K), combining the previously used code ω and the first character K of the string restored this time, is given a new reference number and registered in the dictionary. If the code is found not to be registered in step S4 (which happens when encoding referred to the reference number registered immediately before), step S9 sets OLDcode in CODE and code(OLDcode, char) in NEWcode, and the process then proceeds to step S5.

[0014] Decoding is now described concretely with reference to FIG. 13. To keep the explanation simple, FIG. 13 takes the case of decoding data made up of combinations of the three characters a, b, and c. In FIG. 13 the first input code is 1. The single characters a, b, and c are already registered in the dictionary under reference numbers 1, 2, and 3, as shown in FIG. 12, so a dictionary lookup replaces code 1 with the string a of the matching reference number, which is output.

[0015] The next code 2 is likewise replaced with the character b and output. At this point, the combination (1b) of the previously processed code 1 and the first character b decoded this time is given the new reference number 4 and registered in the dictionary. The third code 4 is found in the dictionary as 1b, replaced with ab, and the string ab is output. At the same time, the string 2a (= ba), combining the previously processed code 2 and the first character a of the string decoded this time, is given the new reference number 5 and registered in the dictionary. This process is repeated in the same way.

[0016] The decoding of FIG. 13, however, involves the following exception processing. It arises when decoding the sixth input code, 8. Code 8 is not yet defined in the dictionary at the time of decoding and cannot be decoded directly. In this case, the string 5b is formed by appending the first character b of the previously decoded string ba to the previously processed code 5; it is then replaced in turn with 2ab and bab and output. After the string is output, the string 5b, formed by appending the character b of the string decoded this time to the previous code word 5, is given reference number 8 and registered in the dictionary.

[0017] This exception processing is carried out through steps S4 and S9 of the decoding flow of FIG. 10, and finally step S7 outputs the string and registers the new string in the dictionary under a new reference number. The encoding and decoding processes of FIGS. 9 and 10 build the same dictionary as they proceed.
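The decoding procedure, including the exception for a code that is not yet defined, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the flow of FIG. 10 itself: the function name `lzw_decode` and the helper `expand` are hypothetical, and the code sequence 1, 2, 4, 3, 5, 8 is an assumed input chosen so that the sixth code is 8 and the code before it is 5, as in the description.

```python
def lzw_decode(codes, alphabet):
    """Conventional LZW decoding with the 'code not yet defined' exception."""
    # Each entry is (prefix reference number, character); single characters
    # have no prefix. Reference numbers start at 1 as in the description.
    entries = {ref: (None, ch) for ref, ch in enumerate(alphabet, start=1)}
    next_ref = len(entries) + 1

    def expand(ref):
        # Follow the prefix chain, stacking characters, then pop them
        # in last-in first-out order to restore the string.
        stack = []
        while ref is not None:
            ref, ch = entries[ref]
            stack.append(ch)
        return "".join(reversed(stack))

    if not codes:
        return ""
    old = codes[0]
    output = [expand(old)]
    for code in codes[1:]:
        if code in entries:
            text = expand(code)
        else:
            # Exception: the encoder used the entry it had just registered.
            # The string is the previous string plus its own first character.
            previous = expand(old)
            text = previous + previous[0]
        # Register (previous code, first character of the current string).
        entries[next_ref] = (old, text[0])
        next_ref += 1
        output.append(text)
        old = code
    return "".join(output)


print(lzw_decode([1, 2, 4, 3, 5, 8], "abc"))  # ababcbabab
```

When code 8 arrives, only reference numbers 1 to 7 exist, so the sketch rebuilds the string as ba plus b, and only then registers (5, b) as entry 8.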

[0018]

[Problem to Be Solved by the Invention] When the conventional LZW code divides the input character codes (the data) into mutually distinct strings and encodes them, it encodes each string currently being processed as if it appeared independently of the preceding strings. This poses no problem when encoding a memoryless information source.

[0019] However, much real data, such as actual text, is regarded as an information source with memory. LZW encoding does not make sufficient use of the history of string appearances, so redundancy in the dependence between string appearances remains even after compression. FIG. 14 shows the search tree of the dictionary created by the LZW encoding of FIG. 9. The root of the search tree is empty, and the LZW code does not take into account the history of strings that appeared before the string being encoded. FIG. 15 shows the LZW code generated as the dictionary of FIG. 14 is created.

[0020] The present inventors have therefore proposed an LZW encoding and restoration system that reduces the redundancy between strings and raises the compression ratio by incorporating into the dictionary the dependence on the last character group of the immediately preceding string (which may include the characters one before, two before, and so on, the last character). Specifically, the dictionary is divided into multiple dictionaries, for example 0 to 255, each with an index, as shown in FIG. 16, and an individual dictionary is selected using the last character P0, P1, P2, P3, ... of the immediately preceding string as the index, as shown in FIG. 17. Each dictionary stores only the strings that follow its index character P0, P1, P2, P3, ....
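Selecting a dictionary by the last character of the preceding string can be sketched in Python as follows. This is an illustration under several assumptions that the text does not state: the function name `encode_split` is hypothetical, the context for the very first string is taken to be the first alphabet symbol, every dictionary starts with all single characters registered, and each new string is registered in the same dictionary in which its match was found. The three-character alphabet stands in for the 256 character codes.

```python
def encode_split(data, alphabet):
    """Return (context, reference number) pairs, one per encoded string."""
    dictionaries = {}          # context character -> {string: reference number}
    context = alphabet[0]      # assumed context for the very first string
    output = []
    i = 0
    while i < len(data):
        # Select (or lazily create) the dictionary indexed by the context.
        d = dictionaries.setdefault(
            context, {ch: ref for ref, ch in enumerate(alphabet, start=1)})
        j = i + 1
        # Longest match, searched only within the selected dictionary.
        while j < len(data) and data[i:j + 1] in d:
            j += 1
        s = data[i:j]
        output.append((context, d[s]))
        if j < len(data):
            # Register the match plus the next character in the same dictionary.
            d[s + data[j]] = len(d) + 1
        context = s[-1]        # the last character selects the next dictionary
        i = j
    return output


print(encode_split("ababab", "abc"))
# [('a', 1), ('a', 2), ('b', 1), ('a', 5), ('a', 2)]
```

Only the reference numbers would need to be transmitted; the context is shown in the output purely to make the dictionary selection visible, since a decoder can derive it from the string it decoded last.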

[0021] Conventionally, a code word was a reference number counted over all the strings in the dictionary. With this scheme, a code word is a reference number within the series attached to one index, so it can be expressed more compactly and coding efficiency improves. This method, however, has the following problem. When the dictionary is divided into, say, 256 parts, each split dictionary is allocated 1/256 of the total dictionary capacity in the initial state (equal division). Dictionaries that are never accessed therefore receive the same capacity as the others, while frequently used dictionaries quickly overflow.
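The claim that a per-index reference number can be expressed more compactly can be made concrete with a little arithmetic. The text does not specify the code word format, so the following Python sketch assumes, purely for illustration, fixed-width binary code words just wide enough to address every entry currently in the dictionary; the function name `bits_for` and the entry counts are hypothetical.

```python
def bits_for(entries):
    """Bits needed for a fixed-width code word that addresses `entries` items."""
    # (entries - 1).bit_length() is the width of the largest zero-based index.
    return max(1, (entries - 1).bit_length())


print(bits_for(65536))  # 16: one dictionary holding all strings
print(bits_for(512))    # 9: one split dictionary holding 512 strings
print(bits_for(256))    # 8: a split dictionary holding only the single characters
```

Under this assumption, a reference into a split dictionary of 512 strings costs 9 bits where a reference into a single 65536-entry dictionary would cost 16.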

[0022] To use the total dictionary capacity more efficiently, including the registration of initial values, one could degenerate the history states of the dependence on the last character (for example, by using only the upper bits of the last character as the history state), or allocate more capacity in advance to frequently used dictionaries. However, the dictionary capacity cannot be apportioned without knowing the characteristics of the data beforehand, so these measures have little effect on arbitrary data. The present invention was made in view of these conventional problems. Its object is to provide a data compression and restoration system that, when encoding and decoding with a split dictionary designated by using the last character group of the immediately preceding string as history information, allocates split dictionary capacity appropriately to reduce the memory used for the dictionary and suppresses unnecessary growth in dictionary reference numbers to obtain a high compression ratio.

[0023]

[Means for Solving the Problem] FIG. 1 illustrates the principle of the present invention. The present invention first concerns a data compression system that encodes an input character string using encoded character substrings registered in a dictionary 10 of finite size. As shown in FIG. 1(a), the data compression system of the present invention is characterized by comprising: split dictionaries 10-0 to 10-n formed by dividing the dictionary 10 into multiple parts; initial setting means 12 for setting a minimum dictionary size for each of the split dictionaries 10-0 to 10-n; dictionary search means 14 for, when encoding an input character string, designating a specific split dictionary 10-i by a history based on the last character group (for example, the last character) of the string encoded immediately before, and searching the already encoded substrings registered in the designated split dictionary 10-i for the longest-matching substring; encoding means 16 for outputting, as a code word, the reference number of the substring that dictionary search means 14 found in the split dictionary 10-i as the longest match to the input characters; dictionary capacity increasing means 18 for increasing the capacity of the split dictionary 10-i when the reference number of the split dictionary 10-i used for encoding exceeds the predetermined dictionary size, and keeping the dictionary size unchanged when it does not; and split dictionary registration means 20 for, when encoding means 16 outputs a code word, registering the string formed by appending the next input character to the code word in the split dictionary designated by the history based on the last character group of that code word.

[0024] The present invention also concerns a data restoration system that decodes an input code word sequence using decoded character substrings registered in a dictionary 10 of finite size. As shown in FIG. 1(b), the data restoration system of the present invention is characterized by comprising: initial setting means 12 for setting a minimum dictionary size for each split dictionary; split dictionary search means 14 for, when decoding an input code word, selecting a specific split dictionary 10-i from among the split dictionaries 10-0 to 10-n by a history based on the last character group (for example, the last character) of the string decoded immediately before; decoding means 22 for decoding a character or string based on the registered contents of the reference number, in the selected split dictionary 10-i, that matches the input code word; dictionary capacity increasing means 18 for increasing the capacity of the split dictionary 10-i when the reference number of the split dictionary 10-i used for decoding exceeds the predetermined dictionary size, and keeping the dictionary size unchanged when it does not; and split dictionary registration means 24 for, when decoding means 22 decodes a string, registering the string formed by appending the first character of the currently restored string to the previous input code word in the split dictionary designated by the history based on the last character group of the string decoded immediately before.

[0025] Here, initial setting means 12 sets, for each of the split dictionaries 10-0 to 10-n, a minimum dictionary size large enough to register at least all character types (for example, all 256 character types), each as a single character with its own reference number, and initially registers all of these character types, one character per reference number. Dictionary capacity increasing means 18 increases the capacity of a split dictionary to twice its current size each time the reference number of the split dictionary used for encoding or decoding exceeds the predetermined dictionary size.

[0026] Alternatively, dictionary capacity increasing means 18 may increase the dictionary capacity to the current size plus a predetermined minimum unit each time the reference number of the split dictionary used for encoding or decoding exceeds the predetermined dictionary size.
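The two growth rules, doubling and adding a fixed unit, can be compared with a short Python sketch. The function names `next_capacity` and `capacity_for`, the policy labels, and the unit of 256 entries are assumptions made for this example; the text only says that the unit is a predetermined minimum unit.

```python
def next_capacity(current, policy="double", unit=256):
    """Capacity of a split dictionary after one growth step."""
    if policy == "double":
        return current * 2        # twice the current size
    if policy == "add_unit":
        return current + unit     # current size plus a fixed unit
    raise ValueError(f"unknown policy: {policy}")


def capacity_for(entries, min_size=256, policy="double", unit=256):
    """Capacity reached once `entries` strings have been registered."""
    capacity = min_size
    while entries > capacity:     # grow only when the size is exceeded
        capacity = next_capacity(capacity, policy, unit)
    return capacity


print(capacity_for(256))                      # 256: still fits, size unchanged
print(capacity_for(257))                      # 512
print(capacity_for(1025))                     # 2048
print(capacity_for(1025, policy="add_unit"))  # 1280
```

Doubling needs fewer growth steps for a heavily used dictionary, while adding a fixed unit leaves less unused space after the last step.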

[0027]

[Operation] The data compression and restoration system of the present invention configured in this way operates as follows. In the data compression system of the present invention, as shown in the flowchart of FIG. 2, each split dictionary is first given the minimum dictionary size. Next, the same characters as the input data are searched for in the dictionary selected, according to the history between strings, from among the split dictionaries, and encoding uses the index (reference number) within that split dictionary. If the index (reference number) of the split dictionary used for encoding exceeds the predetermined dictionary size, the capacity of that split dictionary is increased; if not, registration proceeds with the dictionary size unchanged. In this way the finite dictionary is used efficiently and frequently used split dictionaries do not overflow.

[0028] In the data restoration system of the present invention, as shown in the flowchart of FIG. 3, each split dictionary is first given the minimum dictionary size. Next, the string corresponding to the input encoded data (code word) is searched for in the dictionary selected, according to the history between strings, from among the split dictionaries, and the string found in the split dictionary is restored. If the index (reference number) of the split dictionary used for decoding exceeds the predetermined dictionary size, the capacity of that split dictionary is increased; if not, registration proceeds with the dictionary size unchanged. In the same way, the finite dictionary is used efficiently and frequently used split dictionaries do not overflow.
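The restoration side can be sketched in Python as follows, leaving out capacity management. This is an illustrative reading, not the flow of FIG. 3: the function name `decode_split` is hypothetical, the initial context is assumed to be the first alphabet symbol, each new string is assumed to be registered in the dictionary that held its prefix, and the sample reference numbers are assumed inputs for a three-character alphabet.

```python
def decode_split(refs, alphabet):
    """Decode per-dictionary reference numbers back into the original string."""
    dictionaries = {}       # context character -> list of strings (ref = index + 1)
    context = alphabet[0]   # assumed context for the very first code word
    pending = None          # (dictionary, string) awaiting its extension character
    output = []
    for ref in refs:
        d = dictionaries.setdefault(context, list(alphabet))
        if ref <= len(d):
            s = d[ref - 1]
        elif pending is not None and pending[0] is d and ref == len(d) + 1:
            # Exception: the code word refers to the entry that is still
            # pending, so the string is the previous string plus its first char.
            s = pending[1] + pending[1][0]
        else:
            raise ValueError(f"undefined reference number: {ref}")
        if pending is not None:
            # Complete the pending entry with the first character of s.
            pending[0].append(pending[1] + s[0])
        output.append(s)
        pending = (d, s)
        context = s[-1]     # the last decoded character selects the next dictionary
    return "".join(output)


print(decode_split([1, 2, 1, 5, 2], "abc"))  # ababab
print(decode_split([1, 4, 1], "abc"))        # aaaa
```

Because the context is derived from the string decoded last, the decoder selects the same split dictionary as the encoder without any extra information in the code stream.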

[0029] The split dictionary size is increased during data encoding and decoding as shown, for example, in FIG. 4. In the first stage, the minimum size is given to each split dictionary and the initial values are registered; in addition, the final absolute address amax(n) of each split dictionary is set to n×256 and its maximum dictionary capacity dmax(n) to 256, and these are stored in a separate area. The final absolute address all-amax of the whole dictionary is set to 66536 and the maximum dictionary capacity all-dmax of the whole dictionary to 66536.

[0030] In the second stage, suppose the split dictionary n = 0 is selected. An area of 256 starting from the final absolute address all-amax of the whole dictionary is secured for the split dictionary n = 0, and the final absolute address amax(0) and maximum dictionary capacity dmax(0) of that split dictionary, together with the final absolute address all-amax and maximum dictionary capacity all-dmax of the whole dictionary, are updated as summarized below. The third stage is the case where the split dictionary n = 255 exceeds its current size (the minimum size) of 256, and it is doubled to 512.

[0031] The fourth stage is the case where the split dictionary n = 0 exceeds its current size of 512, and it is doubled to 1024. The fifth stage is the case where the split dictionary n = 0 exceeds its current size of 1024, and it is doubled to 2048. These stages are summarized as follows.

Stage   [Final absolute address]   [Maximum dictionary capacity]   [Final absolute address]   [Maximum dictionary capacity]
1       n×256 → amax(n)            256 → dmax(n)                   66536 → all-amax           66536 → all-dmax
2       66792 → amax(0)            512 → dmax(0)                   66792 → all-amax           66792 → all-dmax
3       67048 → amax(255)          512 → dmax(255)                 67048 → all-amax           67048 → all-dmax
4       67560 → amax(0)            1024 → dmax(0)                  67560 → all-amax           67560 → all-dmax
5       68584 → amax(0)            2048 → dmax(0)                  68584 → all-amax           68584 → all-dmax

In this way, the limited dictionary capacity is used as effectively as possible according to the data. Although FIG. 4 updates the dictionary capacity to twice the current size, the amount of dictionary capacity added is arbitrary.
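The bookkeeping of stages 1 to 5 can be reproduced with a short Python sketch. The class name `DictionarySpace` and its method `grow` are hypothetical; the starting total of 66536 and the stage values are copied from the text as given, and each growth step is assumed to append the new area at the end of the whole dictionary, as described for the second stage.

```python
class DictionarySpace:
    """Tracks amax(n), dmax(n), all-amax and all-dmax for split dictionaries."""

    def __init__(self, count, min_size, total):
        # Stage 1: every split dictionary gets the minimum size.
        self.amax = [n * min_size for n in range(count)]  # n x 256 -> amax(n)
        self.dmax = [min_size] * count                    # 256 -> dmax(n)
        self.all_amax = total
        self.all_dmax = total

    def grow(self, n):
        """Double split dictionary n by appending an area to the whole dictionary."""
        added = self.dmax[n]          # doubling adds as much as is currently held
        self.all_amax += added
        self.all_dmax += added
        self.amax[n] = self.all_amax  # the new area ends at the new final address
        self.dmax[n] += added
        return self.amax[n], self.dmax[n], self.all_amax, self.all_dmax


space = DictionarySpace(count=256, min_size=256, total=66536)
for n in (0, 255, 0, 0):              # stages 2 to 5
    print(n, space.grow(n))
# 0 (66792, 512, 66792, 66792)
# 255 (67048, 512, 67048, 67048)
# 0 (67560, 1024, 67560, 67560)
# 0 (68584, 2048, 68584, 68584)
```

Only the split dictionaries that actually overflow consume additional space, which is the effect the five stages are meant to show.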

[0032]

[Embodiment] FIG. 5 is a block diagram showing one embodiment of the present invention. In FIG. 5, reference numeral 26 denotes a CPU serving as control means, and a program memory 28 and a data memory 32 are connected to the CPU 26. The program memory 28 holds control software 30, initial setting software 12, dictionary search software 14, encoding software 16, decoding software 22, dictionary capacity increasing software 18, and split dictionary registration software 20 and 24.

[0033] At the start of encoding and decoding, the initial setting software 12 sets a minimum size for each of the divided dictionaries 10-0 to 10-255 (256 divisions in this example) and initially registers all 256 character types, assigning an index to each character. During encoding of an input character string, the dictionary search software 14 designates a specific divided dictionary 10-i by a history based on the last character group of the character string encoded immediately before (for example, the character code of its last character), and searches the already-encoded partial strings registered in the designated divided dictionary 10-i for the one with the longest match.

[0034] During decoding of input encoded data (codewords), the dictionary search software 14 selects a specific divided dictionary 10-i from among the divided dictionaries 10-0 to 10-n by a history based on the last character group of the character string decoded immediately before, for example the character code of its last character. The encoding software 16 outputs, as encoded data (a codeword), the reference number of the partial string that the dictionary search software 14 found in the divided dictionary as the longest match with the input characters.

[0035] The decoding software 22 decodes a character or character string from the registered content of the reference number that matches the input codeword in the divided dictionary 10-i selected by the dictionary search software 14. The dictionary capacity increasing software 18 increases the dictionary capacity of the divided dictionary 10-i used for encoding or decoding when its index exceeds the predetermined dictionary size, and leaves the dictionary size unchanged otherwise. When the encoding software 16 outputs a codeword, the divided dictionary registration software 20 registers the character string formed by appending the next input character to the codeword, in the divided dictionary designated by the history based on the last character group of that codeword. When the decoding software 22 decodes a character string, the divided dictionary registration software 24 registers the character string formed by appending the first character of the currently restored string to the previous input codeword, in the divided dictionary designated by the history based on the last character group of the character string decoded immediately before.

[0036] The data memory 32 holds a data buffer 34, which stores the character string to be encoded or the code string to be decoded, and the divided dictionaries 10-0 to 10-255, that is, 256 divisions corresponding to all 256 character types to be processed. FIG. 6 is a flowchart showing the encoding process of the present invention in detail.

[0037] First, in step S1, initialization is performed. For a total number of character types M (for example, M = 256) and a number of divided dictionaries A (for example, A = 256), the characters 0 to 255 are initially registered in each of the A = 256 divided dictionaries D_i. The number of nodes (indexes) in each tree i of the A divided dictionaries, which are selected by the last character of the immediately preceding character string, is managed as indc(i). As initialization, the A values indc(i) are set to M + 1 (= 256), and the maximum dictionary capacity dmax(i) of each divided dictionary is set to the minimum dictionary size M + 1 (= 256).

[0038] The first character K is then input and used as the index (prefix string) ω, and is also assigned to K1, the last character of the immediately preceding character string. A history PK derived from the last character of the immediately preceding string is defined, and a lookup table LUT is provided that maps the last character K1 of the immediately preceding string to the number of the dictionary to be used. Because the first character has no immediately preceding string, and hence no last character K1, a predetermined fixed value is used as the history PK.
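The role of the lookup table can be sketched in a few lines. This is a minimal sketch, assuming an identity table (character code c selects divided dictionary c) and a fixed history of 0 for the first character; the names `FIRST_PK` and `history` are hypothetical, and the text does not state which fixed value is used.

```python
A = 256                   # number of divided dictionaries (example value from the text)
LUT = list(range(A))      # assumption: identity mapping from last character to dictionary number
FIRST_PK = 0              # assumption: fixed history used before any string has been encoded


def history(prev_last_char):
    """Return the dictionary number PK for the next phrase.

    prev_last_char is the last character K1 of the previously encoded string,
    or None when no string has been encoded yet.
    """
    if prev_last_char is None:
        return FIRST_PK
    return LUT[prev_last_char]


print(history(None))       # dictionary used for the very first phrase
print(history(ord("e")))   # dictionary used for a phrase that follows an "e"
```

A coarser table that maps several characters to the same dictionary would fit the "last character group" wording equally well; the identity table is simply the smallest case.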

[0039] Next, in step S2, the next character K is input. In step S3, it is checked whether the character string ωK exists in the divided dictionary D_PK. If it does, the process proceeds to step S4, where ωK becomes the new prefix string ω and the input character K becomes the last character K1; the process then returns to step S2 via step S5 and continues the search for the longest matching string.

[0040] If, in step S3, the character string ωK does not exist in the divided dictionary D_PK, the longest-match search is complete and the process proceeds to step S6. Step S6 checks whether the index indc(PK) of the divided dictionary D_PK is larger than its dictionary size dmax(PK). If indc(PK) exceeds dmax(PK), the process proceeds to step S7, where the dictionary size dmax(PK) and the final absolute address amax(PK) are updated, and then to step S8.

[0041] If the index indc(PK) does not exceed the dictionary size dmax(PK) in step S6, the process proceeds directly to step S8. In step S8, the encoded data code(ω) of the divided dictionary is output and the character string ωK is registered in the divided dictionary D_PK at address indc(PK). The character K is then assigned to the prefix string ω, the index indc(PK) is incremented, the history PK is set to LUT(K1), and the process proceeds to step S5.
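The encoding loop of steps S1 to S8 can be sketched as follows. This is a simplified illustration, not the flowchart of FIG. 6 itself: the function name `encode`, the identity `LUT`, and the fixed first history `first_pk = 0` are assumptions, the capacity bookkeeping of steps S6 and S7 is omitted, and each dictionary is held as a Python `dict` keyed by (prefix code, character). The variable `last` plays the role of K1 and is kept equal to the last character of the current prefix.

```python
A = 256                 # number of divided dictionaries
M = 256                 # single characters occupy indexes 0..255 in every dictionary
LUT = list(range(A))    # assumption: last character selects the dictionary directly


def encode(data, first_pk=0):
    """Return the list of reference numbers for a bytes input."""
    dicts = [{} for _ in range(A)]   # per dictionary: (prefix code, char) -> index
    indc = [M] * A                   # next free index of each divided dictionary
    codes = []
    if not data:
        return codes
    pk = first_pk                    # history for the first phrase (fixed value)
    w = data[0]                      # prefix string, held as its reference number
    last = data[0]                   # last character of the current prefix (K1)
    for k in data[1:]:
        hit = dicts[pk].get((w, k))
        if hit is not None:          # S3/S4: wK is registered, keep extending
            w, last = hit, k
            continue
        codes.append(w)              # S8: output code(w)
        dicts[pk][(w, k)] = indc[pk] # register wK in the dictionary in use
        indc[pk] += 1
        pk = LUT[last]               # next phrase uses the dictionary of K1
        w = last = k
    codes.append(w)                  # flush the final prefix
    return codes


codes = encode(b"abababab")
print(codes)  # [97, 98, 97, 256, 257]
```

In the printed result, index 256 denotes "ba" and index 257 denotes "bab", both within the dictionary selected by a preceding "a".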

[0042] FIG. 7 is an explanatory diagram showing concretely how the capacity of the divided dictionaries is increased during the encoding process of FIG. 6, and corresponds to FIG. 4. FIG. 7 shows the following five stages:
Stage 1: initial setting and initial registration are performed.
Stage 2: a dictionary area is secured when the divided dictionary for n = 0 is selected.
Stage 3: a dictionary area is secured when the divided dictionary for n = 255 is selected.
Stage 4: a further dictionary area is secured when the size of the divided dictionary for n = 0 is exceeded.
Stage 5: a further dictionary area is secured when the size of the divided dictionary for n = 0 is exceeded again.

[0043] In the first stage, each divided dictionary is given the minimum size and its initial values are registered. The final absolute address amax(n) of each divided dictionary is set to n × 256 and its maximum dictionary capacity dmax(n) to 256, and both are stored in a separate area. The final absolute address all-amax and the maximum dictionary capacity all-dmax of the whole dictionary are both set to 66536. In the second stage, because the divided dictionary for n = 0 has been selected, an area of 256 entries starting at the final absolute address all-amax of the whole dictionary is secured for that divided dictionary, and amax(0), dmax(0), all-amax, and all-dmax are updated.

[0044] In the third stage, the divided dictionary for n = 255 has exceeded its current (minimum) size of 256, so it is doubled to 512, and its final absolute address amax(255) and maximum dictionary capacity dmax(255) are updated together with the final absolute address all-amax and maximum dictionary capacity all-dmax of the whole dictionary. In the fourth stage, the divided dictionary for n = 0 has exceeded its current size of 512, so it is doubled to 1024.

[0045] In the fifth stage, the divided dictionary for n = 0 has exceeded its current size of 1024, so it is doubled to 2048. FIG. 8 is a flowchart showing the decoding process of the present invention in detail. The initial setting of step S1 is the same as step S1 of FIG. 6. In step S2, the first code is read and set as OLDcode. The character K corresponding to CODE is restored from the divided dictionary D_PK and output; the character K is assigned to char, PK to PK1, and LUT(K) to PK. In step S3, the next code is read and set as NEWcode.

[0046] In step S4, if CODE is not defined in the divided dictionary D_PK, the process proceeds to step S5; if it is defined, the process proceeds to step S6. In step S5, the first character char of the immediately preceding string is output, CODE is set back to OLDcode, and NEWcode is set to the code obtained in the divided dictionary D_PK from the combination of OLDcode and char; the process then proceeds to step S6.

[0047] In step S6, the character string code(ωK) corresponding to the index CODE of the divided dictionary D_PK is read from the dictionary. In step S7, the character K is temporarily pushed onto a stack, the reference number code(ω) is taken as the new CODE, and the process returns to step S6. Steps S6 and S7 are repeated recursively until the reference number ω denotes a single character. Finally, in step S8, the characters stacked in step S7 are popped in LIFO (Last In First Out) order and output.
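The stack-based expansion of steps S6 to S8 can be sketched as below. This is an illustration under assumptions: a divided dictionary is modeled as a Python `dict` mapping an index to a (prefix code, character) pair, indexes below 256 stand for single characters, and the function name `expand` and the sample entries are hypothetical.

```python
ALPHABET = 256  # indexes 0..255 denote single characters


def expand(entries, code):
    """Expand one reference number into its character string."""
    stack = []
    while code >= ALPHABET:          # S6: look up code(wK)
        code, ch = entries[code]     # S7: continue with the prefix code(w)
        stack.append(ch)             # S7: stack the trailing character K
    stack.append(code)               # the prefix is now a single character
    out = bytearray()
    while stack:                     # S8: pop in last-in, first-out order
        out.append(stack.pop())
    return bytes(out)


# Hypothetical entries: 256 = "b" + "a", 257 = entry 256 + "b".
entries = {256: (ord("b"), ord("a")), 257: (256, ord("b"))}
print(expand(entries, 257))  # b'bab'
```

The characters are collected last-to-first while walking the prefix chain, which is why they must be popped in reverse before output.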

[0048] Next, step S9 checks whether the index indc(PK) of the divided dictionary D_PK is larger than its dictionary size dmax(PK). If indc(PK) exceeds dmax(PK), the process proceeds to step S10, where the dictionary size dmax(PK) and the final absolute address amax(PK) are updated.

[0049] If the index indc(PK) does not exceed the dictionary size dmax(PK), the process proceeds to step S11. In step S11, the combination of the immediately preceding code OLDcode and the last character K of the immediately preceding string is registered at address indc(PK1) of the divided dictionary D_PK. The value of the index indc(PK1) is then incremented and the process proceeds to step S12. In step S13, the first character of the restored string is assigned to char, the last character of the restored string to K1, the history PK to PK1, LUT(K1) to PK, and NEWcode to OLDcode, and the process returns to step S3 via step S14.
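A decoder that mirrors the registrations made on the encoding side can be sketched as follows. This is a simplified illustration rather than the flowchart of FIG. 8: the names `decode` and `expand`, the identity `LUT`, and the fixed first history `first_pk = 0` are assumptions, and the capacity bookkeeping of steps S9 and S10 is omitted. The undefined-code case of steps S4 and S5 arises here only when the new code points at the entry that is about to be registered in the same divided dictionary.

```python
A = 256                 # number of divided dictionaries
M = 256                 # indexes 0..255 denote single characters
LUT = list(range(A))    # assumption: last character selects the dictionary directly


def expand(entries, code):
    """Walk the prefix chain of one index and return its string."""
    stack = []
    while code >= M:
        code, ch = entries[code]
        stack.append(ch)
    stack.append(code)
    return bytes(reversed(stack))


def decode(codes, first_pk=0):
    """Rebuild the byte string from a list of reference numbers."""
    dicts = [{} for _ in range(A)]   # per dictionary: index -> (prefix code, char)
    indc = [M] * A
    out = bytearray()
    if not codes:
        return bytes(out)
    pk_old, old = first_pk, codes[0]
    s_old = expand(dicts[pk_old], old)
    out += s_old
    for new in codes[1:]:
        pk = LUT[s_old[-1]]          # dictionary chosen by the previous string
        if pk == pk_old and new == indc[pk]:
            first = s_old[0]         # code not yet defined: string is s_old + s_old[0]
        else:
            first = expand(dicts[pk], new)[0]
        dicts[pk_old][indc[pk_old]] = (old, first)  # register previous string + first char
        indc[pk_old] += 1
        s = expand(dicts[pk], new)
        out += s
        pk_old, old, s_old = pk, new, s
    return bytes(out)


print(decode([97, 98, 97, 256, 257]))  # b'abababab'
```

The new entry always goes into the dictionary that was used to decode the previous code, which keeps the decoder's dictionaries identical to those built during encoding.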

[0050] During decoding as well, capacity is added to the divided dictionaries in the same way as shown, for example, in FIG. 7. In the embodiment above, capacity is added by doubling the current size, giving 256, 512, 1024, 2048, and so on. Alternatively, the increment may be fixed, for example at the initial size of 256, giving 256, 512, 768, 1024, and so on. The way the dictionary size is increased can be chosen as needed.
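The two growth rules mentioned above can be compared with a small sketch. The function names `grow_double`, `grow_fixed`, and `schedule` are hypothetical and serve only to reproduce the two size sequences given in the text.

```python
def grow_double(size, unit=256):
    """Doubling rule: the new size is twice the current size."""
    return size * 2


def grow_fixed(size, unit=256):
    """Fixed-increment rule: the new size adds one minimum unit."""
    return size + unit


def schedule(rule, start=256, steps=4):
    """List the first few dictionary sizes produced by a growth rule."""
    sizes = [start]
    while len(sizes) < steps:
        sizes.append(rule(sizes[-1]))
    return sizes


print(schedule(grow_double))  # [256, 512, 1024, 2048]
print(schedule(grow_fixed))   # [256, 512, 768, 1024]
```

Doubling reaches a large dictionary in fewer growth steps, while the fixed increment reserves less space per step; which is preferable depends on the data.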

【００５１】[0051]

[Effects of the Invention] As described above, according to the present invention, the capacity of each divided dictionary is increased according to the number of registrations made during encoding or decoding, so that the dictionary capacity as a whole is used effectively. In addition, codewords can be kept to short reference numbers (indexes) for any kind of data, so a high compression ratio can be expected.
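The benefit of short reference numbers can be illustrated by counting bits. This is a sketch under an assumption the text does not state: each codeword is written as a fixed-width binary number just wide enough to address the current capacity of the dictionary it refers to. The function name `codeword_bits` is hypothetical, and the capacities are taken from the five-stage example.

```python
def codeword_bits(capacity):
    """Bits needed to address indexes 0..capacity-1 with a fixed-width code."""
    return max(1, (capacity - 1).bit_length())


# Index into one divided dictionary at different capacities.
for capacity in (256, 512, 2048):
    print(capacity, codeword_bits(capacity))

# Index into a single undivided dictionary of the same total capacity.
print(68584, codeword_bits(68584))
```

Under this assumption, an index into a divided dictionary of capacity 2048 needs 11 bits, whereas an index into one undivided dictionary of capacity 68584 would need 17 bits.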

[Brief description of drawings]

FIG. 1 is an explanatory diagram of the principle of the present invention.

FIG. 2 is an explanatory diagram showing the encoding process of the present invention.

FIG. 3 is an explanatory diagram showing the decoding process of the present invention.

FIG. 4 is an explanatory diagram showing the increase in dictionary capacity according to the present invention.

FIG. 5 is a block diagram of an embodiment of the present invention.

FIG. 6 is a flowchart showing the encoding of the present invention in detail.

FIG. 7 is an explanatory diagram showing how dictionary capacity is added to the divided dictionaries during the encoding of FIG. 6.

FIG. 8 is a flowchart showing the decoding of the present invention in detail.

FIG. 9 is a flowchart showing conventional LZW encoding.

FIG. 10 is a flowchart showing conventional LZW decoding.

FIG. 11 is an explanatory diagram showing a specific example of conventional LZW encoding.

FIG. 12 is an explanatory diagram showing concretely the dictionary structure created by conventional LZW encoding.

FIG. 13 is an explanatory diagram showing a specific example of conventional LZW decoding.

FIG. 14 is a tree diagram of the dictionary in a conventional LZW code.

FIG. 15 is an explanatory diagram showing the relationship between character strings in conventional LZW encoding.

FIG. 16 is a tree diagram of the dictionary in the divided dictionary scheme proposed by the present inventors.

FIG. 17 is an explanatory diagram showing the relationship between character strings encoded by the divided dictionary scheme proposed by the present inventors.

[Explanation of symbols]

10: Dictionary
10-0 to 10-255: Divided dictionaries
12: Initial setting means (initial setting software)
14: Dictionary search means (dictionary search software)
16: Encoding means (encoding software)
18: Dictionary capacity increasing means (dictionary capacity increasing software)
20, 24: Divided dictionary registration means (divided dictionary registration software)
22: Decoding means (decoding software)
26: CPU
28: Program memory
30: Control software
32: Data memory
34: Data buffer

Continuation of front page: (72) Inventor: Hirotaka Chiba, c/o Fujitsu Limited, 1015 Kamiodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

1. A data compression method for encoding an input character string using already-encoded partial character strings registered in a dictionary (10) of finite size, comprising: divided dictionaries (10-0 to 10-n) formed by dividing the dictionary (10) into a plurality of parts; initial setting means (12) for setting a minimum dictionary size for each divided dictionary (10-0 to 10-n); dictionary search means (14) for, when encoding the input character string, designating a specific divided dictionary (10-i) by a history based on the last character group of the character string encoded immediately before, and searching the already-encoded partial strings registered in the designated divided dictionary (10-i) for the partial string with the longest match; encoding means (16) for outputting, as a codeword, the reference number of the partial string found in the divided dictionary by the dictionary search means (14) as the longest match with the input character string; dictionary capacity increasing means (18) for increasing the dictionary capacity of the divided dictionary (10-i) used for the encoding when its reference number exceeds a predetermined dictionary size, and maintaining the dictionary size unchanged when it does not; and divided dictionary registration means (20) for registering, when the encoding means (16) outputs the codeword, a character string formed by appending the next input character to the codeword, in the divided dictionary designated by a history based on the last character group of the codeword.

2. A data restoration method for decoding an input codeword string using already-decoded partial character strings registered in a dictionary (10) of finite size, comprising: divided dictionaries (10-0 to 10-n) formed by dividing the dictionary (10) into a plurality of parts; initial setting means (12) for setting a minimum dictionary size for each divided dictionary; dictionary search means (14) for, when decoding an input codeword, selecting a specific divided dictionary (10-i) from among the divided dictionaries (10-0 to 10-n) by a history based on the last character group of the character string decoded immediately before; decoding means (22) for decoding a character or character string from the registered content of the reference number that matches the input codeword in the selected divided dictionary (10-i); dictionary capacity increasing means (18) for increasing the dictionary capacity of the divided dictionary (10-i) used for the decoding when its reference number exceeds a predetermined dictionary size, and maintaining the dictionary size unchanged when it does not; and divided dictionary registration means (24) for registering, when the decoding means (22) decodes a character string, a character string formed by appending the first character of the currently restored character string to the previous input codeword, in the divided dictionary designated by a history based on the last character group of the character string decoded immediately before.

3. The data compression and decompression method according to claim 1 or 2, wherein the initial setting means (12) sets, for each divided dictionary (10-0 to 10-n), a minimum dictionary size in which at least all character types can be registered with a reference number assigned to each character, and initially registers all character types, one character at a time, each with a reference number.

4. The data compression and decompression method according to claim 1 or 2, wherein the dictionary capacity increasing means (18) increases the capacity of the divided dictionary used for encoding or decoding to twice its current size each time the reference number of that divided dictionary exceeds the predetermined size.

5. The data compression and decompression method according to claim 1 or 2, wherein the dictionary capacity increasing means (18) increases the capacity of the divided dictionary used for encoding or decoding to its current size plus a predetermined minimum unit each time the reference number of that divided dictionary exceeds the predetermined size.