JPH03270417A

JPH03270417A - Data compressing and decoding system

Info

Publication number: JPH03270417A
Application number: JP2070379A
Authority: JP
Inventors: Shigeru Yoshida; 茂吉田; Yasuhiko Nakano; 泰彦中野; Yoshiyuki Okada; 佳之岡田; Hirotaka Chiba; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-03-20
Filing date: 1990-03-20
Publication date: 1991-12-02
Anticipated expiration: 2013-07-09
Also published as: JP2774350B2

Abstract

PURPOSE: To improve the compression ratio across character strings by preparing a dictionary that registers the following partial character string separately for each final character of the preceding partial character string, or for each group of final characters, and by assigning registration numbers to the registered strings independently within each dictionary. CONSTITUTION: The codes of the current partial character strings ab and abc are assigned in dependence on the preceding final character. A tree 11 made of head characters and their extension characters is built for each final character (a)-(c) of the preceding character string, and the numbers (indexes) of the strings ab and abc are assigned per tree 11. Consequently, when the characters (a)-(c) appear with equal probability, the index (the registration number of each partial character string ab, abc within the tree 11 of each dictionary 2) becomes shorter. The code that identifies the partial character strings ab and abc is therefore shorter, and the compression ratio improves.

Description

[Detailed Description of the Invention] [Summary] This invention concerns a data compression and decompression method using the LZW code. Its aim is to raise the compression ratio across character strings by incorporating into the dictionary the dependency between the character string being encoded and the last character of the immediately preceding string. For any two consecutive character substrings, a dictionary is created that registers the following substring separately for each final character of the preceding substring, or for each group of final characters. Registration numbers are assigned independently within each individual dictionary. When a string is encoded, the code word for the current substring is generated and output from the registration number of the matching substring in the individual dictionary that corresponds to the last character, or last-character group, of the immediately preceding string.

[Industrial application field]

When large amounts of data are handled, as in computers and data communications, removing the redundant parts of the data and compressing it reduces the required storage capacity and raises the effective communication speed.

One data compression method splits the input character string into successive, mutually different character substrings, encodes and stores each in turn, and encodes the current substring as a copy of the longest previously encoded substring.

In such conventional methods, when a string is split into distinct substrings and encoded, the current substring is encoded as if it occurred independently of the strings that preceded it.

In real data such as text, character substrings are correlated with one another. Because the conventional methods make no use of the history of substring occurrences, the dependency of the current substring on earlier substrings is ignored, and that dependency remains as redundancy in the compressed data.

The present invention relates to a data compression and decompression method using an incremental-parsing LZW (Ziv-Lempel-Welch) code. By incorporating into the dictionary the dependency between the substring being encoded and the last character of the immediately preceding string, it reduces the residual redundancy in the coded substrings and raises the compression ratio.

[Prior art]

LZW compression splits a character string into distinct substrings: it searches the dictionary of previously seen strings for the longest match and encodes that match by its registration number. At the same time, the matched string extended by one character is registered in the dictionary as a newly seen string.

The conventional LZW coding method is explained with reference to FIGS. 12 to 14.

For simplicity, FIGS. 12(a), (b), and (c) show encoding (compression) and decoding for data made up of only the three characters a, b, and c.

In LZW coding, every single-character string is registered in the dictionary as an initial value before encoding begins.

The encoder then searches the input for the longest substring registered in the dictionary and outputs its registration code ω as the code.

Meanwhile, the string formed by appending the next, non-matching character (the extension character K) to that longest match is registered in the dictionary as the pair (ω, K).

Decoding reverses the encoding operation. The decoder looks up the pair (ω′, K) that represents the string for the input code ω. It then looks up the pair for ω′ in the same way, each time pushing the extension character K it finds onto a stack.

This is repeated until ω refers to a single character. The stacked characters are then output, last pushed first, which restores the string.

The pair (ω, K), formed from the previously used code ω and the first character K of the string just restored, is then registered, updating the decoder's dictionary.

The conventional LZW encoding method is described concretely with FIGS. 12(a) and (b), for an alphabet of only the three characters a, b, and c.

First, the single characters a, b, and c are registered in the dictionary in advance, as shown in FIG. 12(b).

(1) In the input string of FIG. 12(a), the first character a is read. Since a is in the dictionary with code 1, ω = 1, and the next character b is read (K = b). The substring ab (ω = 1, K = b) is not registered, so the code for a is output and ab (1b) is registered in the dictionary with registration code 4.

(2) Next, the b just read becomes the first character of the string (ω = 2), and the next character a is read (K = a).

The resulting substring ba (ω = 2, K = a) is not registered, so the code for b is output and ba (2a) is registered with registration code 5.

(3) The a read in (2) becomes the head of the string (ω = 1), and the next character b is read (K = b).

The string ab is already registered with registration code 4, so ω becomes 4 and the following character c is read (ω = 4, K = c).

The string abc is not registered, so the longest match ab is output as registration code 4 and abc (4c) is registered with registration code 6.

(4) The c just read becomes the first character of the string (ω = 3), and the following character b is read (K = b).

The string cb (3b) is not registered, so c is output as registration code 3 (registered at initialization) and cb (3b) is registered with registration code 7.

The same procedure continues for the rest of the input: the longest matching registered substring is output by its registration code, and each unregistered substring is given a new registration code and added to the dictionary, which is updated as encoding proceeds.
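The encoding steps (1) to (4) above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `lzw_encode`, the `alphabet` parameter, and the short test input `ababcb` are assumptions made for the example, and codes start at 1 as in the walkthrough.

```python
def lzw_encode(text, alphabet="abc"):
    """Conventional LZW encoding, following the walkthrough's conventions."""
    # Initial dictionary: one entry per single character (a=1, b=2, c=3).
    dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1          # first free registration code
    codes = []
    if not text:
        return codes, dictionary
    omega = text[0]                        # current matched string (omega)
    for k in text[1:]:                     # K: next input character
        if omega + k in dictionary:
            omega += k                     # match continues, keep extending
        else:
            codes.append(dictionary[omega])      # output code(omega)
            dictionary[omega + k] = next_code    # register omega+K
            next_code += 1
            omega = k                      # K starts the next string
    codes.append(dictionary[omega])        # flush the final match
    return codes, dictionary


codes, table = lzw_encode("ababcb")
print(codes)          # [1, 2, 4, 3, 2]
print(table["abc"])   # 6
```

The first four output codes and the registrations ab = 4, ba = 5, abc = 6, cb = 7 agree with steps (1) to (4); the final code 2 is simply the flush of the last character of the assumed input.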

FIG. 12(b) shows the reference dictionary built by the LZW code for the input string of FIG. 12(a).

To decode an input code, for example code 8, the conversion table in the figure gives 8 = 5b; then, since 5 = 2a, 8 = 2ab; and finally, since 2 = b, the string bab is decoded.
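The expansion of a single code through the conversion table can be sketched with an explicit stack. The table contents below are the entries named in the text (4 = 1b, 5 = 2a, 6 = 4c, 7 = 3b, 8 = 5b); the function name `expand` and the two-dictionary layout are assumptions made for illustration.

```python
def expand(code, pairs, singles):
    """Expand one code into its string by following (omega, K) pairs."""
    stack = []
    while code not in singles:      # until omega refers to a single character
        code, k = pairs[code]       # split the entry into prefix code and K
        stack.append(k)             # push the extension character
    stack.append(singles[code])     # the remaining single character
    return "".join(reversed(stack)) # pop in last-in-first-out order


singles = {1: "a", 2: "b", 3: "c"}
pairs = {4: (1, "b"), 5: (2, "a"), 6: (4, "c"), 7: (3, "b"), 8: (5, "b")}
print(expand(8, pairs, singles))    # bab
```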

FIG. 12(c) shows LZW decoding, applied to the output codes of the string in FIG. 12(a).

Decoding reverses the encoding procedure.

At initialization, the single characters a, b, and c are registered in the dictionary as codes 1, 2, and 3 respectively.

The conventional LZW decoding procedure is explained by decoding the input codes shown in the figure.

(1) When the first input code, 1, arrives, the dictionary is consulted and the character a is output.

The input code 1 is kept in the previous-code register (Oldcode).

(2) Next, input code 2 yields the output b.

The pair 1b, formed from the input code 1 of step (1) and the first character b of the string just decoded, is registered in the dictionary with registration code 4, restoring the dictionary.

Input code 2 is then saved in Oldcode.

(3) Next, input code 4 is looked up in the dictionary, giving 1b, and the string ab is decoded from 1b.

The first character a of the string just restored and the Oldcode value 2 give the conversion code 2a, which is registered in the dictionary with registration code 5.

Input code 4 is moved to Oldcode and saved.

(4) The next input code, 3, is read.

Code 3 is registered in the dictionary as c, so the character c is restored. The Oldcode value 4 and the restored character c give the conversion code 4c, which is registered with registration code 6.

Input code 3 is then moved to Oldcode and saved.

(5) The next input code, 5, is read.

Code 5 is already registered as conversion code 2a, so the substring ba is decoded from 2a.

The first character b of the substring just decoded and the code 3 held in Oldcode give 3b, which is registered with registration code 7. Input code 5 is moved to Oldcode and saved.

The same sequence is repeated, decoding the input codes in turn and updating the dictionary.
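Decoding steps (1) to (5) can be sketched as below. This is an illustrative sketch only: the name `lzw_decode_basic` is hypothetical, the function assumes every input code is already registered when it arrives, and it records each new entry as an (Oldcode, K) pair to mirror the 1b, 2a, 4c, 3b notation of the text.

```python
def lzw_decode_basic(codes, alphabet="abc"):
    """Decode LZW codes, assuming each code is registered when it arrives."""
    strings = {i + 1: ch for i, ch in enumerate(alphabet)}  # code -> string
    pairs = {}                          # registration code -> (Oldcode, K)
    next_code = len(alphabet) + 1
    old = codes[0]                      # Oldcode register
    out = [strings[old]]
    for code in codes[1:]:
        entry = strings[code]           # look up the decoded string
        out.append(entry)
        # Register Oldcode's string extended by the first decoded character.
        pairs[next_code] = (old, entry[0])
        strings[next_code] = strings[old] + entry[0]
        next_code += 1
        old = code                      # move the input code to Oldcode
    return "".join(out), pairs


text, pairs = lzw_decode_basic([1, 2, 4, 3, 5])
print(text)    # ababcba
print(pairs)   # {4: (1, 'b'), 5: (2, 'a'), 6: (4, 'c'), 7: (3, 'b')}
```

The registrations 4 = 1b, 5 = 2a, 6 = 4c, and 7 = 3b match the dictionary updates described in steps (2) to (5).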

FIG. 13 shows the encoding flow of the conventional LZW coding method.

The flow is explained by encoding the string ababc from the example above.

In the initialization step, the single characters a, b, and c are registered in the dictionary (a = 1, b = 2, c = 3). At the same time, the first free dictionary address n is set.

The flowchart assumes a 256-character alphabet and therefore sets n = 256 as the starting address. Since only the three characters a, b, and c are considered here, n = 4 is taken as the initial value.

(1) The first character K of the string (K = a) becomes the prefix string ω.

(2) The next character, b, is read.

If the last character of the input has already been processed, there is no next character K, and control passes to the termination step.

Here input remains, so processing continues with the dictionary check.

Now ω = a and K = b, and the substring ωK = ab is not in the dictionary, so control passes to the output step.

The registration code of ω = a, code(ω) = 1, is output.

The pair ωK = 1b is registered at address n = 4, the code of the input character K is moved into ω, and the dictionary address n is advanced by one.

(3) Control returns to the read step, and the next character a is read. Now ω = 2, so ωK = 2a is checked against the dictionary. It is unregistered, so the registration code 2 of b, code(ω), is output, and ωK = 2a is registered in the dictionary with registration code 5 (n = 5). The code 1 of the character a just read is then moved into ω, and control returns to the read step.

(4) The next character b is read.

This time the dictionary check finds that ωK = 1b is already registered, so ωK = 1b becomes the new ω, and control returns to the read step.

(5) The next character c is read as K, and processing continues in the same way.

The termination step applies once the last character has been read. At that point the final string is held in ω, so its code, code(ω), is output and encoding ends.

FIG. 14 shows the decoding flow of the conventional LZW coding method.

The flow is explained using the output codes of the string used as the example in FIG. 12.

Decoding uses an input memory (INcode) that holds the input code, a memory (Oldcode) that holds the previous code, a memory (FINchar) that holds the first character of the restored string, and a memory (stack) that temporarily holds the decoded characters as they are restored one by one.

At initialization, the codes for the single characters are created in advance.

(1) The first code is read (here, 1) and stored in Oldcode.

The input code (1) is looked up against the registration codes of the dictionary, and a is output as the character K.

The character a is moved to FINchar and held there temporarily.

(2) The next code (CODE) is read and placed in INcode.

If every code has been read and none remains, processing ends.

Here the next code, 2, exists, so processing continues.

K is pushed onto the stack. For a code such as registration code 8 in FIG. 12(b), where 8 is 5b and 5 is 2a, the character K of each ωK is pushed onto the stack in turn and ω is converted repeatedly until ωK is a single character.

In the present example, b, which corresponds to code 2, is pushed onto the stack as K.

The character b held in the stack is output.

The first character of the decoded string, here b, is stored in FINchar.

The pair (Oldcode, K) is registered in the dictionary.

Here Oldcode holds the code 1 of the first character a of the string, read in step (1), and K = b, so 1b is stored with registration code 4 (the registration address n of newly registered substrings, which equals the registration code, is taken to start at 4).

n is incremented by 1.

The contents of INcode are moved to Oldcode.

Here INcode holds 2, so Oldcode becomes 2.

(3) Control returns to the read step, and the next code is read.

The next code is 4. The dictionary shows that code 4 is 1b, so from ωK = 1b the characters b and then a are pushed onto the stack, and the string ab is output. The first character a of the decoded string is stored in FINchar. Oldcode is now 2 and the first character of the decoded string is K = a, so the pair (2, a), that is code 2a, is registered in the dictionary with registration code 5 (n = 5).

n is then incremented by 1.

The input code is moved to Oldcode, which here becomes 4. Processing from the read step onward is then repeated.

In this way each input code is decoded and the code of the decoded string is remembered. Once the next input code has been decoded into the next string, the unregistered substring formed by extending the remembered code's substring by one character is registered, restoring the dictionary.

One branch of the flowchart handles an exceptional case.

As described above, in LZW compression the encoder can register the substring extended by one character as soon as it finishes encoding the substring of interest. The decoder, however, extends the string of interest by combining it with the first character of the next substring, so it cannot register the entry until the next string has been decoded.

As a result, the registration code needed to decode an input code may not yet be registered in the decoder's dictionary. Such an input code cannot be decoded by a normal lookup, and the exceptional branch of the flowchart performs the decoding in this case.

For example, suppose input code 10 arrives among the input codes of FIG. 12(c).

At that moment Oldcode is 1 and FINchar is a.

The dictionary holds registration codes only up to 9; code 10 is not registered.

Input code 10 therefore cannot be restored by lookup.

The exceptional branch forms 1a from the a in FINchar and the 1 in Oldcode and places it in INcode.

The normal processing that follows then outputs aa and registers 1a in the dictionary as code 10.
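The exceptional case can be sketched by adding one branch to a decoder: when the incoming code equals the next code to be registered, the string is rebuilt from the previous string plus its first character (Oldcode plus FINchar). The function name `lzw_decode` and the code sequence below are assumptions; the sequence was chosen so that code 10 arrives while Oldcode is 1 and the dictionary ends at 9, as in the text, and it is not taken from the figure.

```python
def lzw_decode(codes, alphabet="abc"):
    """LZW decoding with the exceptional branch for not-yet-registered codes."""
    strings = {i + 1: ch for i, ch in enumerate(alphabet)}  # code -> string
    next_code = len(alphabet) + 1
    old = strings[codes[0]]             # string of the previous code
    out = [old]
    for code in codes[1:]:
        if code in strings:
            entry = strings[code]       # normal case: plain lookup
        elif code == next_code:
            # Exceptional case: the entry is being defined by this very code.
            # It must be the previous string plus its own first character.
            entry = old + old[0]
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        strings[next_code] = old + entry[0]   # register the extended string
        next_code += 1
        old = entry
    return "".join(out)


print(lzw_decode([1, 2, 4, 3, 5, 8, 1, 10, 11, 1]))  # ababcbababaaaaaaa
```

When code 10 is read in this sequence, the previous string is a, so the branch yields aa and registers it as code 10, matching the result described above.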

[Problem to be solved by the invention]

In conventional LZW coding, when the input string is split into distinct substrings and encoded, each substring being encoded is treated as if it occurred independently of the substrings that appeared earlier.

This is adequate for a memoryless information source (real-time input data). Most real data such as text, however, behaves as a source with memory. LZW coding does not fully exploit the history of string occurrences, and the dependency between successive strings that it ignores remains as redundancy even after compression.

The present invention aims to reduce the redundancy between strings and raise the compression ratio by incorporating into the dictionary the dependency between the substring being encoded and the immediately preceding string, for example by determining the code of the current string in relation to the last character of the preceding string and registering it accordingly.

[Means for solving the problem]

The means for solving the problem are explained with reference to FIG. 11.

In the figure, (a) shows the dictionary tree of the conventional LZW code, and (b) shows the relationship between strings when a string is encoded with the conventional LZW code.

Conventionally, a dictionary tree like that of (a) was built from the substrings, branching on the first character of each substring shown in (b).

For example, as illustrated, the 256 single first characters are numbered 0 to 256, and the strings beginning with each single character are expanded from that character.

For example, with the single character a numbered 0, the strings beginning with a are numbered within the same single tree: ab is 257, and ac and its child aca take numbers such as 258 and 259. The whole dictionary forms one tree, and every string is numbered within it.

In this case the first characters are not linked to one another; in effect, each first character hangs from the root of a dictionary tree whose root is empty. This shows that the LZW code takes no account of the history of strings that occurred before the one being encoded.

With this conventional method, every distinct string must receive a distinct identifying number, so the registration numbers used to form code words grow large. Moreover, they are assigned without regard to how often strings occur, so redundancy remains.

Next, the construction of the dictionary trees and the string encoding method of the present invention are explained with FIGS. (c) and (d).

In the present invention, as shown in (d), the code of the current substring is assigned in dependence on the immediately preceding final character.

As shown in (c), a tree made of first characters and their extension characters is built for each final character of the immediately preceding string, and string numbers are assigned separately within each tree.

For example, when the single character a follows a preceding character a, that a is given index 1 in that tree; the string ab following a preceding a is index 7, and the single character b following a preceding a is number 2. When the preceding string ends in b, the single character a is number 1 in the tree for the preceding b, and ab is index 4 in that tree. Each string is thus indexed separately within the tree rooted at the preceding character.
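The per-tree numbering described above can be sketched as a set of independent dictionaries keyed by the preceding final character. The class name `ContextDictionaries`, its methods, and the choice to start numbering at 1 are assumptions for illustration; the patent text does not specify a data structure.

```python
from collections import defaultdict


class ContextDictionaries:
    """One dictionary (tree) per preceding final character.

    Indexes are assigned independently inside each tree, so the same
    substring may receive different indexes in different trees.
    """

    def __init__(self, first_index=1):
        self.first_index = first_index
        self.trees = defaultdict(dict)   # context char -> {substring: index}

    def register(self, context, substring):
        """Register substring in the tree for `context`; return its index."""
        tree = self.trees[context]
        if substring not in tree:
            tree[substring] = self.first_index + len(tree)
        return tree[substring]

    def lookup(self, context, substring):
        """Return the index of substring in the tree for `context`, or None."""
        return self.trees[context].get(substring)


d = ContextDictionaries()
print(d.register("a", "a"))   # 1  (a following a preceding a)
print(d.register("a", "b"))   # 2  (b following a preceding a)
print(d.register("b", "a"))   # 1  (a following a preceding b)
```

Because each tree keeps its own counter, the index for a following b restarts at 1 rather than continuing from the entries already registered in the tree for a.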

With this arrangement, when every character occurs with equal probability, the range of the index (the registration number of each substring within its dictionary tree) can be reduced to 1/256.

In practice an individual tree is typically between one tenth and one twentieth the size of the whole tree formed by all the individual trees together, so the code that identifies a substring can be shortened and the compression ratio raised.
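The effect on code length can be illustrated with a small calculation. The dictionary size of 4096 entries and the use of fixed-length binary indexes are assumptions chosen for the example; the patent states only the ratios.

```python
def index_bits(entries):
    """Bits needed for a fixed-length index over `entries` entries."""
    return max(1, (entries - 1).bit_length())


total_entries = 4096     # assumed size of one shared dictionary
contexts = 256           # one tree per preceding final character

print(index_bits(total_entries))               # 12
print(index_bits(total_entries // contexts))   # 4
print(index_bits(total_entries // 16))         # 8
```

Under these assumptions, the equal-probability case shortens each index from 12 bits to 4, and a tree one sixteenth the size of the whole (an assumed value within the range given above) shortens it to 8.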

The basic configuration for encoding in the present invention is explained with reference to FIG. 1.

The figure illustrates, for strings made up of only the three characters a, b, and c, the case in which a dictionary is created for each final character of the immediately preceding string.

In the figure, 1 is the input string; 2 is the dictionary, in which the index (I(n)) of each registered substring is recorded per tree rooted at a final character (for example, the substrings ab and abc in the tree rooted at a have indexes 0, 1, and so on); 3 is the character-string reading means, which reads the input string one character at a time; 4 is the current substring under consideration; 5 is the dictionary reference means, which looks up the current substring in the dictionary and finds the longest registered substring that matches it; 8 is the encoding means, which encodes the longest matching part of the string read, based on the index registered in the dictionary, and which assigns, per final character of the preceding string, an index to the newly seen current substring formed by extending the longest match by the next character; 9 is the dictionary registration means, which registers the current substring in the dictionary; 10 is the final-character storage means, which stores the last character of the longest matching substring; and 11 is an example of a dictionary tree rooted at the final character of the preceding string.

[Operation]

入力文字列をａｂａｂｃｂ・・・を符号化する場合を例
として、第１図の基本構成の作用を具体的に説明する。The operation of the basic configuration shown in FIG. 1 will be specifically explained by taking as an example a case where an input character string is encoded as ababcb....

本発明においては、例えば、文字部分列としてａを出力
する場合、直前文字部分列の最終文字がａに続くａと、
ｂに続くａではそれぞれａを根とする木のａとｂを根と
する木のａとして区別して出力しなければならない。In the present invention, for example, when outputting a as a character substring, the last character of the immediately preceding character substring is a following a,
For a following b, each must be output separately as a of the tree whose root is a and a of the tree whose root is b.

そのような各組につく１文字を出力するためには、木の
根となる各文字と１文字との組合せ（ａａ、ａｂ、ａｃ
、ｂａ−）等を符号化側、復号化側の両方に、あらかし
め初期化する際に作成しておき、そのコードによりａに
続くａ、ｂに続くａ等を区別して出力するか、そのよう
な木の根につく１文字があらたに現れた場合には１文字
（生データ〉を出力するようにする方法をとらなければ
ならない。In order to output one character in each set, we need to output the combination of each character, which is the root of the tree, and one character (aa, ab, ac
, ba-), etc., on both the encoding side and the decoding side, and use that code to distinguish and output a following a, a following b, etc. If a character attached to the root of a tree like this newly appears, a method must be adopted in which one character (raw data) is output.

In this explanation of the operation, the latter method, in which the single character is output as raw data, is taken as the example.

(1) The character string reading means 3 reads the first character a and sets it as the character substring 4. The dictionary reference means 5 refers to the dictionary and confirms that a is unregistered.

The encoding means 8 assigns index 0 in the dictionary.

The dictionary registration means 9 registers a with index 0 at registration position n = 1 of the dictionary, in the tree for final character 0 of the immediately preceding character string.

At the same time, the character a is output.

Then a is stored as the final character of the immediately preceding character string.

(2) Next, the second character b is read.

The dictionary is searched for the character string ab, formed by the final character a of the immediately preceding character string and the input character b. Since ab is unregistered, the character string "ab" is registered at registration position 2 of the dictionary with index 0, as the first registered character substring of the tree rooted at a.

Since the b just input is a single character appearing in the tree rooted at a, b is output as raw data and is stored as the final character of the immediately preceding character string.

(3) Similarly, the third character a is input.

The dictionary is searched for the character string ba, formed by the final character b of the immediately preceding character string and the a just read.

Since ba is not registered, the character substring "ba" is assigned index 0 as the first string of the tree rooted at b, the final character of the immediately preceding character string, and is registered at registration position 3 (n = 3) of the dictionary.

The final character a of the output character substring is stored as the final character of the immediately preceding character string.

(4) Next, the fourth character b is read.

The dictionary is searched for the character string ab, formed by the final character a of the immediately preceding character string and the b just read.

Since ab is already registered at registration position 2, the next character c is also read.

Since the character string abc is not registered in the dictionary, the encoding means 8 encodes the longest matching character string "ab" using index 0 of "ab" in the tree rooted at a, and outputs it as the code representing the fourth character b. At the same time, the newly appearing character string "abc" is registered at registration position 4 of the dictionary with index 1, as the second character string of the tree rooted at a.

The final character b of the output longest matching character string is stored as the final character of the immediately preceding character string.

(5) Similarly, the fifth character c is read.

Since the character string bc, formed by the stored final character b and the c just read, is unregistered, bc is registered at registration position 5 (n = 5) of the dictionary with index 1, as the second character substring of the tree rooted at b (ba already holds index 0 there).

Since c is a character attached directly to the root of the dictionary tree rooted at b, the final character of the immediately preceding character string, the character c is output as raw data.

Proceeding in the same way, the output code aba0c... is obtained.
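The encoding steps (1) to (5) above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the patent: each tree is modelled as a plain dict keyed by the characters that follow its root, the empty string stands for the initial root 0, raw characters are emitted as strings and tree indexes as integers starting from 0, and the names `encode`, `trees`, and `root` are hypothetical.

```python
def encode(text):
    """Encode `text` with one dictionary tree per final character (sketch)."""
    trees = {}   # root character -> {characters following the root: index}
    out = []     # raw characters (str) and tree indexes (int)
    root = ""    # "" stands for the initial root 0 (no preceding string)
    pos = 0
    while pos < len(text):
        tree = trees.setdefault(root, {})
        match = ""
        # Extend to the longest substring already registered after this root.
        while (pos + len(match) < len(text)
               and match + text[pos + len(match)] in tree):
            match += text[pos + len(match)]
        nxt = text[pos + len(match): pos + len(match) + 1]  # "" at end of input
        if not match:
            # First appearance directly under the root: output raw data.
            tree[nxt] = len(tree)
            out.append(nxt)
            root = nxt
            pos += 1
        else:
            # Output the index of the longest match within this root's tree.
            out.append(tree[match])
            if nxt:
                tree[match + nxt] = len(tree)   # register match + next character
            root = match[-1]
            pos += len(match)
    return out


print(encode("ababc"))   # ['a', 'b', 'a', 0, 'c']
```

For the example input this reproduces the output code a b a 0 c of the walkthrough. Keying each tree by plain strings keeps the sketch short; a real encoder would more likely store parent and child links.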

Next, the method for decoding the compressed data code into a character string will be explained with reference to FIG. 2.

FIG. 2 shows the basic configuration of the decoding method of the present invention.

In the figure, 21 is an input code; 22 is a dictionary restored from the input codes; 23 is input code reading means; 24 is the index represented by the input code together with the final character of the restored immediately preceding character string; 25 is dictionary reference means; 26 is character substring decoding means that decodes a character string from the registered string in the dictionary corresponding to the index and the final character of the immediately preceding character string; 27 is restored character output means that outputs the decoded characters from the restored character string; 28 is final character storage means that stores the final character of the restored character substring; and 29 is dictionary restoration means that registers, by index, in the tree of the final character of the immediately preceding character string, the character substring formed by the decoded character string and the first character of the character string decoded next.

Next, the operation of the basic decoding configuration of FIG. 2 will be explained concretely, taking as an example the decoding of the code 0a0b0a10... produced by the basic configuration of FIG. 1.

(1) First, the input code reading means 23 reads the input code a. Since it is raw data, the character substring decoding means 26 decodes and outputs the character a. The character a is then registered at registration position 1 of the restored dictionary 22 with index 1, in the tree for final character 0 of the immediately preceding character string.

At the same time, the final character a of the decoded character string is stored.

(2) Similarly, the next code 0b is read. Since it is raw data, the character b is decoded and output, and the character string ab, formed by the stored character a and the b just read, is registered at registration position 2 of the dictionary with index 1, in the tree rooted at a.

The final character b of the decoded character substring is then stored.

(3) Similarly, the next code a is read and the character a is restored. The character string ba, formed by the stored final character b and the a just read, is registered at registration position 3 with index 1 in the dictionary of the tree rooted at b.

The restored a is then stored.

(4) Next, the fourth code 1 is read.

Since the final character of the immediately preceding character substring is a and the input code is 1, the dictionary reference means 25 refers to the dictionary and reads out the character substring ab. The character substring decoding means 26 decodes the character substring "ab". Then, based on the decoded character substring and the final character a of the immediately preceding character string, the decoded character output means 27 outputs the character b.

The final character storage means 28 stores the final character b of the decoded character string.

(5) Next, the fifth code c is read.

Since it is raw data, the character c is decoded. The character string abc, formed from the character substring ab decoded in step (4) and the c just decoded, is registered at registration position 4 with index 2 in the dictionary of the tree rooted at a, and the dictionary is thereby restored.
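The decoding steps above can be sketched in Python as follows. As in this decoding example, a raw character is assumed to arrive as the pair (0, character), and registered indexes start at 1 within each tree. The list-per-root layout, the `pending` bookkeeping for the entry that is completed by the next code, and the handling of a not-yet-defined index (the usual LZW exception) are assumptions of this sketch rather than details taken from the figures.

```python
def decode(codes):
    """Decode a list of (0, char) pairs and per-tree indexes (sketch)."""
    trees = {}       # root character -> list of substrings; index i is at [i - 1]
    out = []
    root = ""        # "" stands for the initial root 0
    pending = None   # (root, substring) still waiting for its extension character
    for code in codes:
        tree = trees.setdefault(root, [])
        if isinstance(code, tuple):
            # Raw data: the character hangs directly under the current root.
            s = code[1]
            if pending:
                trees[pending[0]].append(pending[1] + s)
            tree.append(s)
            pending = None
        else:
            if pending and pending[0] == root and code == len(tree) + 1:
                # Index of the entry still being built (LZW exception case).
                s = pending[1] + pending[1][0]
            else:
                s = tree[code - 1]
            if pending:
                trees[pending[0]].append(pending[1] + s[0])
            pending = (root, s)
        out.append(s)
        root = s[-1]   # final character selects the tree for the next code
    return "".join(out)


print(decode([(0, "a"), (0, "b"), (0, "a"), 1, (0, "c")]))   # ababc
```

In this run the tree rooted at a ends up holding b at index 1 and bc at index 2, which corresponds to ab and abc in the steps above.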

The above explanation covers the case where a dictionary tree is created for each final character of the immediately preceding character string. Alternatively, the final characters may be collected into groups according to their type or the like, with a dictionary tree created for each group and the following character substrings registered in it.
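A grouping of final characters could look like the sketch below. The three groups (letters, digits, everything else) and the names `root_group` and `register` are purely illustrative assumptions; the description only says that final characters may be grouped by type.

```python
from collections import defaultdict


def root_group(ch):
    """Map a final character to the group whose tree it shares (assumed grouping)."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


trees = defaultdict(dict)   # group name -> {following substring: index}


def register(prev_last_char, substring):
    """Register `substring` in the tree of the group of `prev_last_char`."""
    tree = trees[root_group(prev_last_char)]
    if substring not in tree:
        tree[substring] = len(tree)
    return tree[substring]


print(register("a", "x"), register("b", "y"), register("7", "x"))   # 0 1 0
```

Fewer trees mean less memory and faster dictionary growth per tree, at the cost of somewhat larger indexes within each tree.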

[Embodiment]

The data compression method of the present invention will be explained with reference to FIGS. 3 to 5.

FIG. 3 shows an apparatus configuration for implementing the present invention.

In this embodiment, the dictionary is divided into an overall dictionary, in which character substrings are registered, and individual dictionaries, one for each final character of the immediately preceding character string, in which the following character substrings are registered by index in association with their registration positions in the overall dictionary.

In the figure, 30 is a memory that stores the input character string K to be encoded; 31 is a memory that stores the character substring code ω; 32 is a memory that stores PK, the final character of the immediately preceding character substring; 33 is a memory that stores K1, the final character of the current character string being encoded; 34 is the overall dictionary D(n), formed in memory; 35 is the individual dictionaries, formed in memory, one for each of the 256 characters 0, a, b, c, ...; 36 is a counter that measures the depth of the registration level of a character substring in the dictionary tree; 37-0 to 37-255 are counters for the indexes m(0) to m(255) of individual dictionaries 0 to 255; 38 is a counter for the registration number n of the overall dictionary; 39 is dictionary reference and creation means that refers to the dictionaries and also creates them; 40 is code creation means that encodes the character substring that has been read; 41 is code output means that outputs the code of the created character substring; and 42 is a CPU that executes and controls the data encoding process according to a program.

FIG. 4 shows the flow of the encoding performed by the apparatus configuration of FIG. 3.

FIGS. 5(a) and 5(b) show example configurations of the overall dictionary and the individual dictionaries, respectively, when the character string ababcbaba... is encoded.

FIG. 6(a) shows an example of the individual dictionary trees when the above character string is encoded.

In this embodiment, when a character attached directly to the root of the tree of the final character of the immediately preceding character string appears for the first time, that single character (raw data) is sent.

FIG. 6(b) shows an example of the code words of the present invention. Mode 1 is the case where a character attached directly to the root of one of the individual dictionary trees appears for the first time.

In mode 1, the pair of index 0 (which designates raw data) and the raw character data is sent as the code word.

When a character or character string other than mode 1 appears, the index of that character string within its tree is sent as the code word, as shown in the figure.

The flow of FIG. 4 will be explained with reference to FIGS. 5 and 6.

The initial-condition setting step in the figure shows the case where 256 individual dictionaries are provided. To simplify the explanation, however, consider encoding the character string ababc..., which consists only of the three characters a, b, and c.

First, the entire apparatus is initialized.

The initial conditions are as follows.

(1) PK, the final character of the immediately preceding character string, is set to 0.

(2) The initial value of the character string code storage memory ω is set to 0 in this case (256 in FIG. 4).

(3) The counter DP that measures the depth in the dictionary tree is set to 0.

(4) The head address indicating the first registration position of the overall dictionary is set to 4 in this case (256 in FIG. 4). The index count of each individual dictionary is set to 0. In this case there are four individual dictionaries, 0, a, b, and c, so the index counts m(0), m(a), m(b), and m(c) registered in the respective dictionaries are set to 0.

(1) The first character a of the input character string ababcbaba... is read.

The decision at ■ is whether the whole character string has been read and processing should end, so the process proceeds to ■.

Since the character string a following the immediately preceding character string 0 is not registered in the overall dictionary, the process proceeds to ■.

Since the depth DP is now 0, the process proceeds to @.

This case corresponds to mode 1 described above, so the code word 0a, formed from m(0) = 0 and the raw data a, is output.

The character string a just input (0a, since the initial value of ω is 0) is then registered in the overall dictionary at D(n = 4). The number of registered indexes m(0) of individual dictionary 0 (PK = 0) is incremented by 1, and the result, 1, is registered as the index I(n = 4) in individual dictionary 0 (the tree of individual dictionary 0 had no registered characters).

Next, the registration position n of the overall dictionary is incremented by 1.

Next, the final character PK is set to the a just read, and the character string code ω is set to the code of the character a that was read (1, as set in the initial conditions).

(2) The second character b is read.

Since ωK = 1b is not registered in the dictionary, the process proceeds to ■; since DP = 0, it proceeds to @.

This is a mode 1 case, with m(a) = 0 and the raw character b, so 0b is output.

ωK = 1b is then registered in the overall dictionary at D(n = 5). For individual dictionary a (PK = a), the number of registered indexes m(a) is incremented by 1, and the result, 1, is registered as the index I(n = 5) (the tree of individual dictionary a had no registered characters). n is incremented by 1. The final character PK is then set to the b just read, and the input character code ω is set to 2, the code of b fixed in the initial conditions.

(3) Next, the third character a is read.

Since ωK = 2a is not registered, the process proceeds to ■. Since DP = 0, this is handled as mode 1 at ■, and because m(b) = 0 (the tree of individual dictionary b has no character strings yet), the code word 0a, carrying the raw data a, is output.

ωK = 2a is then registered in the overall dictionary at D(n = 6). At the same time, for individual dictionary b (PK = b), m(b) is incremented by 1 and the result, 1, is registered as the index I(n = 6) (the tree of individual dictionary b had no registered characters).

Next, n is incremented by 1, PK is set to a and ω to 1, and the next character b is read.

(4) Next, the fourth character b is read.

At the decision in ■, ωK = 1b is found in the overall dictionary, already registered with code n = 5, so the process proceeds to ■.

ω is set to the n = 5 just read from the overall dictionary, the depth DP is incremented by 1 to DP = 1, and the b just read is stored in the final character storage memory K1.

(5) The fifth character c is read.

At ■ it is determined whether ωK = 5c is registered in the overall dictionary.

Since ωK = 5c is unregistered, the process proceeds to ■.

Since DP = 1, the process proceeds to ■.

At ■, the individual dictionary entry corresponding to ω = 5 (n = 5) is referred to, and the index I(n = 5) = 1 is output as the code word (mode 2) for the b that follows a, the final character of the immediately preceding character string.

Next, at ■, ωK = 5c (abc) is registered at registration position n = 7 of the overall dictionary. At the same time, in individual dictionary a, m(PK) is incremented by 1 and the index I(n = 7) = 2 is registered for n = 7 (it is the second character string registered in individual dictionary a).

n is then incremented by 1 and the depth DP is reset to 0.

Further, PK is set to the b stored in the final character storage memory K1, and ω is set to 2, the code of K1. Then, at ■, with the fifth character c just read taken as K again, it is determined whether ωK = 2c is registered in the overall dictionary.

Since 2c is not registered in the overall dictionary, the process proceeds to ■; since DP = 0, it proceeds to @, and the mode 1 code word 0c is output with the character c as raw data.

ωK = 2c is then registered in the overall dictionary at n = 8. Since PK = b now, in individual dictionary b, m(b) is incremented by 1 and I(n = 8) = 2 is registered for n = 8 (the second character string of the tree of b).

Further, n is incremented by 1, PK is set to the c just read, ω is set to 3 (the value of c in the initial conditions), and the next character is read.

Proceeding in the same way, the output code 0a0b0a10c0b113... is obtained for the input character string ababcbabaa....
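The flow traced in steps (1) to (5) can be sketched as follows, keeping the names used in the description (ω as `omega`, K, K1, PK, DP, n, m, I, and the overall dictionary D). It is a simplified illustration under assumptions: Python dicts stand in for the dictionary memories, the three-character alphabet with codes a = 1, b = 2, c = 3 is taken from the simplified example, and the flush of a pending match at the end of input is added here because the description does not spell that step out.

```python
def encode_embodiment(text, alphabet="abc"):
    """Encode with an overall dictionary D and per-PK index counters (sketch)."""
    code_of = {ch: i + 1 for i, ch in enumerate(alphabet)}   # a=1, b=2, c=3
    n = len(alphabet) + 1    # first free position of the overall dictionary (4)
    D = {}                   # overall dictionary: (omega, K) -> registration number
    I = {}                   # registration number -> index in its individual dictionary
    m = {}                   # PK -> number of indexes registered for that dictionary
    PK, omega, DP, K1 = 0, 0, 0, None
    out = []

    def register(K):
        """Register omega+K in D and in the individual dictionary of PK."""
        nonlocal n
        D[(omega, K)] = n
        m[PK] = m.get(PK, 0) + 1
        I[n] = m[PK]
        n += 1

    for K in text:
        while True:
            if (omega, K) in D:              # omega+K registered: extend the match
                omega, DP, K1 = D[(omega, K)], DP + 1, K
                break
            if DP == 0:                      # mode 1: index 0 plus raw character
                out.append((0, K))
                register(K)
                PK, omega = K, code_of[K]
                break
            out.append(I[omega])             # mode 2: index within dictionary PK
            register(K)
            DP = 0
            PK, omega = K1, code_of[K1]      # K is then examined again
    if DP > 0:                               # assumed flush of a pending match
        out.append(I[omega])
    return out, D, I


codes, D, I = encode_embodiment("ababc")
print(codes)   # [(0, 'a'), (0, 'b'), (0, 'a'), 1, (0, 'c')]
```

With this input, abc ends up at registration position 7 with index 2 in individual dictionary a, as in step (5).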

Next, the method for decoding a character string from the above codes will be explained.

FIG. 7 shows an embodiment of the apparatus configuration for decoding according to the present invention.

In the figure, 71 is an input code storage memory; 72 is a memory (INω) that stores the restored code obtained by converting an input code, sent as a code word based on an individual dictionary index, back into the code of the character string in the overall dictionary; 73 is a memory (OLDω) that stores the restored immediately preceding character substring; 74 is a memory (PK) that stores the final character of the restored immediately preceding character substring; 75 is a memory (PK1) that stores the final character of the character substring before the immediately preceding one; 76 is a memory (K1) that stores the first character of the restored character string; 77 is the overall dictionary D(n), restored progressively from the character substrings restored from the input codes; 78 is the individual dictionaries q(PK, index), restored progressively from the restored character strings; 79-0 to 79-255 are counters for the index counts of the 256 individual dictionaries; 80 is dictionary reference means that refers to the individual dictionaries using the input code; 81 is character substring decoding means that decodes a character substring from the overall dictionary; 82 is dictionary restoration means that restores the overall dictionary and the corresponding individual dictionary from the decoded character substrings; and 83 is a CPU that carries out the decoding process according to a program.

FIGS. 8 to 10 show one continuous decoding flow. FIG. 8 shows the flow from initialization, through determining whether the input code is defined, to referring to the individual dictionary and converting a defined input code into the code representing the character string in the overall dictionary.

FIG. 9 shows the flow for decoding a mode 1 code.

FIG. 10 shows the flow for decoding a character string from a registered code of the overall dictionary.

The flows of FIGS. 8 to 10 will be explained taking as an example the case where the above code 0a0b0a10c... is input.

First, the apparatus is initialized.

The initial conditions in the figure show the case where 256 individual dictionaries are provided and the 256 single characters are given the initial values 0 to 255. The initial conditions are: PK = 0, the initial value of ω is 256, PK1 = 0, the head address of the overall dictionary is n = 256, OLDω = 0, and m(0) to m(255) of the individual dictionaries are 0.

Here, to simplify the explanation, consider the case of only the three characters a, b, and c, with the codes 1, 2, and 3 set for a, b, and c respectively in the initial conditions. The initial value of ω is set to 0.

(1) The first input code 0a is input.

At the decision in ■ the code is undefined, so the process proceeds to ■.

The decision at ■ determines whether this is mode 1, which represents a code attached directly to the root of the dictionary tree of the immediately preceding character string, or the exceptional case in LZW processing in which an undefined code is input.

This is mode 1, so the process proceeds to A in FIG. 9.

At ■ in FIG. 9, the input code 0a is taken in with raw data K = a, and the character a is output.

Since there is no immediately preceding character string, the process proceeds to ■, and 0a is registered in the overall dictionary at D(n = 4) from the restored character a and PK = 0, restoring the overall dictionary. Further, m(0) is incremented, and the individual dictionary is restored from PK = 0, m(0), and n = 4.

Then n is incremented, the a just restored is set in PK, and PK = 0 is moved to OLDω.

(2) Next, the second input code 0b is read.

This is also a mode 1 code, so the process proceeds from ■ to ■ and then to A.

In the flow of FIG. 9, as when 0a was processed in (1), the raw data b is output and ab is registered at registration position n = 5 of the overall dictionary. Further, n = 5 and index 1 are registered in individual dictionary a, which corresponds to the final character a of the immediately preceding character substring, restoring the individual dictionary.

(3) Next, the third input code 0a is input. The code 0a is likewise mode 1, so the above processing is repeated: a is output as the restored character, ba is written to the overall dictionary, and n = 6 and index = 1 are written to individual dictionary b.

Then, with m(b) = 1, n = 7, PK = a, and OLDω = b, the next code is read.

(4) The fourth code is 1.

Since the code 1 is defined, the process proceeds to ■ in FIG. 8.

Since the immediately preceding character string is a and the input code is 1, the restored individual dictionary is referred to and the corresponding registration position in the overall dictionary is identified.

As a result, the input code is converted to n = 5, ωK = 1b, which is written to INω, and the process proceeds to B.

B is the flow of the decoding process of the LZW code.

The steps that follow use the same method as conventional LZW decoding.

That is, at ■ the code 1b is pushed onto the stack character by character, in the order b, a, and at ■ the b is output while the a stored last is left behind.

Since the immediately preceding character substring is now registered in the dictionary, the process proceeds to the next step: PK = a is written to the memory for the final character of the string before the immediately preceding one, the final character b of the decoded string ab is written to PK, and the first character a of the decoded character substring is written to K1.

At the same time, the decoded code 1b (INω) is written to OLDω, and the next code is read.

(5) Next, the fifth code 0c is read.

It is a mode 1 code, so the process proceeds to A in FIG. 9, and c is output at ■.

In this case the immediately preceding character substring has not yet been registered in the dictionary, so at ■ the character string abc is registered at position n = 7 of the overall dictionary from the 1b in OLDω and the c just input. At the same time, m(a) is incremented by 1 and index = 2 is written to individual dictionary a.

Then n is incremented by 1, and at ■ the current character string (the string bc at the point where c was read after the final character b) is registered, together with its registration in individual dictionary b.

By the same procedure, all the input codes can be read and decoded.

Note that two steps in the flow of FIG. 9 represent the processing for the case described in the prior art as the exception in LZW encoding. They are the same as described earlier, so their explanation is omitted.

In the above embodiment, raw data is output for the single character attached to the root of each individual dictionary tree. Alternatively, the possible combinations of a root and one following character may be created in advance on both the encoding side and the decoding side, and that single character may be output using the codes so created.

The output code word may also always be expressed as the pair [individual index of the character string of interest, next character K], with the [next character] used as the final character of the immediately preceding character string when encoding what follows.

In this case, the encoding and decoding flows become simpler.

[Effect of the invention]

本発明によれば、符号化する文字列に対して、過去の文
字列の履歴を採り入れたため、文字列間の頻度等を考慮
して符号語を定める等可能になり、データ圧縮における
冗長性を削減することができる。According to the present invention, since the history of past character strings is incorporated into the character string to be encoded, it is possible to determine code words by considering the frequency between character strings, etc., and redundancy in data compression can be reduced. can be reduced.

Furthermore, because the dictionary is divided into multiple dictionaries and code words are created from the indexes of the divided dictionaries, the index values are small. Even when the amount of data is large and the number of registered character strings grows, data can be compressed with short code words, so the compression ratio improves.
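The effect on index length can be illustrated with a small calculation. The sketch assumes fixed-length binary indexes and registered strings spread evenly over the individual dictionaries (the equal-probability case mentioned in the abstract). The helper names and the figures 4096 and 256 are illustrative only and do not come from the patent.

```python
def index_bits(entries):
    """Bits needed for a fixed-length index over `entries` registered strings."""
    return max(1, (entries - 1).bit_length())


def bits_single_vs_split(total_entries, num_dicts):
    """Compare index width for one shared dictionary vs. evenly split ones."""
    per_dict = -(-total_entries // num_dicts)   # ceiling division
    return index_bits(total_entries), index_bits(per_dict)


# 4096 registered strings in one dictionary vs. 256 individual dictionaries.
single, split = bits_single_vs_split(4096, 256)
```

Under these assumptions the shared dictionary needs 12-bit indexes while each individual dictionary needs only 4-bit indexes, because the decoder already knows which dictionary applies from the final character of the preceding string.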

[Brief explanation of drawings]

FIG. 1 is a diagram showing the basic configuration of the compression encoding method of the present invention.
FIG. 2 is a diagram showing the basic configuration of the decoding method of the present invention.
FIG. 3 is a diagram showing an embodiment of an apparatus configuration for encoding according to the present invention.
FIG. 4 is a diagram showing an embodiment of the encoding flow of the present invention.
FIG. 5 is a diagram showing an embodiment of the dictionary.
FIG. 6 is a diagram showing an embodiment of the dictionary trees and code words.
FIG. 7 is a diagram showing an embodiment of an apparatus configuration for decoding.
FIG. 8 is a diagram showing the decoding flow (1).
FIG. 9 is a diagram showing the decoding flow (2).
FIG. 10 is a diagram showing the decoding flow (3).
FIG. 11 is an explanatory diagram of the means for solving the problems of the prior art.
FIG. 12 is a diagram showing the conventional LZW compression encoding and decoding scheme.
FIG. 13 is a diagram showing the flow of the conventional LZW encoding method.
FIG. 14 is an explanatory diagram of the flow of the conventional LZW decoding method.

In the drawings:
1: input character string
2: dictionary
3: character string reading means
5: dictionary reference means
8: encoding means
9: dictionary registration means
10: final character storage means
11: dictionary tree rooted at the final character of the immediately preceding character string

Claims

[Claims]

1) A data compression method for a data compression device that sequentially encodes an input character string divided into mutually different character substrings, encoding the current character substring as a copy of the longest matching character substring among the previously encoded character substrings, characterized in that: for any two consecutive character substrings, a dictionary is created in which the following character substring is registered for each final character of the preceding character substring or for each group of final characters; the registration number of each character string registered in the dictionary is assigned separately for each such final character or group of final characters; and the code word of the current character substring to be encoded is created on the basis of that registration number.

2) A compressed data restoration method characterized in that, from the input codes created by the data compression method according to claim 1, a dictionary is restored for each final character or final character group of the immediately preceding decoded character substring, and each input code is decoded into a character substring by referring to the restored dictionary using that final character and the current input code.