JPH07152533A

JPH07152533A - Data compressing device

Info

Publication number: JPH07152533A
Application number: JP29937693A
Authority: JP
Inventors: Takaaki Hayashi; 隆昭林
Original assignee: Kyocera Corp
Current assignee: Kyocera Corp
Priority date: 1993-11-30
Filing date: 1993-11-30
Publication date: 1995-06-16
Anticipated expiration: 2017-11-18
Also published as: JP3346626B2

Abstract

PURPOSE: To provide a data compression apparatus and method in which partial strings of the input data are registered in a dictionary one after another and the input data is compressed and encoded by referring to that dictionary, and in which a single dictionary registration pass registers a plurality of partial data strings, so that the dictionary grows quickly and the compression effect improves. CONSTITUTION: A data compression apparatus 102 comprises a longest-match search unit 104 that finds, among the symbol strings registered in a dictionary 105, the longest one that matches a partial data string of an input data string 101; an encoding unit 106 that encodes the index of that longest matching symbol string; and a dictionary registration unit 107 that registers in the dictionary 105 the symbol strings obtained by appending, to the longest matching symbol string encoded immediately before, the first through MaxEnt-th symbols of the longest matching symbol string just encoded, where MaxEnt is determined by a maximum registration count setting 108.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a lossless data compression apparatus and method that remove redundant components from data without losing information. In particular, it relates to a universal compression coding method that achieves good compression regardless of the data format, such as character codes, vector graphics, or image data.

[0002]

[Prior Art] The Lempel-Ziv coding method (LZ code) has long been known as a method that can compress data of various formats with a single coding scheme.

[0003] Two algorithms have been proposed for the LZ code: the sliding dictionary method and the dynamic dictionary method (for details, see "Data Compression Algorithms and Their Implementation," Interface, 1992, vol. 8, No. 183). The dynamic dictionary method is generally said to compress less effectively than the sliding dictionary method, but its algorithm is simpler and allows high-speed processing. As an improvement of the dynamic dictionary method, a Lempel-Ziv coding method known as the LZW code has been proposed (T. A. Welch, "A Technique for High-Performance Data Compression," Computer, June 1984). The LZW code decomposes the input data string into partial data strings by a method called incremental parsing, registers those partial data strings in a dictionary one after another, and encodes the input data while referring to that dictionary.

[0004] The LZW code is described in detail below with reference to the drawings. FIG. 4 shows the processing flow of the LZW coding method.

[0005] First, the overall flow of the LZW encoding process is described with reference to FIG. 4(a).

[0006] First, the dictionary is initialized in S401. Every symbol that can appear in the input data is registered in the dictionary as a one-symbol data string. In the simplest case, a binary data sequence whose only symbols are 0 and 1, the initial values 0 and 1 are registered at unregistered addresses of the dictionary, and each is given a uniquely identifiable index. In S402, the counter n used to read the input data string from its first symbol is set to 1. Let sn denote the n-th symbol of the input data string. In S403, the symbol sn (initially n = 1) is read, the dictionary is searched for the entry matching sn, and its index i is obtained; the counter n is then incremented by 1. Next, the longest-match search is performed in S404: the dictionary is searched for the symbol string that gives the longest match with the partial data string of the input beginning at that symbol. The index of the longest matching symbol string is returned as the new value of i, and the symbol following the longest matching partial data string in the input is returned as the new value of sn. The longest-match search is described in detail later. The index i of the longest matching symbol string is then converted into the code c(i) in S405 and output. In S406, a new symbol string is registered in the dictionary. First, sn is appended to the end of the symbol string indicated by index i to generate the symbol string isn. Because the symbol string indicated by i is the longest matching partial data string, the string isn, which extends it with the next input symbol sn, does not exist in the dictionary. The symbol string isn is therefore registered at an unregistered address of the dictionary and given an index that distinguishes it from all symbol strings registered so far. Processing then returns to S403, and encoding continues in the same manner.
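
The S401 to S406 loop described above can be illustrated with a short Python sketch. It is not taken from the patent: the function name `lzw_encode`, the use of a Python dict keyed by symbol strings, and the 0-based index numbering are assumptions made for illustration, and the indexes are returned directly rather than converted into codes c(i).

```python
def lzw_encode(data, alphabet="01"):
    """Return (codes, dictionary) for a textbook LZW pass over `data`."""
    # S401: register every symbol of the alphabet as a one-symbol string.
    dictionary = {symbol: index for index, symbol in enumerate(alphabet)}
    codes = []
    n = 0  # 0-based position; the text counts symbols from 1
    while n < len(data):
        # S403: start the match with the single symbol at position n.
        match = data[n]
        n += 1
        # S404: extend the match while the extended string is registered.
        while n < len(data) and match + data[n] in dictionary:
            match += data[n]
            n += 1
        # S405: emit the index of the longest match.
        codes.append(dictionary[match])
        # S406: register the match followed by the next input symbol.
        if n < len(data):
            dictionary[match + data[n]] = len(dictionary)
    return codes, dictionary


codes, dictionary = lzw_encode("0000")
print(codes)       # [0, 2, 0]
print(dictionary)  # {'0': 0, '1': 1, '00': 2, '000': 3}
```

Each pass through the outer loop adds exactly one entry, and that entry is only one symbol longer than an existing one.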

[0007] Next, the flow of the longest-match search is described with reference to FIG. 4(b). First, in S411, whether any symbols remain to be processed in the input data is checked in order to detect the end of processing. If no symbol sn exists, processing proceeds to S412, where the previously obtained index i is encoded, the code c(i) is output, and processing ends. If a symbol sn exists, it is read and processing proceeds to S413. In S413, the newly read symbol sn is appended to the end of the symbol string indicated by the previously obtained index i to generate the symbol string isn, and the dictionary is searched for an entry matching isn. If such an entry exists, its index is obtained in S415, i is updated to that index, and the input counter n is incremented by 1. Processing then returns to S411, and the same steps are repeated until no entry matching isn can be found, which locates the dictionary entry giving the longest match with the partial data string of the input. When this iteration reaches S414 and no entry matching isn exists in the dictionary, the symbol string indicated by index i is the registered symbol string that gives the longest match with the partial data string of the input.
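
As a rough illustration of S411 to S415, the sketch below stores the dictionary as a mapping from (index i, symbol sn) pairs to the index of the string isn, so that each search step is a single lookup. The names `longest_match` and `children`, and the choice of this pair-keyed mapping, are assumptions rather than details given in the text.

```python
def longest_match(data, n, i, children):
    """Extend the match named by index i, starting at 0-based position n.

    `children` maps (index, symbol) to the index of the extended string.
    Returns the index of the longest match and the position just past it.
    """
    while n < len(data):                 # S411: is there another symbol sn?
        extended = children.get((i, data[n]))   # S413: look up the string isn
        if extended is None:             # S414: isn is not registered
            break
        i = extended                     # S415: adopt the index of isn
        n += 1
    return i, n


# Hypothetical dictionary: index 0 is "0", index 2 is "00", index 3 is "000".
children = {(0, "0"): 2, (2, "0"): 3}
print(longest_match("0001", 1, 0, children))  # (3, 3)
```

In the usage line, the first symbol has already been read and matched to index 0, so the search starts at position 1 and stops at the `1`, which has no registered extension.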

[0008] As described above, the LZW code is characterized in that the dictionary is updated within the encoding process itself: past data is decomposed into partial data strings, and each partial data string is registered in the dictionary in turn. Represented schematically, the resulting dictionary is a tree structure like the one shown in FIG. 5. Each branch of the tree represents a symbol, and the symbol string spelled out from the root to a node is a symbol string registered in the dictionary. The number attached to a node is the index that identifies that symbol string. By its nature, the LZW code tends to extend branches containing symbol strings that occur frequently in the input data string. Because the symbol string represented by such a long branch is longer than the code obtained by encoding its index, the data can be compressed.
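
The tree view of the dictionary can be sketched as follows. This is an illustrative data structure, not the one shown in FIG. 5: the class name `TreeDictionary`, its fields, and the 0-based indexes are assumptions.

```python
class TreeDictionary:
    """Dictionary stored as a tree: each node is one registered symbol string."""

    def __init__(self, alphabet="01"):
        self.child = {}   # (parent index or None, symbol) -> node index
        self.node = []    # node index -> (parent index or None, symbol)
        for symbol in alphabet:
            self.add(None, symbol)   # one-symbol strings hang off the root

    def add(self, parent, symbol):
        """Register the string of `parent` extended by `symbol`; return its index."""
        index = len(self.node)
        self.child[(parent, symbol)] = index
        self.node.append((parent, symbol))
        return index

    def string(self, index):
        """Spell out a registered string by walking from its node up to the root."""
        symbols = []
        while index is not None:
            index, symbol = self.node[index]
            symbols.append(symbol)
        return "".join(reversed(symbols))


tree = TreeDictionary()
i00 = tree.add(0, "0")      # "00" gets index 2
i001 = tree.add(i00, "1")   # "001" gets index 3
print(tree.string(i001))    # 001
```

With these four entries, an index would fit in 2 bits if codes were fixed-length (an assumption; the text does not specify how c(i) is formed), while the string `001` spans 3 binary symbols. That gap between branch length and code length is the source of the compression.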

[0009]

[Problems to Be Solved by the Invention] However, in the conventional data compression method described above, each branch added to the dictionary by the iteration from S403 to S406 in FIG. 4 is created by appending one symbol to the end of a symbol string already registered in the dictionary, so it grows by only one symbol at a time. For effective compression with the LZW code, frequently occurring branches should grow as quickly as possible, but this structure severely limits how fast a branch can grow. As a result, compression performance suffers, particularly in the early stage when the dictionary is still being built.

[0010] In view of the above, an object of the present invention is to achieve higher-performance data compression by extending the branches registered in the dictionary as quickly as possible.

[0011]

[Means for Solving the Problems] To solve the above problem, the present invention improves the dictionary registration means so that, in the dictionary registration process of S406 in FIG. 4(a) of the conventional data compression process, a single pass appends a plurality of symbols to the end of the longest matching partial data string and registers the resulting plurality of new symbol strings in the dictionary. More specifically, the conventional dictionary registration process registers only one new symbol string, generated by appending to the longest matching partial data string of the input data the single symbol that follows it. In contrast, the present invention provides dictionary registration means that appends, to the end of the longest matching partial data string, up to a plurality of the symbols that follow it in the input data, and registers the plurality of new symbol strings generated in this way. Furthermore, if the number of symbols appended to the longest matching partial data string were unlimited, many long symbol strings that rarely occur in the data would be registered. To prevent this, a maximum number of symbol strings to register is set in advance in the dictionary registration means, which thereby limits the number of symbol strings registered in a single pass.

[0012]

[Operation] With the above means, the symbol strings registered in the dictionary lengthen as quickly as possible, and the dictionary grows faster than with the conventional LZW code. Because the dictionary is sufficiently grown even in the early stage of processing, the compression effect improves.

[0013] Moreover, limiting the maximum number of symbol strings registered in one dictionary registration pass lets the dictionary grow quickly while avoiding the registration of needlessly long symbol strings, which yields a configuration with a high compression effect.

[0014]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows the configuration of a data compression apparatus according to the embodiment.

[0015] When the input data string 101 is input to the data compression apparatus 102, the longest-match search unit 104 searches for the longest partial data string of the input data string 101 that matches a symbol string registered in the dictionary 105. Each symbol string registered in the dictionary carries an index that distinguishes it from every other symbol string. The index of the symbol string giving the longest match with the partial data string of the input data 101 is sent to the encoding unit 106, encoded, converted into compressed data 103, and output from the data compression apparatus 102. The symbol string found by the longest-match search unit 104 is also sent to the dictionary registration unit 107, which creates new symbol strings from the longest matching symbol string just encoded and the longest matching symbol string encoded immediately before it, and registers them in the dictionary 105. The number of symbol strings registered in one dictionary registration pass must not exceed the predetermined maximum registration count setting 108.

[0016] FIG. 2 shows the processing flow of the data compression method of the embodiment. For simplicity, the input data string in this embodiment is again assumed to be a binary data string composed of the two symbols 0 and 1.

[0017] First, the overall flow of the data compression process is described with reference to FIG. 2(a). In S201, the dictionary is initialized in the same way as for the conventional LZW code. Because the input in this embodiment is a binary data string, the symbols 0 and 1 are registered as initial values, each with an index that distinguishes it from the other. In S202, the counter n used to read the input data string from its first symbol is set to 1. Next, in S203, the n-th symbol sn is input, the dictionary is searched for the entry matching sn, and its index is assigned to iL. The counter n is then incremented by 1.

[0018] The index iL obtained in S203 is encoded in S204 and output as the code c(iL). Next, in S205, the dictionary is searched for the entry matching the symbol sn, and its index is assigned to iP; the counter n is then incremented by 1. The longest-match search is performed in S206: the dictionary is searched for the symbol string giving the longest match with the partial data string of the input that begins at that symbol. The index of the longest matching symbol string is returned as the new value of iP, and the symbol following the longest matching partial data string in the input is returned as the new value of sn. The longest-match search is described in detail later. The index iP is encoded in S207 and output as the code c(iP).

[0019] Next, in S208, the dictionary registration process registers new symbol strings in the dictionary. Here, a plurality of new symbol strings can be registered per registration pass. The maximum number of new symbol strings that S208 may register is determined in advance and denoted MaxEnt; that is, one pass registers between one and MaxEnt new symbol strings. As a result, long partial data strings that occur frequently in the input data string are registered as symbol strings in the dictionary, the dictionary learns the input more effectively, and the compression ratio improves. The registration process of S208 is described in detail later. Finally, in S209, the indexes iL and iP are compared. If their values differ, the value of iP is assigned to iL and processing returns to S205. If their values are equal, processing returns to S205 without that assignment. This branch is explained together with the detailed description of the dictionary registration process S208.
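
The overall S201 to S209 flow can be sketched in Python as below. This is one reading of the flow, not the patent's implementation. The assumptions are: the function name `encode`, a dict keyed by symbol strings with 0-based indexes, indexes returned in place of the codes c(iL) and c(iP), and a guard that skips a string already present in the dictionary (the text does not state such a guard). When iL and iP are equal, the sketch follows the registration process in keeping the string registered last as the new iL.

```python
def encode(data, max_ent=2, alphabet="01"):
    """Return (codes, dictionary) following the S201 to S209 flow."""
    dictionary = {s: i for i, s in enumerate(alphabet)}        # S201
    codes = []
    if not data:
        return codes, dictionary
    prev = data[0]                      # S203: string named by iL
    codes.append(dictionary[prev])      # S204
    n = 1
    while n < len(data):
        cur = data[n]                   # S205: string named by iP
        n += 1
        while n < len(data) and cur + data[n] in dictionary:   # S206
            cur += data[n]
            n += 1
        codes.append(dictionary[cur])   # S207, or S222 at end of input
        if n == len(data):
            break                       # S221: no symbol left, so stop
        grown = prev                    # S208: register up to max_ent strings
        for symbol in cur[:max_ent]:
            grown += symbol
            if grown not in dictionary:  # assumption: never re-register
                dictionary[grown] = len(dictionary)
        prev = grown if cur == prev else cur                   # S209
    return codes, dictionary


codes, dictionary = encode("0000011", max_ent=2)
print(codes)             # [0, 0, 2, 0, 1, 1]
print(list(dictionary))  # ['0', '1', '00', '000', '0000', '00000', '01']
```

A decoder is outside the scope of this sketch; it would have to replay the same registrations to stay in step with the encoder.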

[0020] Next, the flow of the longest-match search is described with reference to FIG. 2(b).

[0021] First, in S221, whether a symbol sn exists in the input data is checked. If no symbol sn exists, all input data is judged to have been processed; the index iP is encoded in S222 and output as the code c(iP), and the data compression process ends. If a symbol sn exists, processing continues to S223.

[0022] In S223, the symbol string indicated by the index iP is concatenated with the symbol sn by appending sn to the end of that symbol string. The symbol string produced by this concatenation is denoted iPsn. In S224, the dictionary is searched for an entry matching iPsn. If there is none, the symbol string indicated by iP is the longest matching symbol string; the index iP and the symbol sn that follows the longest matching symbol string are returned to S208 of the overall data compression flow, and the longest-match search ends. If S224 finds an entry matching iPsn, processing returns to S221 and the longest-match search continues. By extending the partial data string one input symbol at a time and searching the dictionary at each step in this way, the partial data string giving the longest match with a registered symbol string can be found.

[0023] Next, the flow of the dictionary registration process is described with reference to FIG. 2(c). As noted above, the maximum number MaxEnt of symbol strings that can be registered in one registration pass is determined in advance. The indexes iL and iP are passed to the dictionary registration process from the overall data compression flow. New symbol strings are generated by appending the first several symbols of the symbol string indicated by iP to the symbol string indicated by iL, and the generated symbol strings are registered in the dictionary. First, in S231, the counter j that controls the iteration needed to register multiple (at most MaxEnt) symbol strings is initialized to 0. Then, in S232, with iP[j] denoting the j-th symbol of the symbol string indicated by iP, whether the symbol iP[j] exists in that symbol string is checked. The first symbol of a symbol string is taken to be its 0th symbol.

[0024] If the symbol iP[j] does not exist, the dictionary registration process ends and control returns to the overall data compression flow. If iP[j] exists, processing proceeds to S233, where iP[j] is assigned to t. Next, in S234, a symbol string is registered in the dictionary. The symbol t is appended to the end of the symbol string indicated by iL, and the resulting symbol string is denoted iLt. The symbol string indicated by iL is the one that, among the symbol strings currently registered, gives the longest match with the partial data string of the input. Because the symbol string indicated by iP follows the symbol string indicated by iL in the input data, the symbol string iLt is not yet registered in the dictionary. It is therefore newly registered and given an index that distinguishes it from the other registered symbol strings. Once the symbol string has been registered, processing proceeds to S235, where the index of iLt becomes the new iL and the counter j is incremented by 1. Finally, j is compared with MaxEnt. If they are equal, the dictionary registration process ends and control returns to the overall data compression flow; otherwise, processing returns to S232 and registration continues.
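
The S231 to S235 loop can be written out step by step as in the sketch below. The representation is an assumption: the dictionary is a plain Python list in which `strings[k]` is the symbol string with index k, and the function name `register_entries` is hypothetical.

```python
def register_entries(strings, i_l, i_p, max_ent):
    """Follow S231 to S235; `strings[k]` is the symbol string with index k.

    Returns the index of the last string registered (the updated iL),
    or the original i_l if nothing was registered.
    """
    j = 0                                    # S231: reset the counter
    while j < len(strings[i_p]):             # S232: does symbol iP[j] exist?
        t = strings[i_p][j]                  # S233: t = iP[j]
        strings.append(strings[i_l] + t)     # S234: register the string iLt
        i_l = len(strings) - 1               # S235: iL now names that string
        j += 1
        if j == max_ent:                     # stop after MaxEnt registrations
            break
    return i_l


strings = ["0", "1", "00", "01"]
print(register_entries(strings, 2, 3, 2))  # 5
print(strings)  # ['0', '1', '00', '01', '000', '0001']
```

Because iL is advanced to the string just registered, each new entry extends the previous one by one more symbol of the string indicated by iP.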

[0025] The case in which the values of the indexes iL and iP are equal is now described. Suppose, for example, that a dictionary like (e) in FIG. 3, with the tree structure (a), has been built, and that the value of iL at this point is 2 (the symbol string indicated by iL is 00). If the input data string that follows is 00011..., the longest matching partial data string is 00, so the value of iP is also 2. With MaxEnt set to 2, the two symbol strings 000 and 0000 are newly registered, and the dictionary becomes (f), with the tree structure (b). If processing continued in the usual way, with iL updated to the value of iP, iL would be 2 (symbol string 00), the next longest matching partial data string would be 01, and iP would be 4. The symbol strings registered would then be 000 and 0001, and the dictionary would be updated to (g), with the tree structure (c). Index 6 and index 8 would then both indicate the symbol string 000, duplicating an entry of the dictionary. Such duplication adds useless indexes to the dictionary, lengthens the code needed to identify each index uniquely, and so lowers coding efficiency. For this reason, when iL and iP are equal, iL is not updated to the value of iP. Instead, iL is updated to the index of the most recently registered symbol string, that is, the one generated by appending the first two symbols of the string indicated by iP to the string indicated by iL. In the example of FIG. 3, iL then becomes 7, which indicates the symbol string 0000; the next longest matching partial data string is 01, so iP becomes 4, and the dictionary is ultimately updated to (i), with the tree structure (d).
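
The duplication described above can be reproduced with a small sketch that tracks only the newly registered strings. The function names and the `keep_grown_on_equal` flag are hypothetical, and strings stand in for the indexes iL and iP; the inputs are the two matches 00 and 01 from the example, with iL initially indicating 00 and MaxEnt set to 2.

```python
def register(entries, left, right, max_ent):
    """Append left+right[:1] ... left+right[:max_ent]; return the last string."""
    for symbol in right[:max_ent]:
        left += symbol
        entries.append(left)
    return left


def run(matches, first_left, max_ent, keep_grown_on_equal):
    """Register entries for successive longest matches, starting from first_left."""
    entries = []
    left = first_left
    for right in matches:
        grown = register(entries, left, right, max_ent)
        if left == right and keep_grown_on_equal:
            left = grown      # equal case: keep the string registered last
        else:
            left = right      # ordinary update: iL takes the value of iP
    return entries


naive = run(["00", "01"], "00", 2, keep_grown_on_equal=False)
fixed = run(["00", "01"], "00", 2, keep_grown_on_equal=True)
print(naive)  # ['000', '0000', '000', '0001']
print(fixed)  # ['000', '0000', '00000', '000001']
```

The ordinary update registers 000 twice, while keeping the last registered string as iL yields four distinct entries.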

[0026]

[Effects of the Invention] As described above, when performing dictionary registration, the present invention appends to the end of the longest matching partial data string up to a plurality of the symbols that follow it in the input data string, creates a plurality of new symbol strings, and registers them in the dictionary. Compared with conventional data compression, the dictionary grows faster and the compression effect is higher.

[0027] Moreover, limiting the maximum number of symbol strings concatenated to the longest matching partial data string prevents the registration of long symbol strings that would occur only very rarely in the input data string, so compression efficiency does not deteriorate.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the data compression apparatus of the present invention.

FIG. 2 is a flowchart showing the control of the data compression apparatus of the present invention.

FIG. 3 is an explanatory diagram of the dictionary registration process used in the data compression apparatus of the present invention.

FIG. 4 is a flowchart showing the control of a conventional data compression method.

FIG. 5 is a diagram illustrating the representation of the dictionary as a tree structure.

[Explanation of symbols]

101 Input data
102 Data compression apparatus
103 Compressed data
104 Longest-match search unit
105 Dictionary
106 Encoding unit
107 Dictionary registration unit
108 Maximum registration count setting

Claims

[Claims]

1. A data compression apparatus that generates compressed data by removing redundant components contained in input data consisting of discrete information composed of a plurality of types of symbols, the apparatus comprising: longest-match search means for searching the symbol strings registered in a dictionary for the longest symbol string that matches a partial data string of the input data; encoding means for creating a code based on the index assigned to that symbol string; generating means for newly generating a plurality of symbol strings from the partial data string encoded by the encoding means and the partial data string encoded immediately before it; and dictionary registration means for registering the symbol strings generated by the generating means in the dictionary.

2. The data compression apparatus according to claim 1, wherein a maximum value for the number of symbol strings newly registered in one registration pass is set in advance in the dictionary registration means.