JP2799228B2

JP2799228B2 - Dictionary initialization method

Info

Publication number: JP2799228B2
Application number: JP19439690A
Authority: JP
Inventors: 茂吉田; 泰彦中野; 佳之岡田; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-07-23
Filing date: 1990-07-23
Publication date: 1998-09-17
Anticipated expiration: 2013-09-17
Also published as: JPH0480813A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Table of Contents]
Overview
Industrial applications
Conventional technology
Problems to be solved by the invention
Means for solving the problem
Operation
Example
Effects of the invention

[Overview]

The invention relates to a dictionary initialization method for data compression. Its object is to prevent the loss of compression ratio caused by initializing the dictionary, without lengthening the time the initialization takes.

The method applies to compression in which each distinct character string is represented by the reference number of a character string registered earlier in the dictionary plus an increment, the strings are registered in the dictionary one after another, and the input character string is encoded. It comprises: counting means that counts, based on the search notifications it receives, the number of times each element registered in the dictionary has been searched; dividing means that, in response to an initialization instruction, detects infrequently used elements from the counting results, divides the storage area of the dictionary into a plurality of blocks separated by at least one infrequently used element, and outputs division information about these blocks; moving means that moves the storage location in the dictionary of each block based on the division information; and changing means that, based on the division information, changes the reference number contained in each moved element to match the storage location, after the move, of the element it refers to. The infrequently used elements are thereby deleted, the remaining elements are packed together in the dictionary, and storage locations are secured for registering new elements.

[Industrial applications]

The present invention relates to a dictionary initialization method for incremental-decomposition Ziv-Lempel encoding and decoding.

In recent years, computers have come to handle many kinds of data, such as character codes, vector information, and image information, and the volume of data handled is growing rapidly.

When such large volumes of data are stored or transmitted, it is desirable to compress them by removing the redundancy they contain. A method that compresses data efficiently regardless of its type is therefore needed.

Universal coding requires no code table to be defined in advance, so it can be applied to compressing the various kinds of data mentioned above.

In this specification, one word of data is called a "character", and a run of consecutive words is called a "character string".

The Ziv-Lempel code is a representative universal code (see Munakata, "Ziv-Lempel data compression method", Information Processing, Vol. 26, No. 1, 1985). Two algorithms have been proposed for it: a universal type and an incremental-decomposition type. An improvement of the universal-type algorithm is the LZSS code (see T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986), and an improvement of the incremental-decomposition algorithm is the LZW code (T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984).

Of these coding methods, the LZW code has come into use for applications such as file compression in storage devices, because it allows high-speed processing and its algorithm is simple.

[Conventional technology]

The incremental-decomposition algorithm decomposes the input character string into a sequence of components, each formed by appending one character, the increment, to a substring already registered in the dictionary. It encodes the input by representing each component as the reference number of the registered substring plus the increment. Each component is also registered in the dictionary as a new substring and used in subsequent encoding.

In the LZW code, moreover, the increment is incorporated into the next substring.

For simplicity, the LZW coding method is described below for the case where the character string "ababcbababaaaaa...", composed of the three characters "a", "b", and "c", is input (see FIG. 6(a)).

In this case, the three characters "a", "b", and "c" are registered at dictionary addresses "1", "2", and "3", respectively, and encoding then begins.

First, the leading character of the input string (for example, "a") is read and looked up in the dictionary, and the dictionary address where it is stored (for example, "1") becomes the reference number ω.

Each subsequent character of the input string is then read in turn and treated as the extension character K, which corresponds to the increment described above. The dictionary is searched for the substring (ωK) represented by the combination of the reference number ω and the extension character K (hereinafter, the combination (ωK) is called the expression of the substring). If the substring (ωK) is found, the code corresponding to it becomes the new reference number ω, the next character of the input string is read, and the process repeats.

In this way the string to be encoded is extended one character at a time and searched for in the dictionary at each step, so that the longest registered substring matching the current position of the input is found, and the reference number ω of that substring is output as the code. At the same time, the substring obtained by appending the extension character K to the substring (ω) is represented by the combination (ωK), assigned a reference number, and registered in the dictionary as a new substring.

The character string of FIG. 6(a) is thus decomposed into the substrings underlined in the figure, and, as shown in FIG. 6(b), the codes "1", "2", "4", ... corresponding to the substrings are output. FIG. 6(c) shows the correspondence between the input character string and the substrings registered in the dictionary, and Table 1 shows an example of the resulting dictionary.
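The encoding procedure described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `lzw_encode`, the use of a Python dict keyed by (ω, K), and the convention that single-character elements have parent reference 0 are all assumptions made here, and the input is assumed to be non-empty.

```python
def lzw_encode(text, alphabet):
    """Encode text using a dictionary of (omega, K) expressions."""
    # Addresses 1..len(alphabet) hold the single-character strings.
    # Parent reference 0 for these entries is an assumed convention.
    entries = {(0, ch): addr for addr, ch in enumerate(alphabet, start=1)}
    next_addr = len(alphabet) + 1
    codes = []
    omega = entries[(0, text[0])]            # reference number of the first character
    for k in text[1:]:                       # K: the extension character
        if (omega, k) in entries:            # (omega K) is registered: extend the match
            omega = entries[(omega, k)]
        else:
            codes.append(omega)              # output the longest match found
            entries[(omega, k)] = next_addr  # register (omega K) as a new element
            next_addr += 1
            omega = entries[(0, k)]          # continue from the single character K
    codes.append(omega)                      # output the final match
    return codes, entries


codes, entries = lzw_encode("ababcbababaaaaa", "abc")
print(codes)  # [1, 2, 4, 3, 5, 8, 1, 10, 10]
```

For the fifteen characters written out in the example string, the first three codes are 1, 2, and 4, in line with the codes quoted in the text.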

The dictionary created in this way has a tree structure, as shown in FIG. 7, and each element of the dictionary corresponds to a node of the tree. In FIG. 7, the number in parentheses at each node is the reference number of the corresponding dictionary element, and the thick solid lines mark the paths traced through the nodes while encoding the character string of FIG. 6(a).

LZW decoding, given a code, starts from the node (a branch or leaf of the dictionary tree) whose reference number is that code and traces the tree backward toward the root.

First, the expression (ω′K) of the substring registered in the dictionary under the input code ω as its reference number is obtained. The extension character K is pushed onto a stack, and the expression registered under the reference number ω′ is looked up next. The lookup is repeated until the expression obtained consists of an extension character K alone. The extension characters held on the stack are then popped and output, last pushed first, which restores the substring corresponding to the input code ω.
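The stack-based expansion of a single code can be sketched as below. It covers only this expansion step, not a full decoder. The function name `expand`, the address-to-(ω′, K) dict layout, and the use of 0 as the "no parent" marker for single-character elements are assumptions for illustration. The dictionary contents are the entries that the encoding procedure yields for the example string "ababcbababaaaaa".

```python
def expand(dictionary, code):
    """Restore the substring for `code` by walking the dictionary tree toward the root."""
    stack = []
    ref = code
    while ref != 0:                  # 0 marks a single-character element (assumed convention)
        ref, k = dictionary[ref]     # expression (omega' K) stored at this address
        stack.append(k)              # hold the extension character K
    return "".join(reversed(stack))  # output in last-in, first-out order


# address -> (omega', K)
dictionary = {
    1: (0, "a"), 2: (0, "b"), 3: (0, "c"),
    4: (1, "b"), 5: (2, "a"), 6: (4, "c"), 7: (3, "b"),
    8: (5, "b"), 9: (8, "a"), 10: (1, "a"), 11: (10, "a"),
}
print(expand(dictionary, 8))  # bab
```

The last character pushed is the first character of the restored substring, which is why the stack is read in reverse.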

The extension character K obtained last is the first character of the string corresponding to the code ω. The combination (ω_OLD K) of this extension character and the immediately preceding code ω_OLD is registered in the dictionary as the expression of a new substring.

As described above, the LZW method registers new codes in the dictionary as it encodes, and thereby compresses data efficiently while learning the statistical properties of the input.

Consequently, when the statistical properties of the input data change, the dictionary accumulated so far cannot compress the new data efficiently, and learning must be carried out again.

If the dictionary had sufficient capacity, the entire learning history could be kept while the properties of the new data were learned, but in practice the memory and other resources allotted to a dictionary are finite. Conventionally, therefore, once the dictionary has been filled to its allotted capacity, the compression ratio, obtained as the ratio of the amount of data before compression to the amount after compression, is checked. If the ratio has fallen, relearning is judged necessary and the dictionary is initialized in the same way as at the start of encoding.

[Problems to be solved by the invention]

In the conventional method described above, initializing the dictionary discards the entire learning history. When few substrings are registered in the dictionary, that is, in the early stage of learning, the compression ratio is generally low. If initialization occurs many times, the compression ratio therefore falls below that of the ideal case in which the dictionary has sufficient capacity.

As a technique for limiting this loss of compression ratio due to initialization, the present applicant has already proposed Japanese Patent Application No. Hei 2-45164, "Data Compression Method".

The dictionary initialization method proposed there provides means for counting how often each element registered in the dictionary is used. Frequently used elements are kept in the dictionary, and infrequently used ones are deleted to free space for registering new codes, so that part of the learning history is retained.

In that method, however, every time an element to be deleted is detected, the storage locations and the substring expressions of all dictionary elements are updated. With n elements in the dictionary, n² accesses to the dictionary are needed, so initialization takes a long time.

The present invention was made in view of this point. Its object is to provide a dictionary initialization method that prevents the loss of compression ratio caused by dictionary initialization without lengthening the time the initialization takes.

[Means for solving the problem]

FIG. 1 is a block diagram showing the principle of the present invention.

(i) Invention of claim 1

In the figure, the dictionary initialization method applies to data compression in which each distinct character string is represented by a reference number, corresponding to the storage location of a character string registered earlier in the dictionary 110, plus a one-character increment; this expression is registered in the dictionary 110 as a new element; and the input character string is encoded with the reference numbers of the registered strings. The counting means 111 counts, based on the search notifications it receives, the number of times each element registered in the dictionary 110 has been searched.

In response to an initialization instruction, the dividing means 121 detects infrequently used elements from the counts of the counting means 111, divides the storage area of the dictionary 110 into a plurality of blocks separated by at least one infrequently used element, and outputs division information about these blocks.

Based on the division information output by the dividing means 121, the moving means 131 moves the storage location of each block in the dictionary 110 by the number of infrequently used elements that separate the blocks.

Based on the division information, the changing means 141 changes the reference number contained in each element of the dictionary 110 moved by the moving means 131 so that it matches the storage location, after the move, of the element it refers to.

As a whole, the configuration deletes the infrequently used elements, packs the remaining elements together in the dictionary 110, and thereby secures storage locations for registering new elements.

(ii) Invention of claim 2

In the invention of claim 2, which builds on the dictionary initialization method of claim 1, the changing means 141 uses a binary search to find the block containing the element that corresponds to the reference number in each element, and changes the reference number based on how far that block's storage location has moved.

[Operation]

(i) Invention of claim 1

In the invention of claim 1, the counting means 111 counts, based on the search notifications it receives, the number of times each element registered in the dictionary 110 has been searched. In response to an initialization instruction, the dividing means 121 detects infrequently used elements from these counts, divides the storage area of the dictionary 110 into a plurality of blocks separated by at least one infrequently used element, and outputs division information about these blocks.

Based on the division information output by the dividing means 121, the moving means 131 moves the storage location of each block in the dictionary 110 by the number of infrequently used elements that separate the blocks.

Based on the same division information, the changing means 141 changes the reference number contained in each element of the dictionary 110 moved by the moving means 131 to the reference number that corresponds to the storage location, after the move, of the element it refers to.

In this way the infrequently used elements are deleted, the remaining elements are packed together in the dictionary 110, and storage locations are secured for registering new elements.

In the invention of claim 1, before any element is moved, the storage area of the dictionary 110 is divided into blocks delimited by the infrequently used elements to be deleted, and the elements of the dictionary 110 are moved block by block. The amount of processing is therefore far smaller than when, each time an element to be deleted is detected, every element registered at a later storage location is moved. Part of the learning history can thus be preserved, and the loss of compression ratio due to initialization prevented, without lengthening the time required for initialization.
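The divide, move, and change steps can be illustrated with the sketch below. It is written under several assumptions that the text does not state: an element counts as infrequently used when its count is below a hypothetical `threshold`, the dictionary is a Python list indexed by address (index 0 unused), single-character elements carry parent reference 0 and are never moved, and no retained element refers to a deleted one. The function name `compact` and the counts in the example are illustrative only, and the block lookup is a plain linear scan.

```python
def compact(entries, counts, first_addr, threshold=1):
    """entries[addr] = (ref, ch); addresses below first_addr hold single characters."""
    # Divide: group retained elements into blocks separated by deleted elements.
    blocks, shift, start = [], 0, None       # block = (first address, last address, shift)
    for addr in range(first_addr, len(entries)):
        if counts[addr] >= threshold:        # retained element
            if start is None:
                start = addr
        else:                                # infrequently used element: delete
            if start is not None:
                blocks.append((start, addr - 1, shift))
                start = None
            shift += 1
    if start is not None:
        blocks.append((start, len(entries) - 1, shift))

    def new_address(ref):
        if ref < first_addr:                 # single-character elements stay in place
            return ref
        for lo, hi, s in blocks:             # linear lookup, kept simple here
            if lo <= ref <= hi:
                return ref - s
        raise ValueError(f"element {ref} was deleted but is still referenced")

    # Move: each block lands `shift` places lower; here a new list is built.
    packed = entries[:first_addr]
    for lo, hi, _ in blocks:
        packed.extend(entries[lo:hi + 1])
    # Change: rewrite the reference number held in every moved element.
    packed = packed[:first_addr] + [(new_address(r), ch) for r, ch in packed[first_addr:]]
    return packed, blocks


entries = [None, (0, "a"), (0, "b"), (0, "c"),       # addresses 1-3
           (1, "b"), (2, "a"), (4, "c"), (3, "b"),   # addresses 4-7
           (5, "b"), (8, "a"), (1, "a"), (10, "a")]  # addresses 8-11
counts = [0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 0, 0]        # illustrative counts only
packed, blocks = compact(entries, counts, first_addr=4)
print(blocks)      # [(5, 5, 1), (8, 9, 3)]
print(packed[4:])  # [(2, 'a'), (4, 'b'), (5, 'a')]
```

Each retained element is touched once for the move and once for the reference rewrite, rather than once per deleted element.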

(ii) Invention of claim 2

In the invention of claim 2, the changing means 141 uses a binary search to find the block containing the element that corresponds to the reference number in each element, and changes the reference number based on how far that block's storage location has moved.

In the invention of claim 2, finding the block by binary search shortens the time needed to change the reference numbers, and so shortens the initialization time still further.
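A binary search over block boundaries might look like the following sketch, which uses Python's standard `bisect` module. The names `remap`, `block_starts`, and `block_shifts` are hypothetical, as is the representation of the division information as two parallel sorted lists. The sketch assumes that every reference passed in points at a retained element.

```python
from bisect import bisect_right


def remap(ref, block_starts, block_shifts, first_addr):
    """Return the address, after the move, of the element formerly stored at `ref`."""
    if ref < first_addr:                     # single-character elements are not moved
        return ref
    i = bisect_right(block_starts, ref) - 1  # last block whose first address is <= ref
    if i < 0:
        raise ValueError(f"address {ref} lies before the first retained block")
    return ref - block_shifts[i]


# Two blocks starting at addresses 5 and 8, moved down by 1 and 3 places.
starts, shifts = [5, 8], [1, 3]
print([remap(r, starts, shifts, first_addr=4) for r in (2, 5, 8, 9)])  # [2, 4, 5, 6]
```

With b blocks, each lookup takes on the order of log b comparisons instead of a scan over all blocks.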

[Example]

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 2 shows the configuration of a data compression apparatus to which a dictionary initialization method according to one embodiment of the present invention is applied.

FIG. 5 shows the configuration of a data restoration apparatus to which a dictionary initialization method according to one embodiment of the present invention is applied.

The correspondence between FIG. 1 and the embodiment is as follows.

The dictionary 110 corresponds to the dictionary 221.

The counting means 111 corresponds to the counters 222.

The dividing means 121 corresponds to the division processing unit 231 of the initialization processing unit 230.

The moving means 131 corresponds to the movement processing unit 232 of the initialization processing unit 230.

The changing means 141 corresponds to the change processing unit 233 of the initialization processing unit 230.

Given this correspondence, the configuration and operation of the embodiment are described below.

In FIG. 2, 210 denotes an encoding unit, 220 a memory, and 230 an initialization processing unit; within the memory 220, 221 denotes a dictionary.

The dictionary 221 is divided into N_max areas, addressed "1" through "N_max", and the encoding unit 210 registers a distinct substring in each area.

N_max counters 222₁, ..., 222_Nmax are provided, one for each of the N_max areas of the dictionary 221, and each stores "0" as the initial value of its count. These counters are referred to collectively below simply as the counters 222.

FIG. 3 is a flowchart of the data compression operation in the embodiment.

For each character that can appear in the input character string, a one-character string is registered in the dictionary 221 to initialize it (step 301), and encoding then begins.

For example, to encode a character string composed of the characters "a", "b", and "c" as shown in FIG. 7(a), the one-character strings "a", "b", and "c" are registered as elements of the dictionary 221 at addresses "1", "2", and "3", and the registration start address n, which indicates the area of the dictionary 221 to be registered next, is set to "4".

First, the first character of the input string is read, and the address of the dictionary 221 where it is registered becomes the reference number ω (step 302). The input string is then read one character at a time, each character becoming the extension character K (step 303).

If step 304 finds that a character remains to be read (affirmative determination), the dictionary 221 is searched for the substring represented by the combination (ωK) of the reference number ω and the extension character K (step 305).

If the substring is registered in the dictionary 221, the determination in step 306 is affirmative. The address where the substring represented by (ωK) is registered becomes the new reference number ω (step 307), the counter 222 corresponding to this reference number ω is incremented (step 308), and processing returns to step 303.

The data compression apparatus of the embodiment thus adds to the conventional LZW encoding algorithm (steps 301 to 307) the processing of step 308, which increments the corresponding counter 222.

For example, when the first character "a" of the string in FIG. 6(a) is encoded, the combination (1b) is registered at address "4" of the dictionary 221, and when the second character "b" is encoded, the combination (2a) is registered at address "5". When the third character "a" is encoded, the character "b" read next becomes the extension character K, and its combination (1b) with the reference number "1" of the character "a" is already registered. The determination in step 306 is therefore affirmative, and the counter 222₄ corresponding to address "4" of the dictionary 221 is incremented.

Likewise, when encoding reaches the string that starts with the eighth character "b" in FIG. 6(a), the expression (2a), representing the substring "ba" formed by appending the next character "a" to "b" as the extension character K, is first found in the dictionary 221. The expression (5b), representing the substring "bab" formed by appending the following character "b", is then found as well. In this case, therefore, the counters 222₅ and 222₈, corresponding to addresses "5" and "8", are incremented.

Each time a substring is found in step 305, the corresponding counter 222 is incremented by one. The count of each counter 222 therefore indicates how many times a branch of the dictionary tree built by the encoding process has been extended through the corresponding node, that is, how often the corresponding reference number has been used in encoding.
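The counting added in step 308 can be sketched as below, with step numbers from the flowchart description noted in comments. The function name `encode_with_counts`, the dict-based dictionary and counters, and the parent reference 0 for single-character elements are assumptions for illustration. The dictionary-full check of step 313 is left out, and the input is assumed to be non-empty.

```python
def encode_with_counts(text, alphabet):
    """LZW encoding that also counts successful searches per dictionary address."""
    entries = {(0, ch): a for a, ch in enumerate(alphabet, start=1)}  # step 301
    counts = {}                                  # address -> number of successful searches
    n = len(alphabet) + 1                        # registration start address
    codes = []
    omega = entries[(0, text[0])]                # step 302
    for k in text[1:]:                           # steps 303-304
        addr = entries.get((omega, k))           # step 305
        if addr is not None:                     # step 306: (omega K) found
            omega = addr                         # step 307
            counts[omega] += 1                   # step 308
        else:
            codes.append(omega)                  # step 309
            entries[(omega, k)] = n              # step 310
            counts[n] = 0                        # a new element starts with count 0
            omega = entries[(0, k)]              # step 311
            n += 1                               # step 312
    codes.append(omega)                          # step 314
    return codes, counts


codes, counts = encode_with_counts("ababcbababaaaaa", "abc")
print(counts)
```

For the example string, addresses 4 and 8 each end with a count of 1 and addresses 5 and 10 with a count of 2, while addresses 6, 7, 9, and 11 stay at 0.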

If the determination in step 306 is negative, the reference number ω is output as a code (step 309), and the combination (ωK) is registered as an element of the dictionary 221 at the registration start address n (step 310). The reference number "n" is thereby defined for the substring represented by (ωK).

The address where the extension character K is registered then becomes the new reference number ω (step 311), and the registration start address n is incremented (step 312).

In step 313, the registration start address n is compared with the maximum address N_max of the dictionary 221. If n is smaller (affirmative determination), processing returns to step 303. If step 304 finds no character left to read (negative determination), the current reference number ω is output as a code (step 314) and processing ends.

If the determination in step 313 is negative, the encoding unit 210 judges that no new code can be registered in the dictionary 221 and requests the initialization processing described below from the initialization processing unit 230 (step 315).

In response to this initialization request, the initialization processing unit 230 deletes some of the elements registered in the dictionary 221 and packs the elements that were kept into the freed area.

In the initialization processing unit 230, reference numeral 231 denotes a division processing unit, 232 a movement processing unit, and 233 a change processing unit.

The division processing unit 231 distinguishes the elements to be deleted from the elements to be kept on the basis of the counting results of the N_max counters 222 described above, divides the dictionary 221 into a plurality of blocks, and calculates the amount by which the storage locations of the elements in each block are to be moved.

Based on the movement amounts calculated by the division processing unit 231, the movement processing unit 232 moves the storage locations of the elements to be kept.

The change processing unit 233 rewrites the representations of the substrings kept in the dictionary 221 on the basis of the new addresses.

FIG. 4(a) is a flowchart showing the operation of the division processing unit 231, FIG. 4(b) is a flowchart showing the operation of the movement processing unit 232, and FIG. 4(c) is a flowchart showing the operation of the change processing unit 233.

The division processing unit 231 sets in advance, as the initial value of the variable i indicating the address of interest in the dictionary 221, the minimum address N_min of the dictionary 221 after compaction. For example, when the input character string is composed of three different characters as described above, "4" may be set as the minimum address N_min. The variables s and k are both initialized to "0".

In addition, an array Slide holding the amount of change in storage location for each of the blocks generated by the division process and an array Pos holding the last address of each block are defined, and the value "0" is set in the 0th component Slide[0] of the array Slide.

The division processing unit 231 first compares the count value Count[i] of the counter 222_i corresponding to the area at address i of the dictionary 221 with a predetermined threshold Th, and determines whether this element should be deleted according to whether it has been referenced frequently (step 401).

When the count value Count[i] is greater than the threshold Th (negative determination in step 401), the division processing unit 231 judges that the element is a frequently used element that should be kept, increments the variable i (step 402), and returns to step 401.

On the other hand, in the case of an affirmative determination in step 401, the value of the variable i is copied to another variable j (step 403), and whether the j-th element should be deleted is determined in the same manner as in step 401 (step 404).

For example, when the count value Count[6] of the counter 222 corresponding to the element registered at address "6" of the dictionary 221 is less than or equal to the threshold Th, the determination in step 401 is affirmative and the determination in step 404 is also affirmative, so the variable j and the variable s are each incremented (step 405) and the process returns to step 404.

Further, if the count value Count[7] of the counter 222_7 corresponding to the element at address "7" is also less than or equal to the threshold Th, the variables j and s are incremented again in step 405.

By incrementing the variable s each time an element to be deleted is detected, the number of elements to be deleted that have been detected so far is counted. The variable s therefore indicates the number of positions by which the address of the next element to be kept must be moved, that is, the amount of change in its address.

Thereafter, when a frequently used element that should be kept is detected, the determination in step 404 is negative. In this case, the division processing unit 231 assigns the value of the variable i minus "1" to the k-th component Pos[k] of the array Pos and increments the variable k (step 406).

Here, the element at the address indicated by the variable i is an element to be deleted. Therefore, by taking the value of i minus "1" as the last address of the k-th block, the preceding run of consecutive elements to be kept is delimited as the k-th block. Moreover, incrementing the variable k each time the elements of the dictionary 221 are delimited into a block counts the number of blocks generated by the division.

For example, if the element registered at address "6" described above is the first element to be deleted that is detected, address "5" is set in the 0th component Pos[0] of the array Pos, and the variable k is incremented to "1".

In addition, the value of the variable s, which indicates the number of elements to be deleted that have been detected before the k-th block, is assigned to the k-th component Slide[k] of the array Slide as the amount of change in the addresses of the elements in the k-th block (step 407).

For example, if the elements registered at addresses "6" and "7" of the dictionary 221 are judged to be elements to be deleted, the value of the variable s is "2", and this value is assigned in step 407 to the first component Slide[1] of the array Slide.

Next, the division processing unit 231 sets the value of the variable j in the variable i (step 408) and compares the value of the variable i with the maximum address N_max (step 409). If the variable i is smaller than the maximum address N_max (affirmative determination in step 409), the process returns to step 401 and the above processing is repeated.

In this way, the dictionary 221 is divided into a plurality of blocks separated by the elements to be deleted, and block numbers k are assigned to the blocks in order. When the variable i reaches the maximum address N_max, the determination in step 409 is negative.

In this case, the division processing unit 231 assigns the maximum address N_max to the k-th component Pos[k] of the array Pos, holds the value of the variable k at this time as the division number k_m (step 410), and ends the process.
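The division process of steps 401 to 410 can be sketched as follows. This is only an illustration under stated assumptions: the function name and the use of Python lists are hypothetical, and the explicit `< n_max` bound checks in both loops are additions of this sketch that the description does not spell out.

```python
def divide(count, n_min, n_max, th):
    """Split addresses n_min .. n_max-1 into blocks of kept elements.

    Returns (pos, slide, k_m): pos[k] is the last original address of
    block k, slide[k] is how far block k moves toward lower addresses.
    """
    pos = []
    slide = [0]                       # Slide[0] = 0
    i, s = n_min, 0
    while i < n_max:                  # step 409 (bound check assumed here too)
        if count[i] > th:             # step 401 negative: keep this element
            i += 1                    # step 402
            continue
        j = i                         # step 403
        while j < n_max and count[j] <= th:   # step 404
            j += 1                    # step 405: skip a deleted element
            s += 1                    #           and count it
        pos.append(i - 1)             # step 406: close the current block
        slide.append(s)               # step 407: shift for the next block
        i = j                         # step 408
    pos.append(n_max)                 # step 410: close the last block
    return pos, slide, len(pos) - 1   # k_m is the index of the last block


# Counters for addresses 4..11; addresses 0..3 are unused in this example.
count = [9, 9, 9, 9, 5, 4, 1, 3, 6, 2, 7, 4]
print(divide(count, n_min=4, n_max=12, th=3))
```

In this example the elements at addresses 6, 7 and 9 fall at or below the threshold, so three blocks result: addresses 4 to 5 (shift 0), address 8 (shift 2), and addresses 10 to 11 (shift 3).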

At this time, the division processing unit 231 also passes the array Slide, the array Pos, and the division number k_m described above to the movement processing unit 232 as division information and requests the movement process.

Table 2 shows the dictionary 221 alongside the count values of the counters 222, together with the result of the division process performed by the division processing unit 231 with the threshold Th set to "3". In Table 2, the numbers in the Block column indicate the number of the block to which each element belongs.

In response to the movement processing request described above, the movement processing unit 232 first sets the following initial values (step 421): the variable k is set to "0"; the variable p is set to the value of the 0th component Pos[0] of the array Pos; the number of elements n of the dictionary 221 after the initialization process is set to the maximum address N_max minus Slide[k_m]; and the variable i is set to the minimum address N_min (for example, "4"). It also defines an array W consisting of the reference numbers ω and an array K consisting of the extension characters K that are registered in the dictionary 221 as the substring representations (ωK). Hereinafter, the elements of the arrays W and K corresponding to address i are referred to as W[i] and K[i], respectively.

Next, the movement processing unit 232 compares the variable i with the variable p and determines whether the element of the dictionary 221 corresponding to the variable i is included in the k-th block (step 422).

If the value of the variable i does not exceed the value of the variable p (affirmative determination in step 422), the element is judged to belong to the k-th block. In this case, the movement processing unit 232 assigns the component W[i + Slide[k]] of the array W to W[i] and the component K[i + Slide[k]] of the array K to K[i] (step 423). This corresponds to moving the substring representation (ωK) registered at address "i + Slide[k]" of the dictionary 221 to address "i".

At this time, the count value Count[i + Slide[k]] of the counter 222 corresponding to address "i + Slide[k]" of the dictionary 221 is also set as the count value of the counter 222_i corresponding to address "i" (step 424).

For example, when the movement process is performed on the basis of the division result shown in Table 2, the variable i is set to the initial value "4" and the variable p is set to address "5" in step 421. In this case, the determination in step 422 is affirmative, and the movement process described above is performed. However, since the change amount Slide[0] for the 0th block is "0", the storage location of the element does not change.

On the other hand, if the value of the variable i exceeds the value of the variable p, the element is judged to be included in the next block, and the determination in step 422 is negative. In this case, the variable k indicating the block is incremented, and the value of Pos[k] minus Slide[k] is set in the variable p (step 425).

The value of the variable p set in step 425 therefore indicates the address that the last element of the k-th block will have after the elements of the dictionary 221 have been moved.

For example, when the value of the variable i becomes "6", the determination in step 422 is negative, and in step 425 the variable k indicating the block number and the variable p are updated, so that the address of the last element of the second block (block number 1) is set in the variable p. The determination in step 422 then becomes affirmative, and the movement process of steps 423 and 424 described above is performed. In this case, since the change amount Slide[1] for block 1 is "2", the storage location of each element in that block is moved down by "2".

In this way, the storage locations are moved block by block, using the blocks produced by the division processing unit 231.
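The block-wise movement of steps 421 to 425 can be sketched as below. This is a minimal sketch with hypothetical names, assuming the arrays W, K and Count are Python lists indexed by dictionary address and that `pos`, `slide` and `k_m` are the division information described above. The rewriting of the reference numbers stored in W (the change process requested in step 426) is deliberately left out of this sketch.

```python
def move_blocks(w, k_chars, count, pos, slide, k_m, n_min, n_max):
    """Pack the kept elements toward low addresses, one block at a time.

    Modifies w, k_chars and count in place and returns n, the number of
    elements after initialization (the new registration start address).
    """
    n = n_max - slide[k_m]            # step 421
    blk, p = 0, pos[0]
    i = n_min
    while i < n:                      # steps 428-429
        while i > p:                  # step 422 negative: move to next block
            blk += 1
            p = pos[blk] - slide[blk] # step 425: last address after the move
        src = i + slide[blk]
        w[i] = w[src]                 # step 423
        k_chars[i] = k_chars[src]
        count[i] = count[src]         # step 424
        i += 1
    return n


w = [0, 0, 0, 0, 1, 2, 4, 5, 5, 8, 8, 10]
k_chars = list("....abcdefgh")
count = [9, 9, 9, 9, 5, 4, 1, 3, 6, 2, 7, 4]
n = move_blocks(w, k_chars, count, [5, 8, 12], [0, 2, 3], 2, 4, 12)
print(n, w[4:n], k_chars[4:n], count[4:n])
```

Because every source address is at or above its destination address, copying in ascending order of i never overwrites an element that has yet to be moved, which is why a single pass over the kept elements suffices.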

After performing the movement described above, the movement processing unit 232 requests the change processing unit 233 to change the value of the component W[i] of the array W (step 426) and waits for an end notification from the change processing unit 233 (step 427).

In the LZW encoding method, a substring registered in the dictionary 221 is represented by an extension character K and the reference number ω of a node that is closer to the root of the dictionary tree than the node of the substring itself. Therefore, when the reference number of a node changes as a result of the movement process described above, the substring representations must be rewritten with the new reference number of that node.

In response to the change processing request described above, the change processing unit 233 first sets the initial value "0" in the variable L indicating the lower limit of the binary search range and sets the division number k_m described above as the initial value of the variable H indicating the upper limit (step 441).

Next, the average value m of the variable L and the variable H is obtained (step 442), and the value of W[i] is compared with the value of the m-th component Pos[m] of the array Pos (step 443) to determine whether the reference number ω registered in the i-th element belongs to the interval [L, m] or to the interval [m, H].

If the value of W[i] is less than or equal to the value of Pos[m] (affirmative determination in step 443), the change processing unit 233 judges that the reference number ω belongs to the interval [L, m] and sets the variable H to the average value m minus "1" (step 444).

On the other hand, if the value of W[i] is greater than the value of Pos[m] (negative determination in step 443), the change processing unit 233 judges that the reference number ω belongs to the interval [m, H] and sets the variable L to the average value m plus "1" (step 445).

The change processing unit 233 then compares the variable L with the variable H. If the variable L is less than or equal to the variable H (affirmative determination in step 446), it judges that the block has not yet been found, returns to step 442, and repeats the above processing.

On the other hand, when the variable L exceeds the variable H (negative determination in step 446), the change processing unit 233 judges that the block has been found. It subtracts the value of the component Slide[L] of the array Slide for the L-th block from the i-th component W[i] of the array W, sets the result in W[i] (step 447), and notifies the movement processing unit 232 described above that the change process has ended.

By using the binary search technique in this way, the block can be found efficiently.
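The binary search of steps 441 to 447 can be sketched as follows. The function name is hypothetical, and taking the average with integer (floor) division is an assumption of this sketch, since the description only says that the average of L and H is obtained. The sketch also assumes that the referenced element is one of the kept elements, so that its original address lies inside some block.

```python
def remap_reference(w_i, pos, slide, k_m):
    """Return the reference number w_i rewritten for the packed dictionary.

    pos[k] is the last original address of block k and slide[k] is the
    shift applied to that block, as produced by the division process.
    """
    lo, hi = 0, k_m                   # step 441
    while lo <= hi:                   # step 446
        m = (lo + hi) // 2            # step 442 (floor division assumed)
        if w_i <= pos[m]:             # step 443
            hi = m - 1                # step 444: search the lower half
        else:
            lo = m + 1                # step 445: search the upper half
    # lo is now the first block whose last address is not below w_i.
    return w_i - slide[lo]            # step 447


pos, slide, k_m = [5, 8, 12], [0, 2, 3], 2
print([remap_reference(w, pos, slide, k_m) for w in (3, 5, 8, 10, 11)])
```

Each lookup takes a number of comparisons on the order of the logarithm of the number of blocks, rather than a scan over all blocks.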

In response to the end notification described above, the movement processing unit 232 resumes processing (affirmative determination in step 427), increments the variable i (step 428), and compares the variable i with the number of elements n (step 429).

If the variable i is smaller than the number of elements n (affirmative determination in step 429), the process returns to step 422, and the above processing is repeated until the variable i becomes equal to the number of elements n and the determination in step 429 is negative.

After the above processing is completed, the elements of the arrays W and K are stored in order at the respective addresses of the dictionary 221, and the initialization process ends with the number of elements n as the new registration start address. In response to this end notification, the encoding unit 210 resumes the encoding process.

As described above, by calculating the movement amount of each block in advance and moving the storage locations block by block, the time required for the movement process can be made proportional to the number of elements registered in the dictionary 221. The amount of processing can thus be reduced greatly compared with searching all the elements of the dictionary 221 in a double loop.

In addition, by using a binary search in the change processing unit 233, the block containing the element that corresponds to the reference number included in each element can be found efficiently, which speeds up the process of rewriting the substring representation on the basis of that reference number.

In this way, the learning history can be preserved without lengthening the time required for initialization, the reduction in compression ratio caused by the initialization process can be prevented, and data can be compressed efficiently.

Although the above embodiment has been described for the case where the invention is applied to a data compression apparatus, it may also be applied to a data restoration apparatus.

In that case, as shown in FIG. 5, the apparatus may be configured with a decoding unit 510 in place of the encoding unit 210.

The decoding unit 510 decodes conventional LZW codes on the basis of the dictionary 221 and, each time it searches the dictionary 221, increments the count value of the counter 222 corresponding to the element concerned. When an attempt is made to register an element in the dictionary 221 beyond the maximum address N_max, the decoding unit 510 requests the initialization processing unit 230 to perform the initialization process.

[Effects of the Invention]

As described above, according to the invention of claim 1, the storage area of the dictionary is divided into a plurality of blocks before the elements are moved, and the storage locations of the dictionary elements are moved block by block. The amount of processing can therefore be reduced greatly compared with moving all the elements registered in the subsequent storage locations each time an element to be deleted is detected. As a result, part of the learning history can be preserved without lengthening the time required for the initialization process, the reduction in compression ratio caused by initialization can be prevented, and data can be compressed and restored efficiently.

According to the invention of claim 2, detecting the relevant block by a binary search shortens the time required for the reference number changing process, which further shortens the time required for the initialization process.

[Brief description of the drawings]

FIG. 1 is a block diagram of the principle of the present invention, FIG. 2 is a configuration diagram of a data compression apparatus to which a dictionary initialization method according to an embodiment of the present invention is applied, FIG. 3 is a flowchart showing the encoding operation according to the embodiment, FIG. 4 is a flowchart showing the initialization operation according to the embodiment, FIG. 5 is a configuration diagram of a data restoration apparatus to which the dictionary initialization method according to the embodiment is applied, FIG. 6 is an explanatory diagram of the LZW encoding method, and FIG. 7 is a diagram showing the configuration of a dictionary. In the figures, 110 is a dictionary, 111 is counting means, 121 is dividing means, 131 is moving means, 141 is changing means, 210 is an encoding unit, 220 is a memory, 221 is a dictionary, 222 is a counter, 230 is an initialization processing unit, 231 is a division processing unit, 232 is a movement processing unit, 233 is a change processing unit, and 510 is a decoding unit.

Continuation of the front page: (72) Inventor: Hirotaka Chiba, c/o Fujitsu Limited, 1015 Kamiodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa. (58) Fields searched (Int. Cl.6, DB name): G06F 5/00, H03M 7/30

Claims

(57) [Claims]

1. A dictionary initialization method for use when compressing data by encoding an input character string with the reference numbers of character strings registered in a dictionary (110), while sequentially registering each distinct character string in the dictionary (110) as a new element represented by a one-character increment and the reference number corresponding to the storage location of a character string registered in the dictionary (110) before that character string, the method comprising: counting means (111) for counting, based on an introduced search notification, the number of times each element registered in the dictionary (110) has been searched; dividing means (121) for detecting, in response to an initialization instruction, infrequently used elements based on the counting result of the counting means (111), dividing the storage area of the dictionary (110) into a plurality of blocks separated by at least one infrequently used element, and outputting division information on the division into these blocks; moving means (131) for moving, for each of the plurality of blocks, the storage location in the dictionary (110) based on the infrequently used elements separating the blocks, in accordance with the division information output by the dividing means (121); and changing means (141) for changing, based on the division information, the reference number included in each element of the dictionary (110) moved by the moving means (131) in accordance with the storage location of the corresponding element after the move, wherein the infrequently used elements are deleted and the storage locations of the other elements in the dictionary (110) are packed together so as to secure storage locations for registering new elements.

2. The dictionary initialization method according to claim 1, wherein the changing means (141) detects, by a binary search, the block to which the element corresponding to the reference number included in each element belongs, and changes the reference number based on the movement amount of the storage location of that block.