JP2012065097A

JP2012065097A - Compression device, compression method, compression program, and restoration device

Info

Publication number: JP2012065097A
Application number: JP2010206796A
Authority: JP
Inventors: Hiroya Inakoshi; 宏弥稲越; Tatsuya Asai; 達哉浅井; Shinichiro Tako; 真一郎多湖; Seishi Okamoto; 青史岡本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-09-15
Filing date: 2010-09-15
Publication date: 2012-03-29
Anticipated expiration: 2030-09-15
Also published as: JP5585336B2

Abstract

PROBLEM TO BE SOLVED: To compress a data string efficiently. SOLUTION: A return distance threshold table stores, for each data item, a threshold on the distance by which that item may be shifted; the threshold depends on the item's frequency of appearance and determines whether the item is exchanged. A character string conversion unit reads forward from the attention position of the data string to be compressed. When data differing from the data at the attention position appears, the unit determines, from the return distance threshold table, the distance by which the differing data may be shifted. If no data identical to the differing data exists within that distance, the unit moves the attention position to the position where the differing data appeared. If identical data does exist, the unit exchanges the differing data with the data immediately following the identical data, stores in an exchange history table the distance from the origin (initially the beginning of the data string) to the exchanged data and the distance between the exchanged data, and moves the origin and the attention position to the position of the exchanged data.

Description

The present invention relates to a compression device, a compression method, a compression program, and a decompression device.

Data compression and decompression techniques have been developed that reduce the amount of data without losing the content of the original data string, and that restore a compressed data string to the original. One such technique is RLE (Run Length Encoding), which encodes each run of identical consecutive data as a pair consisting of the data type and the length of the run.

For example, when the data string S given by expression (1) below is encoded by RLE, the run “bbbb” of S is compressed to “b, 4”. When the other runs of S are encoded in the same way, S is compressed into the data string RLE(S) given by expression (2). Since S occupies 19 bytes and RLE(S) occupies 12 bytes (six pairs of 2 bytes each), the data is reduced by 7 bytes.

S = bbbbaabbbbcccbbbddd ... (1)
RLE(S) = (b, 4) (a, 2) (b, 4) (c, 3) (b, 3) (d, 3) ... (2)

However, applying RLE to a data string in which identical data rarely repeats consecutively can increase the amount of data. For example, when the data string S given by expression (3) below is encoded by RLE, it becomes the data string RLE(S) given by expression (4), and the data grows from 9 bytes to 14 bytes, an increase of 5 bytes. For this reason, an improved technique exists that first rearranges the data string, moving each item so that identical data become consecutive, and then compresses the rearranged string with RLE.

S = abccabaab ... (3)
RLE(S) = (a, 1) (b, 1) (c, 2) (a, 1) (b, 1) (a, 2) (b, 1) ... (4)
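The byte counts in expressions (1) to (4) can be checked with a minimal RLE sketch. It assumes, as the counts above imply, that each symbol and each run length occupy one byte (so runs are shorter than 256), making every (data, length) pair cost 2 bytes; the function names are illustrative and do not come from the patent.

```python
from itertools import groupby


def rle_encode(s: str) -> list[tuple[str, int]]:
    """Encode each run of identical characters as a (character, run length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def rle_size(pairs: list[tuple[str, int]]) -> int:
    """Encoded size in bytes, assuming 1 byte for the character and 1 for the length."""
    return 2 * len(pairs)


s1 = "bbbbaabbbbcccbbbddd"  # expression (1)
s3 = "abccabaab"            # expression (3)
for s in (s1, s3):
    pairs = rle_encode(s)
    print(len(s), "->", rle_size(pairs), pairs)
```

For expression (1) this prints `19 -> 12`, and for expression (3) it prints `9 -> 14`, matching the 7-byte saving and the 5-byte growth described above.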

In this improved technique, counting positions from 0, the data “b” at position 1 of the data string S in expression (3) is moved to position 4 so that the “b”s become consecutive. The other data in S are moved in the same way so that identical data are consecutive, converting S into the data string T given by expression (5) below. When T is encoded by RLE, it becomes RLE(T) given by expression (6); the data shrinks from 9 bytes to 6 bytes, a reduction of 3 bytes.

T = aaaabbbcc ... (5)
RLE(T) = (a, 4) (b, 3) (c, 2) ... (6)

To restore the data string S from RLE(T) compressed by this improved technique, T must be converted back into S. For this purpose, when S is converted into T, the technique generates a conversion function π that maps the position of each data item before the move to its position after the move, and uses π to perform the inverse conversion. FIG. 25 is a diagram illustrating an example of a conventional conversion function. As shown in FIG. 25, π associates position n of S with position π(n) of T; that is, writing S[n] for the n-th data item of S and T[π(n)] for the π(n)-th item of T, S[n] = T[π(n)] holds. For example, to restore position 1 of S, since S[1] = T[4], the “b” at position 4 of T is moved to position 1 of S. By moving each item of T in turn according to π in this way, T is converted back into S. Here n is a non-negative integer (0, 1, 2, ...).
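The conventional scheme can be sketched as follows. How the prior art chooses its moves is not specified here, so this sketch simply assumes that π is the permutation produced by a stable sort of S; that choice yields T of expression (5) and satisfies S[1] = T[4], although FIG. 25 may show a different π. What the sketch illustrates is that π stores one position per character, which is the storage overhead discussed in the following paragraphs. The function names are hypothetical.

```python
def group_by_permutation(s: str) -> tuple[str, list[int]]:
    """Return T and pi with S[n] == T[pi[n]] (pi from a stable sort, an assumption)."""
    order = sorted(range(len(s)), key=lambda i: s[i])  # stable: ties keep original order
    t = "".join(s[i] for i in order)
    pi = [0] * len(s)
    for new_pos, old_pos in enumerate(order):
        pi[old_pos] = new_pos
    return t, pi


def restore(t: str, pi: list[int]) -> str:
    """Invert the conversion using S[n] = T[pi(n)]."""
    return "".join(t[pi[n]] for n in range(len(pi)))


s = "abccabaab"  # expression (3)
t, pi = group_by_permutation(s)
print(t)   # aaaabbbcc, expression (5)
print(pi)  # [0, 4, 7, 8, 1, 5, 2, 3, 6]
assert restore(t, pi) == s
```

RLE(T) here takes 6 bytes, but π holds 9 positions, so storing it alongside RLE(T) cancels the saving for this example.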

Masatatsu K'z, “Compression Algorithm”, SoftBank Publishing.

However, the conventional technique cannot compress a data string efficiently.

For example, to lengthen runs of identical data, the conventional technique must try many different patterns for moving each item. As the original data string grows longer, the number of patterns grows exponentially, imposing a huge processing load.

In addition, the conventional technique must store the conversion function in order to invert the converted data string. Consequently, even if the data string is rearranged for efficient compression and then compressed by RLE, the total amount of data, including the conversion function, is barely reduced and may even increase.

The disclosed technology has been made in view of the above, and an object thereof is to provide a compression device, a compression method, a compression program, and a decompression device that can compress a data string efficiently.

In one aspect, the technology disclosed in the present application includes a movement distance table, a movement distance determination unit, and a replacement processing unit. The movement distance table stores each data item in association with a threshold on the distance by which that item may be moved; the threshold depends on the item's frequency of appearance and determines whether the item is replaced. The movement distance determination unit reads data forward from the attention position of the data string to be compressed and, when data different from the data at the attention position appears, determines from the different data and the movement distance table the distance by which the different data may be moved. If no data identical to the different data exists within the determined movement distance from the position where the different data appeared, the replacement processing unit moves the attention position to that position. If identical data does exist within that range, the replacement processing unit swaps the different data with the data immediately following the identical data.
After the swap, the replacement processing unit stores in a history table the distance from the origin (initially the beginning of the data string to be compressed) to the swapped data and the distance between the swapped data, and then moves the origin and the attention position to the position of the swapped data.

According to one aspect of the technology disclosed in the present application, a data string can be compressed efficiently.

FIG. 1 is a diagram illustrating the configuration of the data compression/decompression apparatus according to the present embodiment.
FIG. 2 is a diagram illustrating an example of the data structure of the return distance threshold table.
FIG. 3 is a diagram (1) illustrating an example of the data structure of the replacement history table.
FIG. 4 is a diagram for explaining various terms.
FIG. 5 is a diagram (1) for explaining the processing of the character string conversion unit in detail.
FIG. 6 is a diagram (2) for explaining the processing of the character string conversion unit in detail.
FIG. 7 is a diagram (3) for explaining the processing of the character string conversion unit in detail.
FIG. 8 is a diagram (4) for explaining the processing of the character string conversion unit in detail.
FIG. 9 is a diagram (5) for explaining the processing of the character string conversion unit in detail.
FIG. 10 is a diagram (6) for explaining the processing of the character string conversion unit in detail.
FIG. 11 is a diagram (7) for explaining the processing of the character string conversion unit in detail.
FIG. 12 is a diagram (1) illustrating an example of the data structure of the replacement history table temporarily held by the character string conversion unit.
FIG. 13 is a diagram (2) illustrating an example of the data structure of the replacement history table temporarily held by the character string conversion unit.
FIG. 14 is a diagram (3) illustrating an example of the data structure of the replacement history table temporarily held by the character string conversion unit.
FIG. 15 is a diagram (2) illustrating an example of the data structure of the replacement history table.
FIG. 16 is a diagram for explaining the process of restoring the origin information.
FIG. 17 is a diagram (1) for explaining the processing of the character string reverse conversion unit in detail.
FIG. 18 is a diagram (2) for explaining the processing of the character string reverse conversion unit in detail.
FIG. 19 is a flowchart illustrating the processing procedure of the compression unit.
FIG. 20 is a flowchart illustrating the processing procedure of the threshold table generation processing.
FIG. 21 is a flowchart illustrating the processing procedure of the character string conversion processing.
FIG. 22 is a flowchart illustrating the processing procedure of the character string reverse conversion unit.
FIG. 23 is a diagram (3) illustrating an example of the data structure of the replacement history table.
FIG. 24 is a diagram illustrating an example of a computer that executes a compression/decompression program.
FIG. 25 is a diagram illustrating an example of a conventional conversion function.

Hereinafter, embodiments of the compression device, compression method, compression program, and decompression device disclosed in the present application are described in detail with reference to the drawings. The present invention is not limited to these embodiments, and the embodiments may be combined as appropriate as long as their processing does not conflict.

An example of the configuration of the data compression/decompression apparatus according to the present embodiment is described. FIG. 1 is a diagram illustrating the configuration of the data compression/decompression apparatus according to the present embodiment. As illustrated in FIG. 1, the data compression/decompression apparatus 100 includes an input unit 110, an output unit 120, an input/output control unit 130, a storage unit 140, a compression unit 150, and a decompression unit 160.

The input unit 110 is an input device that receives various kinds of information, such as a keyboard or a mouse. The output unit 120 is an output device that outputs various kinds of information, such as a display or a monitor. The input/output control unit 130 is a processing unit that controls the input and output of information among the input unit 110, the output unit 120, the storage unit 140, the compression unit 150, and the decompression unit 160; it corresponds, for example, to an ASIC (Application Specific Integrated Circuit).

The storage unit 140 holds an input file 141, a return distance threshold table 142, a replacement history table 143, and an output file 144. The storage unit 140 corresponds, for example, to a semiconductor memory device such as RAM (Random Access Memory), ROM (Read Only Memory), or flash memory, or to a storage device such as a hard disk or an optical disc.

The input file 141 contains a plurality of input character strings. For example, an input character string S is the character string given by expression (7) below.

S = aacabaaabcaaaababaaa ... (7)

The return distance threshold table 142 holds a return distance threshold for each character. The threshold is set according to how frequently the character appears in the input character string S: the more frequent the character, the smaller its threshold, and the less frequent, the larger. For example, the table holds, for each character in S, its number of appearances and its return distance threshold.

FIG. 2 is a diagram illustrating an example of the data structure of the return distance threshold table. As illustrated in FIG. 2, the return distance threshold table 142 holds the number of appearances “14” and the return distance threshold “0” for the character “a”, the number of appearances “4” and the threshold “4” for “b”, and the number of appearances “2” and the threshold “9” for “c”.

The replacement history table 143 holds the data used to turn the character string T, which has been rearranged into an order convenient for RLE compression, back into the input character string S before conversion. For example, the table holds offsets and return distances in association with each other. Details of the replacement history table 143 are described later.

FIG. 3 is a diagram illustrating an example of the data structure of the replacement history table. As illustrated in FIG. 3, the replacement history table 143 holds the offset “5” with the return distance “3”, the offset “4” with the return distance “7”, and the offset “8” with the return distance “3”.
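The inverse conversion itself is described later with reference to FIGS. 16 to 18 and is not reproduced here. As a minimal sketch of why offsets and return distances can be enough, assume that (1) entries are stored in the order the swaps were made, (2) each origin equals the previous origin plus the previous offset, starting from 0, and (3) an entry (m, n) swapped the characters at positions o + m and o + m − n + 1. Every swap is its own inverse, so replaying the entries in reverse order recovers S. The entries (8, 4), (1, 7), (7, 2) used below were obtained by hand-tracing the conversion procedure described later on expression (7); they do not come from FIG. 3, and `restore_from_history` is a hypothetical name.

```python
def restore_from_history(t: str, history: list[tuple[int, int]]) -> str:
    """Undo swaps recorded as (offset m, return distance n) pairs.

    Assumes origins are implicit: o_0 = 0 and o_{k+1} = o_k + m_k.
    """
    # Rebuild the absolute origin of every entry from the offsets alone.
    origins = []
    origin = 0
    for m, _ in history:
        origins.append(origin)
        origin += m
    chars = list(t)
    # Each swap is an involution, so applying them in reverse order undoes the conversion.
    for o, (m, n) in reversed(list(zip(origins, history))):
        y = o + m      # position of the replacement-source character
        z = y - n + 1  # position right after the matching character y'
        chars[y], chars[z] = chars[z], chars[y]
    return "".join(chars)


t = "aaccbbaaaaaaaabbaaaa"  # T whose RLE is expression (8)
print(restore_from_history(t, [(8, 4), (1, 7), (7, 2)]))  # aacabaaabcaaaababaaa
```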

The output file 144 contains the character string compressed by the compression unit 150. For example, the compressed character string is the output character string RLE(T) given by expression (8) below.

RLE(T) = (a, 2) (c, 2) (b, 2) (a, 8) (b, 2) (a, 4) ... (8)
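Decoding expression (8) gives back the converted string T. The sketch below (the function name is illustrative) also checks that T is only a rearrangement of S in expression (7): both contain 20 characters with the same counts as in FIG. 2.

```python
from collections import Counter


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand each (character, run length) pair back into a run."""
    return "".join(ch * length for ch, length in pairs)


pairs = [("a", 2), ("c", 2), ("b", 2), ("a", 8), ("b", 2), ("a", 4)]  # expression (8)
t = rle_decode(pairs)
s = "aacabaaabcaaaababaaa"  # expression (7)
print(t)                         # aaccbbaaaaaaaabbaaaa
print(Counter(t) == Counter(s))  # True: same characters, only reordered
```

Under the 2-bytes-per-pair assumption used for expressions (1) and (2), the six pairs of (8) occupy 12 bytes against 20 for S, before the replacement history table is counted.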

Returning to FIG. 1, the compression unit 150 is a processing unit that compresses the input character string contained in the input file 141. It includes a threshold table generation unit 151, a character string conversion unit 152, and an RLE encoding unit 153.

The threshold table generation unit 151 is a processing unit that calculates a return distance threshold for each character in the input character string and generates the return distance threshold table 142 shown in FIG. 2. Its processing is described in detail below.

For example, the threshold table generation unit 151 acquires an input character string from the input file 141 and reads it one character at a time from beginning to end. It counts the appearances of each character, records the counts in the return distance threshold table 142, and sorts the table in descending order using the number of appearances as the sort key. Then, for each character, it divides the sum of the appearance counts of all characters that appear more often than that character by the character's own appearance count, rounds the quotient to an integer by rounding half up at the first decimal place, and records the result in the table as that character's return distance threshold.

For example, when the threshold table generation unit 151 reads the input character string S of expression (7), it records “14” as the number of appearances of “a”, “4” for “b”, and “2” for “c”. No character appears more often than “a”, so the sum for “a” is 0; dividing 0 by 14 gives a return distance threshold of 0 for “a”. Only “a” appears more often than “b”, so the sum for “b” is 14; dividing 14 by 4 gives 3.5, which rounds to a threshold of 4. Both “a” and “b” appear more often than “c”, so the sum for “c” is 18; dividing 18 by 2 gives a threshold of 9 for “c”.
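The threshold calculation can be sketched as follows. Two details are assumptions not fixed by the text: characters with equal counts are not treated as appearing “more often” than each other, and rounding half up is done with exact fractions to avoid floating-point error. The function name is hypothetical.

```python
import math
from collections import Counter
from fractions import Fraction


def return_distance_thresholds(s: str) -> dict[str, tuple[int, int]]:
    """Map each character to (number of appearances, return distance threshold)."""
    counts = Counter(s)
    table = {}
    # Descending order of appearance count, as in the return distance threshold table.
    for ch, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        more_frequent = sum(c for c in counts.values() if c > count)
        # Round half up: floor(q + 1/2) for a non-negative quotient q.
        threshold = math.floor(Fraction(more_frequent, count) + Fraction(1, 2))
        table[ch] = (count, threshold)
    return table


print(return_distance_thresholds("aacabaaabcaaaababaaa"))
# {'a': (14, 0), 'b': (4, 4), 'c': (2, 9)}
```

The exact fraction matters for “b”: 14/4 = 3.5 sits exactly on the rounding boundary, and half-up rounding gives 4 as in FIG. 2.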

The character string conversion unit 152 is a processing unit that reorders the characters of the input file 141 into an order convenient for RLE compression. That is, it moves characters so that identical characters become consecutive. The distance a character may be moved is limited by its return distance threshold.

Before the processing of the character string conversion unit 152 is described, the terms used in that description are explained. FIG. 4 is a diagram for explaining various terms. The slide buffer is a buffer that stores part of the input character string S. Each time conversion of the portion of S in the slide buffer is finished, the character string conversion unit 152 loads the next unconverted portion of S into the slide buffer.

The origin o indicates the position of the reference character. The attention position p is the reference position used when detecting a replacement-source character, and it moves from the origin o toward the end of the string. The offset m is the distance from the origin o to the replacement-source character. The return distance n corresponds to the distance from the replacement-source character to the replacement-destination character when characters are replaced. For example, when the bold characters “a” and “b” shown in FIG. 4 are replaced, the offset m is 6 and the return distance n is 1.

The conversion performed by the character string conversion unit 152 is as follows. Starting from the attention position p, the unit reads the input character string S forward, one character at a time, until it finds a character y that differs from the character x at p. When it finds y, it reads S backward toward the head, one character at a time, until it finds a character y′ that lies before y and is identical to y.

If y′ is found within the return distance threshold of y, the unit swaps y with the character z immediately following y′. It records the origin o, the offset m, and the return distance n in association with each other in the replacement history table 143, then sets both the origin o and the attention position p to the position at offset m and repeats the same processing. If y′ is not found within the return distance threshold of y, the unit sets the attention position p to the position of y and repeats the same processing.
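The procedure above can be sketched in Python as follows. This is an illustration, not the patented implementation: it assumes the whole string fits in the slide buffer, that the backward search stops at the nearest match and may look past the origin, that characters missing from the threshold table get a threshold of 0, and that the history records (origin o, offset m, return distance n) as in FIGS. 12 to 14. Under these assumptions it reproduces expression (8) for the string of expression (7). The name `convert` is hypothetical.

```python
def convert(s: str, thresholds: dict[str, int]) -> tuple[str, list[tuple[int, int, int]]]:
    """Rearrange s so identical characters tend to be adjacent; return T and the history."""
    t = list(s)
    history = []  # entries of (origin o, offset m, return distance n)
    o = p = 0
    while t:
        x = t[p]
        y = p + 1
        while y < len(t) and t[y] == x:  # read forward to the first character != x
            y += 1
        if y == len(t):
            break  # no differing character remains
        found = None
        for n in range(1, thresholds.get(t[y], 0) + 1):  # read backward up to the threshold
            if y - n < 0:
                break
            if t[y - n] == t[y]:
                found = n
                break
        if found is None:
            p = y  # no y' within the threshold: move the attention position to y
        else:
            z = y - found + 1  # position of the character right after y'
            t[z], t[y] = t[y], t[z]
            history.append((o, y - o, found))
            o = p = y  # move the origin and the attention position to offset m
    return "".join(t), history


T, hist = convert("aacabaaabcaaaababaaa", {"a": 0, "b": 4, "c": 9})
print(T)     # aaccbbaaaaaaaabbaaaa, which RLE-encodes to expression (8)
print(hist)  # [(0, 8, 4), (8, 1, 7), (9, 7, 2)]
```

The attention position strictly increases on every iteration, so the loop always terminates; the first history entry matches step S16 in FIG. 7 (offset 8, return distance 4).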

Next, the processing of the character string conversion unit 152 is described in detail. FIGS. 5 to 11 are diagrams for explaining this processing in detail, and FIGS. 12 to 14 are diagrams illustrating examples of the data structure of the replacement history table held temporarily by the character string conversion unit. For convenience of explanation, it is assumed here that the entire input character string S fits in the slide buffer, and that S = aacabaaabcaaaababaaa.

FIG. 5 is described first. The character string conversion unit 152 stores the input character string S in the slide buffer and sets the origin o and the attention position p to the first character “a” of S, so that o = 0 (step S10). It then reads forward one character at a time until it finds a character different from the “a” at p, and finds “c” at offset m = 2 (step S11). Reading backward from this “c” toward the head, it finds no “c” within the return distance threshold 9 of “c”, so it sets the attention position p to offset m = 2.

Next, FIG. 6 is described. The character string conversion unit 152 reads forward until it finds a character different from the “c” at p, and finds “a” at offset m = 3 (step S12). Reading backward from this “a”, it finds no “a” within the return distance threshold 0 of “a”, so it sets p to offset m = 3.

The character string conversion unit 152 then reads toward the end until it detects a character different from the character “a” at the target position p, and detects the character “b” at offset m = 4 (step S13). It reads back toward the head from this “b”, but finds no “b” within the return distance threshold of “b”, which is 4, so it sets the target position p to the position at offset m = 4.

The character string conversion unit 152 reads toward the end until it detects a character different from the character “b” at the target position p, and detects the character “a” at offset m = 5 (step S14). It reads back toward the head from this “a”, but finds no “a” within the return distance threshold of 0, so it sets the target position p to the position at offset m = 5.

Turning to FIG. 7, the character string conversion unit 152 reads toward the end until it detects a character different from the character “a” at the target position p, and detects the character “b” at offset m = 8 (step S15). Reading back toward the head one character at a time from this “b”, it detects a “b” at return distance n = 4, which is within the return distance threshold of 4 for “b”. The character string conversion unit 152 swaps the character “a” immediately following that earlier “b” with the character “b” at offset m = 8 (step S16), and then sets both the origin o and the target position p to the position at offset m = 8.

When step S16 is completed, the character string conversion unit 152 stores the origin o = 0, the offset m = 8, and the return distance n = 4, associated with one another, in the replacement history table 143. FIG. 12 shows the contents of the replacement history table at the end of step S16.

Turning to FIG. 8, the character string conversion unit 152 reads toward the end until it detects a character different from the character “a” at the target position p, and detects the character “c” at offset m = 1, now measured from the new origin (step S17). Reading back toward the head one character at a time from this “c”, it detects a “c” at return distance n = 7, which is within the return distance threshold of 9 for “c”. The character string conversion unit 152 swaps the character “a” immediately following that earlier “c” with the character “c” at offset m = 1 (step S18), and then sets both the origin o and the target position p to the position at offset m = 1.

When step S18 is completed, the character string conversion unit 152 stores the origin o = 8, the offset m = 1, and the return distance n = 7, associated with one another, in the replacement history table 143. FIG. 13 shows the contents of the replacement history table at the end of step S18.

Turning to FIG. 9, the character string conversion unit 152 reads toward the end until it detects a character different from the character “a” at the target position p, and detects the character “b” at offset m = 5 (step S19). It reads back toward the head from this “b”, but finds no “b” within the return distance threshold of 4, so it sets the target position p to the position at offset m = 5.

The character string conversion unit 152 reads toward the end until it detects a character different from the character “b” at the target position p, and detects the character “a” at offset m = 6 (step S20). It reads back toward the head from this “a”, but finds no “a” within the return distance threshold of 0, so it sets the target position p to the position at offset m = 6.

Turning to FIG. 10, the character string conversion unit 152 reads toward the end until it detects a character different from the character “a” at the target position p, and detects the character “b” at offset m = 7 (step S21). Reading back toward the head from this “b”, it detects a “b” at return distance n = 2, which is within the return distance threshold of 4 for “b”. The character string conversion unit 152 swaps the character “a” immediately following that earlier “b” with the character “b” at offset m = 7 (step S22), and then sets both the origin o and the target position p to the position at offset m = 7.

When step S22 is completed, the character string conversion unit 152 stores the origin o = 9, the offset m = 7, and the return distance n = 2, associated with one another, in the replacement history table 143. FIG. 14 shows the contents of the replacement history table at the end of step S22.

Turning to FIG. 11, the character string conversion unit 152 reads toward the end one character at a time, looking for a character different from the character “a” at the target position p, but reaches the end of the slide buffer before finding one (step S23). The character string conversion unit 152 takes the string now stored in the slide buffer as the character string T. It also stores in the replacement history table 143 the table shown in FIG. 14 with the origin o column removed (step S24).

By executing steps S10 to S24 as described above, the character string conversion unit 152 converts the input character string S into the character string T, and outputs T = aaccbbaaaaaaaabbaaaa to the RLE encoding unit 153.

As shown in step S24, the character string conversion unit 152 does not store the replacement history table in the storage unit 140 as it is. The origins in the replacement history table can be derived uniquely from the offsets: the origin of the first row is 0, and the origin of each later row is the origin of the previous row plus the offset of the previous row. By storing the replacement history table 143 without the origin information, the character string conversion unit 152 reduces the amount of data that the storage unit 140 must hold.

In addition, the character string conversion unit 152 stores each pair of an offset and a return distance in the replacement history table 143 in 1 byte. FIG. 15 is a diagram (2) showing an example of the data structure of the replacement history table. In the example of FIG. 15, the character string conversion unit 152 stores the offset 8 of the first row of the replacement history table 143 in 4 bits and the return distance 4 in 4 bits, so that (8, 4) occupies 1 byte. Similarly, it stores (1, 7) of the second row and (7, 2) of the third row in 1 byte each. The character string conversion unit 152 therefore stores the replacement history table 143 shown in FIG. 15 in the storage unit 140 as 3 bytes of data.
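The 1-byte packing can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the device's actual storage format: the text says only that each value takes 4 bits, so placing the offset in the upper 4 bits and the return distance in the lower 4 bits is an assumption, and the function names `pack_history` and `unpack_history` are hypothetical. Because the text does not say how offsets or return distances above 15 are handled, the sketch simply rejects them.

```python
def pack_history(pairs: list[tuple[int, int]]) -> bytes:
    """Pack (offset, return distance) pairs into one byte each (assumed nibble layout)."""
    out = bytearray()
    for m, n in pairs:
        # Each value must fit in 4 bits (0..15) for this layout.
        if not (0 <= m <= 15 and 0 <= n <= 15):
            raise ValueError(f"pair {(m, n)} does not fit in two 4-bit fields")
        out.append((m << 4) | n)  # offset in the high nibble, return distance in the low nibble
    return bytes(out)


def unpack_history(data: bytes) -> list[tuple[int, int]]:
    """Inverse of pack_history: split each byte back into (offset, return distance)."""
    return [(b >> 4, b & 0x0F) for b in data]


packed = pack_history([(8, 4), (1, 7), (7, 2)])
print(packed.hex())            # 841772
print(unpack_history(packed))  # [(8, 4), (1, 7), (7, 2)]
```

The three rows of FIG. 15 occupy 3 bytes, matching the figure given in the text.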

Returning to FIG. 1, the RLE encoding unit 153 is a processing unit that compresses a character string using RLE (run-length encoding). The RLE compression performed by the RLE encoding unit 153 is the same as the conventional method. The RLE encoding unit 153 stores the compressed character string in the output file 144 as the output character string.

For example, the RLE encoding unit 153 encodes the character string T = aaccbbaaaaaaaabbaaaa received from the character string conversion unit 152 into the output character string RLE(T) = (a, 2) (c, 2) (b, 2) (a, 8) (b, 2) (a, 4).
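A conventional run-length encoder of this kind can be sketched with `itertools.groupby`, which groups runs of consecutive equal characters. The name `rle_encode` is hypothetical, and representing the output as a list of (character, run length) tuples is an assumption for readability; the text does not specify the byte-level layout of the output file.

```python
from itertools import groupby


def rle_encode(s: str) -> list[tuple[str, int]]:
    """Encode a string as (character, run length) pairs."""
    # groupby yields one group per run of identical consecutive characters.
    return [(ch, sum(1 for _ in run)) for ch, run in groupby(s)]


print(rle_encode("aaccbbaaaaaaaabbaaaa"))
# [('a', 2), ('c', 2), ('b', 2), ('a', 8), ('b', 2), ('a', 4)]
```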

The restoration unit 160 is a processing unit that restores the input file 141 from the output file 144. The restoration unit 160 includes an RLE decoding unit 161 and a character string inverse conversion unit 162.

The RLE decoding unit 161 is a processing unit that decodes the output character string using RLE decoding, which is the same as the conventional method. For example, the RLE decoding unit 161 walks the output character string from its first element and, from the kind of each repeated character and the length of its run, decodes the output character string into the character string before encoding. The RLE decoding unit 161 outputs the decoded character string to the character string inverse conversion unit 162.

For example, the RLE decoding unit 161 decodes the output character string RLE(T) = (a, 2) (c, 2) (b, 2) (a, 8) (b, 2) (a, 4) stored in the output file 144 into the character string T = aaccbbaaaaaaaabbaaaa.
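The decoding step amounts to expanding each pair back into a run. A minimal sketch, assuming the output character string is held as a list of (character, run length) tuples; the name `rle_decode` is hypothetical:

```python
def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, run length) pairs back into a string."""
    # Repeat each character by its run length and concatenate the runs in order.
    return "".join(ch * count for ch, count in pairs)


encoded = [("a", 2), ("c", 2), ("b", 2), ("a", 8), ("b", 2), ("a", 4)]
print(rle_decode(encoded))  # aaccbbaaaaaaaabbaaaa
```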

The character string inverse conversion unit 162 is a processing unit that converts a character string that was rearranged into an order convenient for RLE compression back into the original character string. Its processing is described concretely below. The character string inverse conversion unit 162 reads the replacement history table 143 from the storage unit 140, restores the origin information of the replacement history table 143, and then inversely converts the character string. Here, the character string to be inversely converted is T = aaccbbaaaaaaaabbaaaa, and the replacement history table 143 has the data structure shown in FIG. 15.

First, the process by which the character string inverse conversion unit 162 restores the origin information is described. FIG. 16 is a diagram for explaining this process, using the replacement history table of FIG. 15 as the example. The character string inverse conversion unit 162 obtains the origin of row k by adding the offset of row k-1 to the origin of row k-1, where the origin of the first row is 0. In the example of FIG. 16, the origin of the first row is 0, the origin of the second row is 0 + 8 = 8, and the origin of the third row is 8 + 1 = 9.
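The origin restoration is a running sum of the offsets. A minimal sketch, assuming the stored table is a list of (offset, return distance) pairs in row order; the name `restore_origins` is hypothetical:

```python
def restore_origins(pairs: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Rebuild (origin, offset, return distance) rows from stored (offset, return distance) pairs."""
    rows = []
    origin = 0  # the origin of the first row is 0
    for m, n in pairs:
        rows.append((origin, m, n))
        origin += m  # next row's origin = this row's origin + this row's offset
    return rows


print(restore_origins([(8, 4), (1, 7), (7, 2)]))
# [(0, 8, 4), (8, 1, 7), (9, 7, 2)]
```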

Next, the process by which the character string inverse conversion unit 162 inversely converts the character string is described. The character string inverse conversion unit 162 reads the replacement history table with the restored origins one row at a time, starting from the last row, and determines the two characters to be swapped. One is the character at position “origin o + offset m” from the beginning of the character string, and the other is the character at position “origin o + offset m - return distance n + 1”, where the first character is at position 0. After identifying the two characters, the character string inverse conversion unit 162 swaps them, and it inversely converts the character string by repeating this process for every row. The character string inverse conversion unit 162 may output the inversely converted character string to the output unit 120 or store it in the storage unit 140.
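The swap rule above can be sketched as follows, assuming the rows already carry restored origins and positions are 0-based indices into the string. The name `inverse_convert` is hypothetical; undoing the rows from last to first is what makes each swap see the string exactly as it was right after the corresponding forward swap.

```python
def inverse_convert(t: str, rows: list[tuple[int, int, int]]) -> str:
    """Undo the swaps recorded in (origin, offset, return distance) rows."""
    buf = list(t)
    # Undo the swaps in the reverse of the order in which they were made.
    for o, m, n in reversed(rows):
        i = o + m          # position where y was found during conversion
        j = o + m - n + 1  # position of the character immediately after y'
        buf[i], buf[j] = buf[j], buf[i]
    return "".join(buf)


rows = [(0, 8, 4), (8, 1, 7), (9, 7, 2)]
print(inverse_convert("aaccbbaaaaaaaabbaaaa", rows))  # aacabaaabcaaaababaaa
```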

Next, the processing of the character string inverse conversion unit 162 is described in detail. FIGS. 17 and 18 illustrate this processing. The replacement history table with the restored origins is the one shown on the right side of FIG. 16.

FIG. 17 is described first. The character string inverse conversion unit 162 reads the character string to be converted, T = aaccbbaaaaaaaabbaaaa, into a buffer (step S25). It then reads the third row of the replacement history table and determines the two characters to be swapped. The third row has origin o = 9, offset m = 7, and return distance n = 2, so the characters to be swapped are those at positions 16 and 15 from the beginning, which hold “a” and “b”, respectively. The character string inverse conversion unit 162 swaps these two characters (step S26).

Turning to FIG. 18, the character string inverse conversion unit 162 reads the second row of the replacement history table and determines the two characters to be swapped. The second row has origin o = 8, offset m = 1, and return distance n = 7, so the characters to be swapped are those at positions 9 and 3; at this point, position 9 holds “a” and position 3 holds “c”. The character string inverse conversion unit 162 swaps these two characters (step S27).

The character string inverse conversion unit 162 then reads the first row of the replacement history table and determines the two characters to be swapped. The first row has origin o = 0, offset m = 8, and return distance n = 4, so the characters to be swapped are those at positions 8 and 5; at this point, position 8 holds “a” and position 5 holds “b”. The character string inverse conversion unit 162 swaps these two characters (step S28). When step S28 ends, all the swaps recorded in the replacement history table have been undone.

By executing steps S25 to S28 as described above, the character string inverse conversion unit 162 converts the character string T = aaccbbaaaaaaaabbaaaa back into aacabaaabcaaaababaaa. This inversely converted string matches the input character string S before it was rearranged for RLE compression.

The compression unit 150 and the restoration unit 160 shown in FIG. 1 correspond to, for example, an integrated device such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). Alternatively, they correspond to an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit).

Next, the processing procedure of the data compression/decompression apparatus 100 according to this embodiment is described. FIG. 19 is a flowchart showing the processing procedure of the compression unit. The process shown in FIG. 19 is triggered, for example, when the input file 141 is stored in the storage unit 140.

As shown in FIG. 19, the compression unit 150 executes the threshold table generation process (step S101) and the character string conversion process (step S102). The compression unit 150 then compresses the character string using RLE (step S103).

Next, the threshold table generation process of step S101 in FIG. 19 is described. FIG. 20 is a flowchart showing its processing procedure. As shown in FIG. 20, the threshold table generation unit 151 acquires the input character string from the input file 141 and reads it one character at a time from beginning to end (step S201). The threshold table generation unit 151 counts the occurrences of each character and records the count for each character in the return distance threshold table 142 (step S202).

The threshold table generation unit 151 sorts the return distance threshold table 142 in descending order, using the occurrence count as the sort key (step S203). For each character, the threshold table generation unit 151 divides the sum of the occurrence counts of all characters that occur more often than that character by that character's own count, rounds the quotient to the nearest integer (halves rounded up), and records the result in the return distance threshold table 142 as that character's return distance threshold (step S204).
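Steps S201 to S204 can be sketched in Python as below. This is an illustrative sketch, not the device's implementation: the name `return_distance_thresholds` is hypothetical, characters with equal counts are assumed not to contribute to each other's sum (the text says "more often"), and the result is a plain dictionary rather than the table 142.

```python
from collections import Counter


def return_distance_thresholds(s: str) -> dict[str, int]:
    """Compute a return distance threshold for every character in s."""
    counts = Counter(s)  # occurrences per character (steps S201-S202)
    thresholds = {}
    # Visit characters in descending order of count, mirroring the sort in step S203.
    for ch, c in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        # Sum of the counts of characters that occur strictly more often than ch.
        more_frequent = sum(v for v in counts.values() if v > c)
        # Round half up to an integer: floor(more_frequent / c + 0.5), in integer arithmetic.
        thresholds[ch] = (2 * more_frequent + c) // (2 * c)
    return thresholds


print(return_distance_thresholds("aacabaaabcaaaababaaa"))
# {'a': 0, 'b': 4, 'c': 9}
```

For the example string, "a" occurs 14 times, "b" 4 times and "c" 2 times, giving 0/14 = 0, 14/4 = 3.5 rounded to 4, and 18/2 = 9. Integer arithmetic is used because Python's built-in `round` rounds halves to the nearest even integer, which would not match rounding half up in general.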

Next, the character string conversion process of step S102 in FIG. 19 is described. FIG. 21 is a flowchart showing its processing procedure. As shown in FIG. 21, the character string conversion unit 152 reads the input character string from the input file 141 into the slide buffer (step S301) and performs initialization (step S302). In the initialization of step S302, the character string conversion unit 152 sets the origin o and the target position p to the head of the slide buffer.

The character string conversion unit 152 reads the input character string S toward the end one character at a time until it detects a character y different from the character x at the target position p (step S303). If it detects the character y before reaching the end of the slide buffer (step S304, No), it reads toward the head one character at a time until it detects a character y′ that is identical to y and precedes y (step S305).

If the character string conversion unit 152 detects y′ within the return distance threshold of the character y (step S306, Yes), it swaps the character z immediately following y′ with the character y (step S307). The character string conversion unit 152 records the origin o, the offset m, and the return distance n, associated with one another, in the replacement history table (step S308). It then sets the origin o and the target position p to the position at offset m (step S309) and returns to step S303.

If, on the other hand, the character string conversion unit 152 does not detect y′ within the return distance threshold of the character y (step S306, No), it sets the target position p to the position at offset m (step S310) and returns to step S303.

If, in step S304, the character string conversion unit 152 reaches the end of the slide buffer before detecting a character y (step S304, Yes), it updates the character string in the slide buffer (step S311); that is, it reads further characters from the input file 141 and stores them in the slide buffer.

If the end of the input file has not been reached (step S312, No), the character string conversion unit 152 returns to step S303. If the end of the input file has been reached (step S312, Yes), it outputs the character string in the slide buffer to the RLE encoding unit 153 (step S313) and ends the process.
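The conversion loop of FIG. 21 can be sketched in Python for the simplified case used in the worked example of FIGS. 5 to 11, where the whole input fits in the slide buffer, so the buffer refill of steps S311 and S312 is omitted. This is a sketch under assumptions, not the device's implementation: `convert` is a hypothetical name, positions are 0-based buffer indices, and the backward search for y′ is assumed to stop at the nearest match and may reach positions before the current origin, as it does in step S17. The thresholds passed in are those of the example (a: 0, b: 4, c: 9).

```python
def convert(s: str, thresholds: dict[str, int]) -> tuple[str, list[tuple[int, int, int]]]:
    """Rearrange s for run-length encoding; return (T, replacement history rows)."""
    buf = list(s)
    history = []  # rows of (origin o, offset m, return distance n)
    if not buf:
        return "", history
    o = p = 0     # origin and target position, both at the head (step S302)
    while True:
        x = buf[p]
        q = p + 1
        # S303: read toward the end until a character y different from x appears.
        while q < len(buf) and buf[q] == x:
            q += 1
        if q == len(buf):  # S304 Yes: end of buffer reached (refill omitted here)
            break
        y = buf[q]
        m = q - o  # offset of y measured from the origin
        # S305/S306: look back toward the head for y' == y, at most threshold(y) steps.
        n = next((d for d in range(1, thresholds.get(y, 0) + 1)
                  if q - d >= 0 and buf[q - d] == y), None)
        if n is not None:
            z = q - n + 1                    # position of the character z right after y'
            buf[z], buf[q] = buf[q], buf[z]  # S307: swap z and y
            history.append((o, m, n))        # S308: record the swap
            o = p = q                        # S309: move origin and target position
        else:
            p = q                            # S310: move only the target position
    return "".join(buf), history


t, rows = convert("aacabaaabcaaaababaaa", {"a": 0, "b": 4, "c": 9})
print(t)     # aaccbbaaaaaaaabbaaaa
print(rows)  # [(0, 8, 4), (8, 1, 7), (9, 7, 2)]
```

The output reproduces the string T and the three rows recorded in steps S16, S18 and S22. A threshold of 0 skips the backward search entirely, which is why the most frequent character never triggers a swap.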

Next, the processing procedure of the character string inverse conversion unit 162 shown in FIG. 1 is described. FIG. 22 is a flowchart showing the processing procedure of the character string inverse conversion unit according to this embodiment. The process shown in FIG. 22 is triggered, for example, when the replacement history table 143 and the output file 144 have been stored in the storage unit 140.

As shown in FIG. 22, the character string inverse conversion unit 162 reads the replacement history table 143 (step S401) and restores its origins (step S402). The character string inverse conversion unit 162 reads the output character string T into a buffer (step S403) and selects an unselected row, working from the end of the replacement history table (step S404).

If all the rows of the replacement history table have been selected (step S405, Yes), the character string inverse conversion unit 162 outputs the character string T (step S406) and ends the process. Otherwise (step S405, No), it swaps T[o+m] and T[o+m-n+1] in the output character string T (step S407) and returns to step S404. Here, o is the origin, m is the offset, and n is the return distance of the selected row.

Next, the number of bytes obtained by compressing the input character string S directly with RLE is compared with the number of bytes obtained when the compression unit 150 converts S into the character string T and then compresses it. The byte count for the latter includes the bytes of the replacement history table needed to convert T back into S. Each character and each run length is counted as 1 byte.

Let the input character string be S = aacabaaabcaaaababaaa. Compressing S with RLE in the conventional manner gives RLE(S) = (a, 2) (c, 1) (a, 1) (b, 1) (a, 3) (b, 1) (c, 1) (a, 4) (b, 1) (a, 1) (b, 1) (a, 3), that is, 12 pairs of 2 bytes each, so the data amount of RLE(S) is 24 bytes.

Rearranging S into an order convenient for RLE compression gives the character string T = aaccbbaaaaaaaabbaaaa. FIG. 23 is a diagram (3) showing an example of the data structure of the replacement history table used to convert T back into the input character string S. Compressing T with RLE gives RLE(T) = (a, 2) (c, 2) (b, 2) (a, 8) (b, 2) (a, 4), so the data amount of RLE(T) is 12 bytes. The replacement history table of FIG. 23 occupies 3 bytes when the origin information is omitted and each pair of an offset and a return distance is stored in 1 byte. The total of RLE(T) and the replacement history table is therefore 12 + 3 = 15 bytes.
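The byte counts in this comparison can be checked with a short script. It assumes, as the comparison does, that each RLE pair costs 2 bytes (1 for the character, 1 for the run length) and that the packed replacement history table costs 1 byte per row; `rle_bytes` and `HISTORY_BYTES` are hypothetical names.

```python
from itertools import groupby


def rle_bytes(s: str) -> int:
    """Size of RLE(s) when each character and each run length takes 1 byte."""
    # One (character, run length) pair, i.e. 2 bytes, per run of equal characters.
    return 2 * sum(1 for _ in groupby(s))


S = "aacabaaabcaaaababaaa"
T = "aaccbbaaaaaaaabbaaaa"
HISTORY_BYTES = 3  # three (offset, return distance) pairs, 1 byte each

conventional = rle_bytes(S)              # 12 runs -> 24 bytes
proposed = rle_bytes(T) + HISTORY_BYTES  # 6 runs -> 12 bytes, plus 3
print(conventional, proposed, conventional - proposed)  # 24 15 9
```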

Therefore, even when the data amount of the replacement history table is included, the compression unit 150 produces less data than the conventional compression method. In the example above, the compression unit 150 saves 9 bytes (15 bytes instead of 24) compared with the conventional technique.

Next, the effects of the data compression/decompression apparatus 100 according to the present embodiment will be described. In the prior art, as shown in FIG. 25, the positional relationship before and after movement is stored for every character in the character string. In contrast, the data compression/decompression apparatus 100 stores no positional relationship for characters that are not replaced and stores it only for the replaced characters. The data compression/decompression apparatus 100 can therefore reduce the amount of data that the storage unit 140 must store.

Further, in the prior art, the positional relationship before and after movement is stored using the character position numbers as they are, so the longer the character string to be compressed, the larger the integers and the longer the bit lengths needed to store them. For example, when the 1516177th and 1516179th characters are exchanged, the prior art stores (1516177, 1516179). In contrast, the data compression/decompression apparatus 100 stores an offset and a return distance. Because both express the positional relationship of characters as a difference, the data compression/decompression apparatus 100 can store them as small integers, that is, with short bit lengths, even when the character string to be compressed is long. The data compression/decompression apparatus 100 can therefore reduce the amount of data that the storage unit 140 must store.
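A rough illustration of the difference in bit lengths, counting only the minimum number of binary digits for each value and ignoring how field boundaries would be marked. Only the two positions come from the example above; the origin used to compute the offset is hypothetical.

```python
def bits(n: int) -> int:
    """Minimum number of binary digits needed to write a non-negative integer n."""
    return max(1, n.bit_length())


# Prior-art style: both absolute positions are stored as they are.
i, j = 1516177, 1516179
absolute_bits = bits(i) + bits(j)

# Difference-based style: an offset from the current origin plus the distance
# between the two exchanged characters.
origin = 1516170              # hypothetical previous origin, for illustration only
offset = i - origin           # 7
return_distance = j - i       # 2
relative_bits = bits(offset) + bits(return_distance)

print(absolute_bits, relative_bits)  # 42 5
```

The absolute positions keep growing with the length of the string, while the return distance here stays at 2 wherever in the string the exchange happens.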

In the prior art, the memory cost for a character string of length n is O(n). In contrast, the data compression/decompression apparatus 100 uses a sliding buffer, so its memory cost is O(1), lower than that of the prior art.

In addition, when converting the input character string S, the data compression/decompression apparatus 100 treats only the characters after the origin as replacement targets, restricting the region from which replacement-source characters are taken. Because the origin is reset to the position of the replacement-source character every time a replacement is made, a character that has already been replaced is prevented from being replaced again. Furthermore, when searching for a replacement-destination character, the data compression/decompression apparatus 100 uses a return distance threshold that becomes smaller as the replacement-source character appears more frequently. Characters with higher appearance frequencies are therefore less likely to be replaced, which keeps the number of replacements down. For these reasons, the data compression/decompression apparatus 100 can reduce the processing load of compression. Specifically, for a character string of length n, the calculation cost of the prior art is O(n log n), whereas that of the present invention is O(n).
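The conversion described above can be sketched in Python as follows. This is not the patent's implementation: the threshold rule (string length divided by the character's count), the choice of the nearer exchanged position as the new origin, and measuring each offset from the previous origin are all assumptions, and the whole string is held in memory rather than in a sliding buffer. The function names are hypothetical.

```python
from collections import Counter


def threshold_table(s: str) -> dict[str, int]:
    """Hypothetical threshold rule: frequent characters get smaller thresholds."""
    counts = Counter(s)
    return {ch: len(s) // n for ch, n in counts.items()}


def convert(s: str) -> tuple[str, list[tuple[int, int]]]:
    """Rearrange s to lengthen runs; return (t, [(offset, return_distance), ...])."""
    buf = list(s)
    thresholds = threshold_table(s)
    history: list[tuple[int, int]] = []
    origin = 0  # starts at the head of the string
    p = 1       # scanning position; buf[p - 1] belongs to the current run
    while p < len(buf):
        if buf[p] != buf[p - 1]:
            # A different character appears at p and starts a new run.
            ch = buf[p]
            nxt = p + 1
            if nxt < len(buf) and buf[nxt] != ch:
                # Look for the same character no farther than its threshold from p.
                limit = min(len(buf) - 1, p + thresholds[ch])
                for j in range(nxt + 1, limit + 1):
                    if buf[j] == ch:
                        # Exchange it into the slot right after p to extend the run.
                        buf[nxt], buf[j] = buf[j], buf[nxt]
                        history.append((nxt - origin, j - nxt))
                        origin = nxt  # assumption: origin moves to the nearer slot
                        break
        p += 1  # move on to the next character
    return "".join(buf), history


t, history = convert("aacabaaabcaaaababaaa")
print(t)        # aaccbbaaaaaaaabbaaaa
print(history)  # [(3, 6), (2, 3), (10, 1)]
```

With this hypothetical threshold rule the sketch turns the example string S into T with three exchanges, and every recorded value is at most 15, so each pair would fit in one byte. The recorded values are not claimed to match FIG. 23.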

Further, when restoring a compressed character string, the data compression/decompression apparatus 100 restores the origins of the replacement history table and, based on the restored origins, the offsets, and the return distances, decodes the compressed character string and applies the inverse transformation. The character string can therefore be restored exactly even though the replacement history table stores only the offsets and return distances.
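A matching restoration sketch, assuming a forward conversion in which each offset is measured from the previous origin (the first from the head of the string) and each origin is the nearer of the two exchanged positions. Because an exchange is its own inverse, replaying the recorded exchanges in reverse order recovers the original string. The function name and the history values in the usage line are illustrative.

```python
def restore(t: str, history: list[tuple[int, int]]) -> str:
    """Undo exchanges recorded as (offset, return_distance) pairs."""
    buf = list(t)
    # Rebuild each absolute origin: every offset is relative to the previous one.
    origins = []
    origin = 0
    for offset, _ in history:
        origin += offset
        origins.append(origin)
    # Each exchange swapped positions start and start + dist; undo them last-first.
    for start, (_, dist) in reversed(list(zip(origins, history))):
        buf[start], buf[start + dist] = buf[start + dist], buf[start]
    return "".join(buf)


print(restore("aaccbbaaaaaaaabbaaaa", [(3, 6), (2, 3), (10, 1)]))
# aacabaaabcaaaababaaa
```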

Each component of the data compression/decompression apparatus 100 shown in FIG. 1 is functionally conceptual and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the data compression/decompression apparatus 100 is not limited to the one illustrated; all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the compression unit 150 and the restoration unit 160 shown in FIG. 1 need not be provided in the same apparatus; separate apparatuses may include the compression unit 150 and the restoration unit 160, respectively.

Among the processes described in the present embodiment, all or part of the processes described as being performed automatically may be performed manually, and all or part of the processes described as being performed manually may be performed automatically by a known method. For example, the processing of the compression unit 150 has been described as being executed automatically when the input file 141 is stored in the storage unit 140, but it is not limited to this; it may instead be started manually after the input file 141 is stored in the storage unit 140. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the return distance threshold table 142 has been described as holding, for each character included in the input character string S, the number of appearances and the return distance threshold, but it is not limited to this; it may hold only the return distance threshold for each character.

In the present embodiment, the case where the data compression/decompression apparatus 100 processes character data has been described, but the present invention is not limited to this. For example, the data compression/decompression apparatus may process image data.

In the present embodiment, the character string conversion unit 152 stores the replacement history table 143 in the storage unit 140 by packing each offset and return distance pair of the table into one byte, but the present invention is not limited to this. For example, the character string conversion unit 152 may store the replacement history table 143 in the storage unit 140 using a variable-length code.

For example, the character string conversion unit 152 stores the replacement history table 143 in the storage unit 140 using the Weyl code, a variable-length code that represents an arbitrary integer with a bit length that depends on the magnitude of the value. For example, the integers 1 to 4 are expressed in 3 bits as "0xx", the integers 5 to 8 in 5 bits as "10xxx", the integers 9 to 16 in 7 bits as "110xxxx", and the integers 17 to 32 in 9 bits as "1110xxxxx", where each "x" is 0 or 1.
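The text lists only the bit patterns of this code. The sketch below assumes the x bits hold v - 1 in binary, an interpretation chosen because it reproduces exactly the pattern widths listed (0xx, 10xxx, 110xxxx, 1110xxxxx); it is not taken from the patent, and the function names are hypothetical.

```python
def weyl_encode(v: int) -> str:
    """Encode v >= 1 with the bit patterns listed in the text (payload assumed)."""
    if v < 1:
        raise ValueError("values start at 1")
    m = v - 1
    if m < 4:
        return "0" + format(m, "02b")      # 1..4 -> 0xx
    width = m.bit_length()                 # 3 for 5..8, 4 for 9..16, 5 for 17..32
    return "1" * (width - 2) + "0" + format(m, "b")


def weyl_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one value starting at bits[pos]; return (value, next position)."""
    ones = 0
    while bits[pos] == "1":
        ones += 1
        pos += 1
    pos += 1                               # skip the terminating 0
    width = ones + 2                       # payload length implied by the prefix
    m = int(bits[pos:pos + width], 2)
    return m + 1, pos + width


print([len(weyl_encode(v)) for v in (1, 5, 9, 17)])  # [3, 5, 7, 9]
```

Because the prefix fixes the payload length, codes for consecutive values can be concatenated and read back one after another with `weyl_decode`.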

In the example illustrated in FIG. 15, the character string conversion unit 152 stores the offset 8 of the first row of the replacement history table 143 in 5 bits and the return distance 4 in 3 bits, the offset 1 of the second row in 3 bits and the return distance 7 in 5 bits, and the offset 7 of the third row in 5 bits and the return distance 2 in 3 bits. That is, the character string conversion unit 152 stores the replacement history table 143 shown in FIG. 15 in 24 bits (3 bytes).
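The 24-bit total for FIG. 15 can be checked by counting code lengths only. The length rule below is read off the listed patterns (k ones and a 0, then k + 2 payload bits, so 3 bits for 1 to 4); applying it beyond 32 is an assumption, and the function name is hypothetical.

```python
def weyl_length(v: int) -> int:
    """Bit length of v >= 1 under the patterns 0xx, 10xxx, 110xxxx, 1110xxxxx."""
    m = v - 1
    if m < 4:
        return 3
    width = m.bit_length()
    # (width - 2) ones, one terminating 0, then width payload bits.
    return (width - 1) + width


# (offset, return distance) rows of the replacement history table 143 in FIG. 15.
rows = [(8, 4), (1, 7), (7, 2)]
total_bits = sum(weyl_length(o) + weyl_length(r) for o, r in rows)
print(total_bits, total_bits // 8)  # 24 3
```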

When each offset and return distance pair is stored in one byte, the values of the offset and return distance that the character string conversion unit 152 can store in the storage unit 140 are inherently limited. For example, when the offset is stored in 4 bits, the character string conversion unit 152 can only store offsets from 0 to 15. In contrast, with a variable-length code each value is stored in the storage unit 140 with a bit length that depends on its magnitude. The data compression/decompression apparatus 100 can therefore store the replacement history table 143 in the storage unit 140 regardless of how large the offsets and return distances are.
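A sketch of the fixed one-byte layout and its limit. The text states only that a 4-bit offset can hold 0 to 15; giving the return distance the other 4 bits is an assumption, and the function names are hypothetical.

```python
def pack_pair(offset: int, return_distance: int) -> int:
    """Pack an (offset, return distance) pair into one byte, 4 bits each."""
    if not (0 <= offset <= 15 and 0 <= return_distance <= 15):
        raise ValueError("a 4-bit field can only hold 0 to 15")
    return (offset << 4) | return_distance


def unpack_pair(byte: int) -> tuple[int, int]:
    """Split one byte back into (offset, return distance)."""
    return byte >> 4, byte & 0x0F


print(pack_pair(8, 4))    # 132
print(unpack_pair(132))   # (8, 4)
# pack_pair(16, 1) raises ValueError: the value no longer fits in 4 bits.
```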

The processing of the data compression/decompression apparatus 100 and the like described in the above embodiments can also be realized by executing a prepared program on a computer. An example of a computer that executes a compression/decompression program providing the same functions as the data compression/decompression apparatus 100 described in the above embodiment is described below with reference to FIG. 24. FIG. 24 is a diagram illustrating an example of a computer that executes a compression/decompression program.

As shown in FIG. 24, a computer 200 functioning as the data compression/decompression apparatus 100 includes a CPU (Central Processing Unit) 201 that executes various arithmetic processes, an input device 202 that receives data input from a user, and a monitor 203. The computer 200 also includes a medium reading device 204 that reads programs and the like from a storage medium, and a network interface device 205 that exchanges data with other computers via a network. The computer 200 further includes a RAM (Random Access Memory) 206 that temporarily stores various information, and a hard disk device 207. The devices 201 to 207 are connected to a bus 208.

The hard disk device 207 stores a compression program 207a and a restoration program 207b, which provide the same functions as the data compression/decompression apparatus 100 described above, and various data 207c. The various data 207c correspond to the input file 141, the return distance threshold table 142, the replacement history table 143, the output file 144, and the like shown in FIG. 1. The compression program 207a, the restoration program 207b, and the various data 207c may also be distributed as appropriate and stored in the storage unit of another computer communicably connected via a network.

The CPU 201 reads the compression program 207a from the hard disk device 207 and loads it into the RAM 206, whereupon the compression program 207a functions as a compression process 206a. The compression process 206a corresponds to the compression unit 150 shown in FIG. 1.

Likewise, the CPU 201 reads the restoration program 207b from the hard disk device 207 and loads it into the RAM 206, whereupon the restoration program 207b functions as a restoration process 206b. The restoration process 206b corresponds to the restoration unit 160 shown in FIG. 1. The CPU 201 also reads the various data 207c from the hard disk device 207 and stores them in the RAM 206.

The compression process 206a compresses the input file included in the various data 206c. The restoration process 206b restores the compressed character string included in the various data 206c based on the replacement history table.

The compression program 207a and the restoration program 207b need not be stored in the hard disk device 207 from the beginning. For example, each program may be stored on a "portable storage medium" inserted into the computer 200, such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card, and the computer 200 may read each program from the medium and execute it.

The following supplementary notes are further disclosed with respect to the embodiments including the above examples.

(Supplementary Note 1) A compression apparatus comprising:
a movement distance table that stores data in association with a threshold of a movement distance by which the data may be moved, the threshold depending on the appearance frequency of the data and determining whether the data is to be exchanged;
a movement distance determination unit that reads data onward from a target position in a data string to be compressed and, when data different from the data at the target position appears, determines the movement distance by which the different data may be moved, based on the different data and the movement distance table; and
a replacement processing unit that, when no data identical to the different data exists within a range not exceeding the movement distance determined by the movement distance determination unit from the position where the different data appeared, moves the target position to the position where the different data appeared, and, when data identical to the different data exists within that range, exchanges the data next to the identical data with the different data, stores in a history table the distance from an origin at the head of the data string to be compressed to the exchanged data and the distance between the exchanged data, and moves the origin and the target position to the position of the exchanged data.

(Supplementary Note 2) A compression method in which a compression apparatus that holds a movement distance table storing data in association with a threshold of a movement distance by which the data may be moved, the threshold depending on the appearance frequency of the data and determining whether the data is to be exchanged, executes:
a movement distance determination step of reading data onward from a target position in a data string to be compressed and, when data different from the data at the target position appears, determining the movement distance by which the different data may be moved, based on the different data and the movement distance table; and
a replacement processing step of, when no data identical to the different data exists within a range not exceeding the movement distance determined in the movement distance determination step from the position where the different data appeared, moving the target position to the position where the different data appeared, and, when data identical to the different data exists within that range, exchanging the data next to the identical data with the different data, storing in a history table the distance from an origin at the head of the data string to be compressed to the exchanged data and the distance between the exchanged data, and moving the origin and the target position to the position of the exchanged data.

(Supplementary Note 3) A compression program that causes a computer that holds a movement distance table storing data in association with a threshold of a movement distance by which the data may be moved, the threshold depending on the appearance frequency of the data and determining whether the data is to be exchanged, to execute:
a movement distance determination procedure of reading data onward from a target position in a data string to be compressed and, when data different from the data at the target position appears, determining the movement distance by which the different data may be moved, based on the different data and the movement distance table; and
a replacement processing procedure of, when no data identical to the different data exists within a range not exceeding the movement distance determined by the movement distance determination procedure from the position where the different data appeared, moving the target position to the position where the different data appeared, and, when data identical to the different data exists within that range, exchanging the data next to the identical data with the different data, storing in a history table the distance from an origin at the head of the data string to be compressed to the exchanged data and the distance between the exchanged data, and moving the origin and the target position to the position of the exchanged data.

(Supplementary Note 4) A restoration apparatus comprising:
an origin calculation unit that calculates the origin based on the distance from the origin to the exchanged data included in the history table according to Supplementary Note 1;
a data determination unit that determines each set of exchanged data based on the origin, the distance from the origin to the exchanged data included in the history table, and the distance between the exchanged data; and
a restoration unit that restores the data string by exchanging the data within each set determined by the data determination unit.

(Supplementary Note 5) A restoration method in which a restoration apparatus executes:
an origin calculation step of calculating the origin based on the distance from the origin to the exchanged data included in the history table according to Supplementary Note 1;
a data determination step of determining each set of exchanged data based on the origin, the distance from the origin to the exchanged data included in the history table, and the distance between the exchanged data; and
a restoration step of restoring the data string by exchanging the data within each set determined in the data determination step.

(Supplementary Note 6) A restoration program that causes a computer to execute:
an origin calculation procedure of calculating the origin based on the distance from the origin to the exchanged data included in the history table according to Supplementary Note 1;
a data determination procedure of determining each set of exchanged data based on the origin, the distance from the origin to the exchanged data included in the history table, and the distance between the exchanged data; and
a restoration procedure of restoring the data string by exchanging the data within each set determined by the data determination procedure.

100 Data compression/decompression apparatus
110 Input unit
120 Output unit
130 Input/output control unit
140 Storage unit
141 Input file
142 Return distance threshold table
143 Replacement history table
144 Output file
150 Compression unit
151 Threshold table generation unit
152 Character string conversion unit
153 R_LE encoding unit
160 Restoration unit
161 R_LE decoding unit
162 Character string inverse conversion unit
200 Computer
201 CPU
202 Input device
203 Monitor
204 Medium reading device
205 Network interface device
206a Compression process
206b Restoration process
206c Various data
207 Hard disk device
207a Compression program
207b Restoration program
207c Various data
208 Bus

Claims

A compression apparatus comprising:
a movement distance table that stores data in association with a threshold of a movement distance by which the data may be moved, the threshold depending on the appearance frequency of the data and determining whether the data is to be exchanged;
a movement distance determination unit that reads data onward from a target position in a data string to be compressed and, when data different from the data at the target position appears, determines the movement distance by which the different data may be moved, based on the different data and the movement distance table; and
a replacement processing unit that, when no data identical to the different data exists within a range not exceeding the movement distance determined by the movement distance determination unit from the position where the different data appeared, moves the target position to the position where the different data appeared, and, when data identical to the different data exists within that range, exchanges the data next to the identical data with the different data, stores in a history table the distance from an origin at the head of the data string to be compressed to the exchanged data and the distance between the exchanged data, and moves the origin and the target position to the position of the exchanged data.

A compression method in which a compression apparatus that holds a movement distance table storing data in association with a threshold of a movement distance by which the data may be moved, the threshold depending on the appearance frequency of the data and determining whether the data is to be exchanged, executes:
a movement distance determination step of reading data onward from a target position in a data string to be compressed and, when data different from the data at the target position appears, determining the movement distance by which the different data may be moved, based on the different data and the movement distance table; and
a replacement processing step of, when no data identical to the different data exists within a range not exceeding the movement distance determined in the movement distance determination step from the position where the different data appeared, moving the target position to the position where the different data appeared, and, when data identical to the different data exists within that range, exchanging the data next to the identical data with the different data, storing in a history table the distance from an origin at the head of the data string to be compressed to the exchanged data and the distance between the exchanged data, and moving the origin and the target position to the position of the exchanged data.

A compression program that causes a computer that holds a movement distance table storing data in association with a threshold of a movement distance by which the data may be moved, the threshold depending on the appearance frequency of the data and determining whether the data is to be exchanged, to execute:
a movement distance determination procedure of reading data onward from a target position in a data string to be compressed and, when data different from the data at the target position appears, determining the movement distance by which the different data may be moved, based on the different data and the movement distance table; and
a replacement processing procedure of, when no data identical to the different data exists within a range not exceeding the movement distance determined by the movement distance determination procedure from the position where the different data appeared, moving the target position to the position where the different data appeared, and, when data identical to the different data exists within that range, exchanging the data next to the identical data with the different data, storing in a history table the distance from an origin at the head of the data string to be compressed to the exchanged data and the distance between the exchanged data, and moving the origin and the target position to the position of the exchanged data.

A restoration apparatus comprising:
an origin calculation unit that calculates the origin based on the distance from the origin to the exchanged data included in the history table according to claim 1;
a data determination unit that determines each set of exchanged data based on the origin, the distance from the origin to the exchanged data included in the history table, and the distance between the exchanged data; and
a restoration unit that restores the data string by exchanging the data within each set determined by the data determination unit.