JP2018007236A

JP2018007236A - Compression program, restoration program, compression method, restoration method, and information processing device

Info

Publication number: JP2018007236A
Application number: JP2017023626A
Authority: JP
Inventors: 鷹詔中尾; Takanori Nakao
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2016-06-22
Filing date: 2017-02-10
Publication date: 2018-01-11
Anticipated expiration: 2037-02-10
Also published as: JP6972568B2

Abstract

PROBLEM TO BE SOLVED: To improve compression efficiency. SOLUTION: An information processing device 100 extracts predetermined delimiters, and character strings each sandwiched between two of the delimiters, from compression target data 110. The information processing device 100 generates first data 121 in which the extracted delimiters are arranged such that the order in which they appear in the compression target data 110 can be identified. The information processing device 100 generates second data 122 in which the extracted character strings are arranged such that, for each type of combination of two delimiters, the order in which the character strings sandwiched between the two delimiters of that type appear in the compression target data 110 can be identified. The information processing device 100 compresses the generated first data 121 and second data 122 using a slide dictionary method. SELECTED DRAWING: Figure 1

Description

The present invention relates to a compression program, a restoration program, a compression method, a restoration method, and an information processing apparatus.

Conventionally, there are compression techniques that reduce the size of data. These include lossless compression techniques, which allow compressed data to be restored to the data as it was before compression. Examples of lossless compression techniques include those that use the compression algorithm "LZ77 (Lempel-Ziv 77)" or the compression algorithm "LZMA (Lempel-Ziv-Markov chain algorithm)".

As related prior art, there is, for example, a technique that performs compression by classifying the records of an information file into sets that share common section data and rearranging the section data that make up each record by section type. There is also, for example, a technique that acquires log data, extracts time information, divides the time information into predetermined data widths to generate a plurality of byte groups, compares consecutive values within each byte group to calculate difference values, and compresses the resulting difference value information by the LZ77 method.

JP-A-11-65902
JP 2012-235289 A

However, with the conventional techniques described above, compression efficiency may be poor. For example, when the same character string recurs at relatively distant positions in the compression target data, the compressed data tends to be larger than when the same character string recurs at relatively close positions.

In one aspect, an object of the present invention is to provide a compression program, a restoration program, a compression method, a restoration method, and an information processing apparatus that can improve compression efficiency.

According to one aspect of the present invention, a compression program, a compression method, and an information processing apparatus are proposed that extract, from compression target data, predetermined delimiters and character strings each sandwiched between two of the delimiters; generate first data in which the delimiters are arranged such that the order in which the delimiters appeared in the compression target data can be identified, and second data in which the character strings are arranged such that, for each type of combination of two delimiters, the order in which the character strings sandwiched between the two delimiters of that type of combination appeared in the compression target data can be identified; and compress the generated first data and second data using a slide dictionary method.

According to another aspect of the present invention, a restoration program, a restoration method, and an information processing apparatus are proposed that decode compressed data obtained by applying compression using a slide dictionary method to first data, in which predetermined delimiters are arranged such that the order in which the delimiters appeared in compression target data can be identified, and to second data, in which character strings are arranged such that, for each type of combination of two delimiters, the order in which the character strings sandwiched between the two delimiters of that type of combination appeared in the compression target data can be identified; and that generate the compression target data by referring to the second data obtained by the decoding and sequentially inserting, between each two delimiters in the first data obtained by the decoding, the character string associated with the type of combination of those two delimiters.

According to one aspect of the present invention, compression efficiency can be improved.

FIG. 1 is an explanatory diagram illustrating an example of the compression method according to the embodiment.
FIG. 2 is a block diagram illustrating a hardware configuration example of the information processing apparatus 100.
FIG. 3 is a block diagram illustrating a functional configuration example of the information processing apparatus 100.
FIG. 4 is an explanatory diagram (part 1) illustrating an example of compressing the compression target data 400.
FIG. 5 is an explanatory diagram (part 2) illustrating an example of compressing the compression target data 400.
FIG. 6 is an explanatory diagram (part 3) illustrating an example of compressing the compression target data 400.
FIG. 7 is an explanatory diagram (part 4) illustrating an example of compressing the compression target data 400.
FIG. 8 is an explanatory diagram (part 5) illustrating an example of compressing the compression target data 400.
FIG. 9 is an explanatory diagram (part 6) illustrating an example of compressing the compression target data 400.
FIG. 10 is an explanatory diagram (part 7) illustrating an example of compressing the compression target data 400.
FIG. 11 is an explanatory diagram (part 8) illustrating an example of compressing the compression target data 400.
FIG. 12 is an explanatory diagram (part 1) illustrating an example of restoring the compression target data 400.
FIG. 13 is an explanatory diagram (part 2) illustrating an example of restoring the compression target data 400.
FIG. 14 is an explanatory diagram (part 3) illustrating an example of restoring the compression target data 400.
FIG. 15 is an explanatory diagram (part 4) illustrating an example of restoring the compression target data 400.
FIG. 16 is an explanatory diagram (part 5) illustrating an example of restoring the compression target data 400.
FIG. 17 is an explanatory diagram (part 6) illustrating an example of restoring the compression target data 400.
FIG. 18 is an explanatory diagram (part 7) illustrating an example of restoring the compression target data 400.
FIG. 19 is an explanatory diagram (part 8) illustrating an example of restoring the compression target data 400.
FIG. 20 is an explanatory diagram (part 9) illustrating an example of restoring the compression target data 400.
FIG. 21 is an explanatory diagram (part 10) illustrating an example of restoring the compression target data 400.
FIG. 22 is an explanatory diagram (part 11) illustrating an example of restoring the compression target data 400.
FIG. 23 is an explanatory diagram (part 12) illustrating an example of restoring the compression target data 400.
FIG. 24 is a flowchart illustrating an example of the compression processing procedure.
FIG. 25 is a flowchart illustrating an example of the restoration processing procedure.

Hereinafter, embodiments of a compression program, a restoration program, a compression method, a restoration method, and an information processing apparatus according to the present invention will be described in detail with reference to the drawings.

(Example of the compression method according to the embodiment)
FIG. 1 is an explanatory diagram illustrating an example of the compression method according to the embodiment. The information processing apparatus 100 is a computer that performs compression using a slide dictionary method. Compression using the slide dictionary method is, for example, compression by "LZ77" or compression by "LZMA".

For example, in compression using the slide dictionary method, when the same character string appears multiple times in the compression target data 110, the compression target data 110 can be compressed by converting each later occurrence into data indicating the start position and length of an earlier occurrence of the same character string.

However, in this case, compression efficiency tends to vary greatly depending on the content of the compression target data 110. For example, when the same character string recurs at relatively distant positions in the compression target data 110, the compressed data is harder to make small, and compression efficiency tends to be worse, than when the same character string recurs at relatively close positions.

Specifically, in compression using the slide dictionary method, if the same character string exists within a reference range, called a sliding window, that precedes a later occurrence of a character string, the later occurrence is converted into data indicating the start position and length of the earlier occurrence. If the same character string appears at relatively distant positions, the earlier occurrence may lie outside the reference range preceding the later occurrence, in which case the later occurrence cannot be converted into data indicating the start position and length of the earlier occurrence. As a result, the compressed data is harder to make small, and compression efficiency tends to deteriorate.
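The effect of the reference range can be illustrated with a toy encoder. The following is a minimal sketch of a sliding-window match search, not the LZ77 or LZMA implementation referred to in this document; the function name `lz77_tokens`, the `window` and `min_len` parameters, and the sample strings are assumptions made for illustration.

```python
def lz77_tokens(data, window, min_len=3):
    """Toy sliding-window encoder (illustrative only).

    Emits either a literal character or an (offset, length) pair that
    points back at an earlier occurrence inside the last `window`
    characters.
    """
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - window)          # left edge of the reference range
        best_len, best_off = 0, 0
        for j in range(start, i):           # try every candidate start position
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


NEAR = "load#load#"                 # repeat is 5 characters back
FAR = "load#0123456789load#"        # repeat is 15 characters back

print(lz77_tokens(NEAR, window=8))  # the second "load#" becomes one pair
print(lz77_tokens(FAR, window=8))   # repeat is outside the window: all literals
print(lz77_tokens(FAR, window=16))  # a wider window finds the repeat again
```

With a window of 8 characters, the nearby repeat is replaced by a single back-reference, while the distant repeat is emitted character by character because its earlier occurrence has already left the reference range.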

If, to address this, the reference range is widened so that a later occurrence can be converted into data indicating the start position and length of an earlier occurrence at a relatively distant position, the number of bits used to indicate the start position increases. Consequently, even when the later occurrence is converted into data indicating the start position and length of the earlier occurrence, the compressed data is harder to make small, and compression efficiency tends to deteriorate.
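The cost of widening the reference range can be made concrete with a small calculation. The sketch below assumes a fixed-width offset field, as in textbook LZ77-style formats; this is an assumption for illustration only, since practical encoders such as LZMA use variable-length codes for distances. The function name `offset_bits` is hypothetical.

```python
def offset_bits(window_size):
    """Bits needed for a fixed-width field that can address any of
    `window_size` positions (illustrative assumption, not LZMA's coding)."""
    if window_size < 2:
        return 1
    return (window_size - 1).bit_length()


for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(size, offset_bits(size))
```

Under this assumption, growing the window from 4 KiB to 1 MiB raises the offset field from 12 to 20 bits for every back-reference, including references to nearby strings that never needed the extra range.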

On the other hand, when the same character string appears at relatively close positions, the earlier occurrence is likely to lie within the reference range preceding the later occurrence, so the later occurrence can likely be converted into data indicating the start position and length of the earlier occurrence. Compression efficiency therefore tends not to deteriorate.

Moreover, when the same character string appears at relatively close positions, a later occurrence can likely be converted into data indicating the start position and length of the earlier occurrence even if the reference range is relatively narrow, so the number of bits used to indicate the start position is relatively small. Consequently, converting a later occurrence into data indicating the start position and length of the earlier occurrence tends not to degrade compression efficiency.

Accordingly, the present embodiment describes a compression method that first generates, from the compression target data 110, data in which the same character strings tend to appear consecutively at relatively close positions and from which the compression target data 110 can be recovered, and then performs compression using the slide dictionary method.

Specifically, when the compression target data 110 is a collection of data that follows a fixed format delimited by predetermined delimiters, the character strings sandwiched between two delimiters of the same type of combination are likely to be identical to one another. The present embodiment therefore improves compression efficiency by using the types of combinations of two delimiters to generate, from the compression target data 110, data in which the same character strings tend to appear consecutively at relatively close positions.

In the example of FIG. 1, the information processing apparatus 100 receives the compression target data 110. The compression target data 110 is data that includes predetermined delimiters. The compression target data 110 is preferably, for example, log data in which logs following a fixed format delimited by predetermined delimiters are aggregated and in which the same character strings appear repeatedly. In the example of FIG. 1, the compression target data 110 is "$11:03#load:#jerasure#load:#lrc#load:#isa$11:03#monmap$11:03#adding#auth#protocol:#none$".

A delimiter is a character used to divide, for convenience, the character strings included in the compression target data 110. A delimiter is, for example, a character other than a digit or a letter of the alphabet. The delimiters may be set by an operation input from the user. A delimiter is, for example, a single character. The delimiters may also include, for example, a character string of two or more characters. The delimiters may also include, for example, control characters.

(1-1) The information processing apparatus 100 extracts, from the compression target data 110, the predetermined delimiters and the character strings each sandwiched between two predetermined delimiters. A character string sandwiched between two predetermined delimiters is assumed here to be, for example, a character string that contains no delimiter. The information processing apparatus 100 extracts, for example, the predetermined delimiters "$:#:##:##:#$:#$:###:#$". The information processing apparatus 100 also extracts, for example, the character strings "111111030303monmapaddingauthloadloadloadprotocolisanonejerasurelrc".
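The extraction step (1-1) can be sketched with two regular expressions. This is an illustrative sketch, assuming that every single character other than an ASCII digit or letter is a delimiter; the function name `extract` and the constant `DATA` are hypothetical names introduced here, and the strings are returned in order of appearance rather than in the grouped order shown in the figure.

```python
import re

# Compression target data 110 from the example of FIG. 1.
DATA = ("$11:03#load:#jerasure#load:#lrc#load:#isa"
        "$11:03#monmap$11:03#adding#auth#protocol:#none$")


def extract(data):
    """Return (delimiters, strings) in order of appearance.

    Assumption: any single character that is not an ASCII digit or
    letter counts as a delimiter.
    """
    delimiters = "".join(re.findall(r"[^0-9A-Za-z]", data))
    # Non-empty alphanumeric runs with a delimiter on both sides.
    strings = re.findall(
        r"(?<=[^0-9A-Za-z])[0-9A-Za-z]+(?=[^0-9A-Za-z])", data)
    return delimiters, strings


delimiters, strings = extract(DATA)
print(delimiters)
print(strings)
```

The delimiter sequence printed for `DATA` is the same "$:#:##:##:#$:#$:###:#$" given in step (1-1).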

(1-2) The information processing apparatus 100 generates first data 121 in which the extracted delimiters are arranged such that the order in which they appeared in the compression target data 110 can be identified. For example, the information processing apparatus 100 generates, as the first data 121, data in which the extracted delimiters are arranged in order of appearance. The first data 121 only needs to allow the order in which the delimiters appeared in the compression target data 110 to be identified, and may instead be data in which the extracted delimiters are arranged in reverse order of appearance.

(1-3) The information processing apparatus 100 generates second data 122 in which the extracted character strings are arranged such that, for each type of combination of two delimiters, the order in which the character strings sandwiched between the two delimiters of that type of combination appeared in the compression target data 110 can be identified. A combination here takes into account the order in which the delimiters are combined. For example, the combination of the delimiters ":" and "#" and the combination of the delimiters "#" and ":" may be included as different combinations. Alternatively, a combination may disregard the order in which the delimiters are combined.

A type is a pattern formed by combining two delimiters. A type may also group several patterns together. For example, the information processing apparatus 100 may treat the pattern combining the delimiters ":" and "#" and the pattern combining the delimiters "#" and ":" as the same type of combination.

For example, the information processing apparatus 100 generates, as the second data 122, data in which the extracted character strings are grouped by type of combination of two delimiters and arranged in order of appearance within each group. The second data 122 only needs to allow the order in which the character strings appeared in the compression target data 110 to be identified, and may instead be data in which the extracted character strings are arranged in reverse order of appearance. The second data 122 also need not include information that indicates the types of combinations of two delimiters themselves. For example, the second data 122 may allow the order in which the character strings sandwiched between the two delimiters of each type of combination appeared in the compression target data 110 to be identified by making use of the types of combinations of two delimiters that can be identified from the first data 121.
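Steps (1-2) and (1-3) can be sketched as follows. This is a minimal illustration, assuming that a run of consecutive non-alphanumeric characters (such as ":#") is treated as one multi-character delimiter and that the data starts and ends with a delimiter; the function name `build_first_and_second`, the `ordered` flag, and the use of a Python list and dict for the first and second data are choices made here, not the layout used by the information processing apparatus 100.

```python
import re

DATA = ("$11:03#load:#jerasure#load:#lrc#load:#isa"
        "$11:03#monmap$11:03#adding#auth#protocol:#none$")

# Assumption: a run of non-alphanumeric characters is one delimiter.
DELIM_RUN = re.compile(r"([^0-9A-Za-z]+)")


def build_first_and_second(data, ordered=True):
    """Return (first, second).

    first  : delimiters in order of appearance
    second : {combination type: strings in order of appearance}
    With ordered=False, (a, b) and (b, a) are treated as the same type.
    """
    parts = DELIM_RUN.split(data)   # text, delim, text, ..., delim, text
    first = parts[1::2]             # the delimiters
    texts = parts[2:-1:2]           # strings between consecutive delimiters
    second = {}
    for left, right, text in zip(first, first[1:], texts):
        key = (left, right) if ordered else tuple(sorted((left, right)))
        second.setdefault(key, []).append(text)
    return first, second


first, second = build_first_and_second(DATA)
for pair, group in second.items():
    print(pair, group)
```

Grouping by combination type places "11 11 11", "03 03 03", and "load load load" next to each other, which is the property the slide dictionary method benefits from. Calling `build_first_and_second(DATA, ordered=False)` merges the strings found between "#" and ":#" with those found between ":#" and "#" into a single group.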

(1-4) The information processing apparatus 100 compresses the generated first data 121 and second data 122 using the slide dictionary method. For example, the information processing apparatus 100 concatenates the second data 122 to the generated first data 121 and performs compression by LZMA. The information processing apparatus 100 then outputs the compressed data 130 obtained by the compression. The information processing apparatus 100 may also concatenate the first data 121 and the second data 122 and perform compression by LZMA in such a way that the first data 121 is not encoded.
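Step (1-4) can be sketched with Python's standard `lzma` module. The serialization below is an assumption made for illustration: the first data, a NUL byte, and then the grouped strings joined by NUL bytes, which presumes that the input contains no NUL character. The function name `pack` and the small hand-built sample are hypothetical; the actual byte layout used by the information processing apparatus 100 is not specified in this passage.

```python
import lzma


def pack(first_data, second_data):
    """Concatenate first and second data and compress them with LZMA.

    Assumed layout: first data, NUL, then every grouped string in
    group order, separated by NUL.
    """
    flat = [s for group in second_data.values() for s in group]
    payload = first_data + "\x00" + "\x00".join(flat)
    return lzma.compress(payload.encode("utf-8"))


# First and second data for the short input "$11:03#load$11:03#load$".
first_data = "$:#$:#$"
second_data = {
    ("$", ":"): ["11", "11"],
    (":", "#"): ["03", "03"],
    ("#", "$"): ["load", "load"],
}

compressed = pack(first_data, second_data)
print(len(compressed))
print(lzma.decompress(compressed))
```

On an input this small the LZMA container overhead dominates, so the compressed size is not meaningful here; the sketch only shows how the two pieces are joined before compression.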

In this way, the information processing apparatus 100 can generate first data 121 and second data 122 from which the compression target data 110 can be recovered, and can make the same character strings tend to appear consecutively at relatively close positions in the second data 122. The information processing apparatus 100 can thereby more easily reduce the number of bits used to indicate start positions, and can more easily find and compress identical character strings, improving compression efficiency.
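That the first and second data are sufficient to recover the original can be checked with a round trip. The sketch below makes several assumptions for illustration: each delimiter is a single non-alphanumeric character, adjacent delimiters are recorded with an empty string between them, and the data starts and ends with a delimiter. The names `split`, `restore`, and `is_delim` are hypothetical, and the restoration follows the summary given earlier (inserting, between each two delimiters of the first data, the next string of the matching combination type).

```python
from collections import deque

DATA = ("$11:03#load:#jerasure#load:#lrc#load:#isa"
        "$11:03#monmap$11:03#adding#auth#protocol:#none$")


def is_delim(ch):
    # Assumption: anything other than an ASCII digit or letter is a delimiter.
    return not (ch.isascii() and ch.isalnum())


def split(data):
    """Build first data (delimiter string) and second data (strings per pair)."""
    first, second, current, prev = [], {}, [], None
    for ch in data:
        if is_delim(ch):
            if prev is not None:
                second.setdefault((prev, ch), []).append("".join(current))
            first.append(ch)
            prev, current = ch, []
        elif prev is None:
            raise ValueError("sketch assumes data starts with a delimiter")
        else:
            current.append(ch)
    if current:
        raise ValueError("sketch assumes data ends with a delimiter")
    return "".join(first), second


def restore(first, second):
    """Insert, between each two delimiters, the next string of that pair type."""
    if not first:
        return ""
    queues = {pair: deque(group) for pair, group in second.items()}
    out = []
    for left, right in zip(first, first[1:]):
        out.append(left)
        out.append(queues[(left, right)].popleft())
    out.append(first[-1])
    return "".join(out)


first, second = split(DATA)
print(restore(first, second) == DATA)
```

Note that the second data carries no explicit type labels during restoration beyond the grouping itself; the pair types are read off consecutive delimiters in the first data.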

Specifically, the information processing apparatus 100 can more easily improve compression efficiency for compression target data 110 in which data following a fixed format delimited by predetermined delimiters is aggregated. The information processing apparatus 100 can therefore further improve compression efficiency when table data output from spreadsheet software, log data output from log collection software, or the like is used as the compression target data 110.

(Hardware configuration example of the information processing apparatus 100)
Next, a hardware configuration example of the information processing apparatus 100 will be described with reference to FIG. 2.

FIG. 2 is a block diagram illustrating a hardware configuration example of the information processing apparatus 100. In FIG. 2, the information processing apparatus 100 includes a CPU (Central Processing Unit) 201, a memory 202, a network I/F (Interface) 203, a disk drive 204, a disk 205, and a recording medium I/F 206. These components are connected to one another by a bus 200.

Here, the CPU 201 governs overall control of the information processing apparatus 100. The memory 202 includes, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), and a flash ROM. Specifically, for example, the flash ROM and the ROM store various programs, and the RAM is used as a work area of the CPU 201. The various programs may include, for example, the compression program and the restoration program according to the embodiment. A program stored in the memory 202 is loaded by the CPU 201 and causes the CPU 201 to execute the processing coded in the program.

The network I/F 203 is connected to a network 210 through a communication line and is connected to other computers via the network 210. The network I/F 203 serves as the interface between the network 210 and the inside of the apparatus, and controls the input and output of data from and to other computers. For example, a modem or a LAN (Local Area Network) adapter can be used as the network I/F 203.

The disk drive 204 controls the reading and writing of data from and to the disk 205 under the control of the CPU 201. The disk drive 204 is, for example, a magnetic disk drive. The disk 205 is a non-volatile memory that stores data written under the control of the disk drive 204. The disk 205 is, for example, a magnetic disk or an optical disk.

The recording medium I/F 206 is connected to an external recording medium 207, serves as the interface between the external recording medium 207 and the inside of the apparatus, and controls the input and output of data from and to the external recording medium 207. The recording medium I/F 206 is, for example, a USB (Universal Serial Bus) port. The recording medium 207 is, for example, a USB memory. The recording medium 207 may store the compression program and the restoration program according to the embodiment.

In addition to the components described above, the information processing apparatus 100 may include, for example, an SSD (Solid State Drive), a semiconductor memory, a keyboard, a mouse, and a display. The information processing apparatus 100 may also include an SSD, a semiconductor memory, or the like in place of the disk drive 204 and the disk 205. Specifically, the information processing apparatus 100 is, for example, a mobile phone, a smartphone, a tablet terminal, a PC (Personal Computer), or a server.

(Functional configuration example of the information processing apparatus 100)
Next, a functional configuration example of the information processing apparatus 100 will be described with reference to FIG. 3.

FIG. 3 is a block diagram illustrating a functional configuration example of the information processing apparatus 100. The information processing apparatus 100 includes an extraction unit 301, a generation unit 302, a compression unit 303, a decoding unit 304, and a restoration unit 305. The extraction unit 301 through the restoration unit 305 are functions that constitute a control unit. Specifically, their functions are realized by causing the CPU 201 to execute a program stored in a storage area such as the memory 202 or the disk 205 illustrated in FIG. 2, or by the network I/F 203. The processing result of each functional unit is stored, for example, in a storage area such as the memory 202 or the disk 205.

The extraction unit 301 extracts, from the compression target data 110, the predetermined delimiters and the character strings each sandwiched between two predetermined delimiters. The compression target data 110 is data that includes predetermined delimiters. The compression target data 110 is preferably, for example, log data in which logs following a fixed format delimited by predetermined delimiters are aggregated and in which the same character strings appear repeatedly. Specifically, the compression target data 110 may be data created by an operation input from the user. The compression target data 110 may also be data that was transmitted from another computer to the information processing apparatus 100 with a request for compression.

A delimiter is a character used to divide, for convenience, the character strings included in the compression target data 110. A delimiter is, for example, a character other than a digit or a letter of the alphabet. A combination takes into account the order in which the delimiters are combined. Alternatively, a combination may disregard the order in which the delimiters are combined.

For example, the extraction unit 301 extracts characters other than digits and letters from the compression target data 110 as delimiters, and extracts each character string sandwiched between two such characters. The extraction unit 301 can thereby separate the delimiters from the character strings other than delimiters.
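As a minimal sketch of this extraction step, the following Python splits input text into delimiters and the strings between them. The function name `split_tokens` and the use of a regular expression are assumptions made for illustration; only the rule that a character other than a digit or letter is a delimiter comes from the description.

```python
import re

# Assumption for this sketch: a delimiter is any single character that is
# not an ASCII digit or letter.
DELIMITER = re.compile(r"[^0-9A-Za-z]")

def split_tokens(data: str):
    """Return (delimiters, strings), each in order of appearance."""
    delimiters = DELIMITER.findall(data)
    # re.split yields an empty string where the data starts or ends
    # with a delimiter.
    strings = DELIMITER.split(data)
    return delimiters, strings

delimiters, strings = split_tokens("11:03 monmap\n")
print(delimiters)  # [':', ' ', '\n']
print(strings)     # ['11', '03', 'monmap', '']
```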

The extraction unit 301 may use a character string of two or more characters as a delimiter. For example, the extraction unit 301 uses “:#” as a delimiter. This makes it easier to increase the number of delimiter types and, in turn, the number of types of combinations of two delimiters. As a result, character strings sandwiched between two delimiters of the same combination type are more likely to share the same attribute, identical character strings are more likely to gather relatively close together in the second data 122, and compression efficiency can be improved.

Alternatively, the extraction unit 301 may fix the length of a delimiter at one character and not use character strings of two or more characters as delimiters. For example, instead of using “:#” as a delimiter, the extraction unit 301 treats “:#” as the delimiters “:” and “#” appearing in succession. The extraction unit 301 may also treat “:#” as a single appearance of the delimiter “:”. This keeps the number of delimiter types, and therefore the number of types of combinations of two delimiters, from growing. The extraction unit 301 can thereby limit the size of the storage area used when classifying, for each type of combination of two delimiters, the character strings sandwiched between the two delimiters of that combination, which saves storage and allows efficient processing.
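The difference between allowing and disallowing multi-character delimiters can be illustrated with a short sketch. The helper `find_delimiters` is hypothetical, and it assumes that a longer delimiter takes priority when several delimiters match at the same position.

```python
import re

def find_delimiters(data, delimiters):
    """List the delimiters found in data, preferring longer matches."""
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.findall(pattern, data)

line = "load:#isa$"
# ":#" registered as a delimiter of its own: fewer tokens, more delimiter types.
with_multi = find_delimiters(line, [":", "#", "$", ":#"])
# Single-character delimiters only: ":#" is read as ":" followed by "#".
single_only = find_delimiters(line, [":", "#", "$"])

print(with_multi)   # [':#', '$']
print(single_only)  # [':', '#', '$']
```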

The extraction unit 301 may use a control character as a delimiter. For example, the extraction unit 301 uses the control character indicating a line feed as a delimiter. Then, when classifying character strings by type of combination of two delimiters, the extraction unit 301 can also classify character strings sandwiched between two delimiters of a combination that includes the control character. As a result, identical character strings are more likely to gather relatively close together in the second data 122, and compression efficiency can be improved.

The extraction unit 301 may accept a designation of the predetermined delimiters and extract, from the compression target data 110, the designated delimiters and the character strings sandwiched between two designated delimiters. For example, the extraction unit 301 accepts the designation through an operation input from a user. This makes it easier to use delimiters that suit the format of the compression target data 110, so identical character strings are more likely to gather relatively close together in the second data 122, and compression efficiency can be improved.

The extraction unit 301 can treat the compression target data 110 as if a control character indicating a line feed appeared at its head. Then, when classifying character strings by type of combination of two delimiters, the extraction unit 301 can also classify the character string at the head of the compression target data 110.

The generation unit 302 generates first data 121 and second data 122. The first data 121 is data in which the delimiters are arranged so that the order in which they appeared in the compression target data 110 can be identified. The second data 122 is data in which, for each type of combination of two delimiters, the character strings sandwiched between two delimiters of that combination are arranged so that the order in which they appeared in the compression target data 110 can be identified. A type is a pattern formed by combining two delimiters. A type may also group a plurality of patterns. For example, the combination of the delimiters “:” and “#” and the combination of the delimiters “#” and “:” may be treated as the same type.

For example, the generation unit 302 deletes the characters other than delimiters from the compression target data 110 and inserts a predetermined character “a” between adjacent delimiters so that the boundary between them can be determined, thereby transforming the compression target data 110 into the first data 121. When the number of characters in a delimiter is fixed, the boundaries between delimiters need not be marked, so the generation unit 302 may omit the predetermined character “a”.

For example, the generation unit 302 arranges dots, one for each type of combination of two delimiters. The generation unit 302 then inserts, immediately before the dot corresponding to each type, the character strings sandwiched between two delimiters of that combination, separated by “,”, to generate the second data 122. The generation unit 302 can thereby generate first data 121 and second data 122 from which the compression target data 110 can be recovered, and identical character strings tend to appear consecutively at relatively close positions in the second data 122.

When the number of delimiter types appearing in the compression target data 110 is larger than a threshold, the generation unit 302 may divide the compression target data 110 into pieces of partial data in each of which the number of delimiter types is equal to or less than the threshold. In this case, the generation unit 302 generates first data 121 and second data 122 for each piece of partial data. The first data 121 is then, for example, data in which the delimiters are arranged so that the order in which they appeared in the partial data can be identified. The second data 122 is then, for example, data in which, for each type of combination of two delimiters appearing in the partial data, the character strings sandwiched between two delimiters of that combination are arranged so that the order in which they appeared in the partial data can be identified. The generation unit 302 can thereby limit the size of the storage area used when classifying character strings by type of combination of two delimiters, which saves storage and allows efficient processing.
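The description does not state how the partial data is cut. One possible reading, sketched here, is a greedy scan that starts a new piece whenever one more delimiter type would exceed the threshold. The function name, the single-character delimiter rule, and the greedy strategy are all assumptions.

```python
import re

def split_by_delimiter_types(data: str, threshold: int):
    """Greedily cut data into pieces with at most `threshold` delimiter types."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    pieces, start, seen = [], 0, set()
    # Assumption: any character other than an ASCII digit or letter is a delimiter.
    for match in re.finditer(r"[^0-9A-Za-z]", data):
        delimiter = match.group()
        if delimiter not in seen and len(seen) == threshold:
            # A new type would exceed the threshold: close the current piece here.
            pieces.append(data[start:match.start()])
            start, seen = match.start(), set()
        seen.add(delimiter)
    pieces.append(data[start:])
    return pieces

print(split_by_delimiter_types("a:b c;d,e", 2))  # ['a:b c', ';d,e']
```

Concatenating the pieces gives back the input, so each piece can be processed independently and the results emitted in order.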

Specifically, the generation unit 302 generates the first data 121 by arranging the delimiters, in the order in which they appeared in the compression target data 110, alternately with a predetermined character. Likewise, for each type of combination of two delimiters, the generation unit 302 generates the second data 122 by arranging the character strings sandwiched between two delimiters of that combination, in the order in which they appeared in the compression target data 110, alternately with a predetermined character. The generation unit 302 can thereby generate first data 121 and second data 122 from which the compression target data 110 can be recovered, and identical character strings tend to appear consecutively at relatively close positions in the second data 122.

The compression unit 303 compresses the generated first data 121 and second data 122 using a slide dictionary method. The compression is, for example, compression with the “LZ77” compression algorithm. It may also be compression with any of the algorithms derived from “LZ77”, specifically, for example, the “LZMA” compression algorithm.

For example, the compression unit 303 compresses the generated first data 121 and second data 122 using the LZ77 or LZMA compression algorithm. Specifically, the compression unit 303 concatenates the generated first data 121 and second data 122, attaches metadata indicating the boundary between the first data 121 and the second data 122 to the head of the concatenated data, and then performs compression using the LZMA compression algorithm.

The metadata need not be compressed. The metadata is, for example, a value indicating the start position of the second data 122, specifically the size of the first data 121. The metadata may instead be inserted between the first data 121 and the second data 122. In that case, the metadata is represented, for example, by a character that appears in neither the first data 121 nor the second data 122.

Alternatively, the compression unit 303 may concatenate the first data 121 and the second data 122, attach the metadata, and then compress only the second data 122, leaving the metadata and the first data 121 uncompressed. The compression unit 303 can thereby improve the efficiency of slide dictionary compression for the second data 122, in which identical character strings tend to appear consecutively at relatively close positions. The compression unit 303 can also use the LZ77 or LZMA compression algorithm, which is well suited to compressing the second data 122, and further reduce the size of the compressed data.

The decoding unit 304 decodes compressed data obtained by compressing the first data 121 and the second data 122 using the slide dictionary method. The decoding unit 304 decodes the compressed data using the decoding algorithm corresponding to the LZ77 or LZMA compression algorithm. The decoding unit 304 can thereby obtain first data 121 and second data 122 from which the compression target data 110 can be recovered.

The restoration unit 305 refers to the second data 122 obtained by decoding and sequentially inserts, between each two delimiters in the first data 121 obtained by decoding, the character string associated with the type of combination of those two delimiters. The information processing apparatus 100 can thereby restore the original compression target data 110 from the efficiently compressed data.
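A rough sketch of this restoration, under several assumptions that are not stated at this point of the description: the first data marks each delimiter with a leading “a”, the second data ends each combination type with “.” and separates strings with “,”, a line feed (written `$` here, with `#` standing for a space) is treated as present at the head, and combination types are ordered by first appearance of their delimiters. The function `restore` is hypothetical and assumes every pair of adjacent delimiters encloses a non-empty string free of “,” and “.”.

```python
from collections import deque
from itertools import product

def restore(first_data: str, second_data: str, marker: str = "a", head: str = "$") -> str:
    # Recover the delimiter sequence; the assumed head delimiter is put back.
    delimiters = [head] + first_data.split(marker)[1:]
    # Delimiter types in order of first appearance (dicts keep insertion order).
    kinds = list(dict.fromkeys(delimiters))
    groups = second_data.split(".")
    # One queue of strings per combination type, in the assumed type order.
    queues = {
        pattern: deque(group.split(",")) if group else deque()
        for pattern, group in zip(product(kinds, repeat=2), groups)
    }
    restored = []
    for left, right in zip(delimiters, delimiters[1:]):
        # Take the next string recorded for this pair of delimiters.
        restored.append(queues[(left, right)].popleft())
        restored.append(right)
    return "".join(restored)

print(restore("a:a#a$", ".11....03.monmap..."))  # 11:03#monmap$
```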

The case where the information processing apparatus 100 has both the function of compressing the compression target data 110 and the function of restoring it has been described here, but the configuration is not limited to this. For example, the information processing apparatus 100 may lack either one of the two functions.

(Example of compressing the compression target data 400)
Next, an example in which the information processing apparatus 100 compresses compression target data 400 will be described with reference to FIGS. 4 to 11.

FIGS. 4 to 11 are explanatory diagrams illustrating an example of compressing the compression target data 400. In FIG. 4, the information processing apparatus 100 accepts input of the compression target data 400. In the example of FIG. 4, the compression target data 400 is data that includes characters other than digits and letters as the predetermined delimiters. The predetermined delimiters may include character strings of two or more characters and may include control characters.

Specifically, the first line of the compression target data 400 is “11:03 load: jerasure load: lrc load: isa”, followed by a line feed. The second line is “11:03 monmap”, followed by a line feed. The third line is “11:03 adding auth protocol: none”, followed by a line feed. The control character indicating a line feed is not displayed.

In the following description, for convenience, a space may be written as “#” and the control character indicating a line feed may be written as “$”. In this notation, the compression target data 400 is “11:03#load:#jerasure#load:#lrc#load:#isa$11:03#monmap$11:03#adding#auth#protocol:#none$”. The description now moves on to FIG. 5.

In FIG. 5, the information processing apparatus 100 prepares a compression table 500 based on the accepted compression target data 400. The compression table 500 is used to classify, for each pattern of combination of two delimiters appearing in the compression target data 400, the character strings sandwiched between the two delimiters of that pattern. The compression table 500 has a storage area for each such pattern.

For example, the information processing apparatus 100 counts the number of delimiter types, “4”, in the compression target data 400 and prepares a compression table 500 that has “16” storage areas, the square of that number. The description now moves on to FIG. 6.

In FIG. 6, when the head of the compression target data 400 is not a delimiter, the information processing apparatus 100 prepends a predetermined delimiter to the compression target data 400. In the example of FIG. 6, the information processing apparatus 100 prepends the control character “$” indicating a line feed. Likewise, when the end of the compression target data 400 is not a delimiter, the information processing apparatus 100 may append a predetermined delimiter to the compression target data 400.

Next, the information processing apparatus 100 identifies, based on the compression target data 400, the patterns of combinations of two delimiters according to a predetermined combination rule. The predetermined combination rule is, for example, to select the delimiters one at a time in order of first appearance and to combine each selected delimiter with the delimiters, again in order of first appearance.

In the example of FIG. 6, the information processing apparatus 100 selects the delimiter “$”, which appeared first, and identifies the pattern that combines it with the first delimiter “$”. Similarly, the information processing apparatus 100 identifies the patterns that combine the selected delimiter “$” with the second delimiter “:”, the third delimiter “#”, and the last delimiter “:#”. The information processing apparatus 100 then selects, in turn, the second delimiter “:”, the third delimiter “#”, and the last delimiter “:#”, and identifies the patterns of combinations of two delimiters in the same way.
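The combination rule can be sketched as a Cartesian product over the delimiter types taken in order of first appearance. The regular expression, which tries “:#” before the single-character delimiters, and the variable names are assumptions made for illustration.

```python
import re
from itertools import product

# Compression target data 400 in the "#" (space) and "$" (line feed) notation,
# with the "$" that FIG. 6 prepends to the head.
DATA = "$11:03#load:#jerasure#load:#lrc#load:#isa$11:03#monmap$11:03#adding#auth#protocol:#none$"

# Assumption: ":#" is matched before the single-character delimiters.
delimiters = re.findall(r":#|[:#$]", DATA)

# Delimiter types in order of first appearance (dicts preserve insertion order).
kinds = list(dict.fromkeys(delimiters))
# The first element plays the selected delimiter, the second its partner.
patterns = list(product(kinds, repeat=2))

print(kinds)          # ['$', ':', '#', ':#']
print(len(patterns))  # 16
print(patterns[:4])   # [('$', '$'), ('$', ':'), ('$', '#'), ('$', ':#')]
```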

The information processing apparatus 100 then stores each identified pattern in one of the prepared storage areas, associating each storage area with one pattern. In the example of FIG. 6, the information processing apparatus 100 associates one prepared storage area with the pattern combining the delimiters “$” and “$”. Likewise, it associates further storage areas with the patterns combining “$” and “:”, “$” and “#”, and “$” and “:#”.

The case where the information processing apparatus 100 also identifies patterns that are not used to sandwich any character string in the compression target data 400 has been described here, but the processing is not limited to this. For example, the information processing apparatus 100 need not identify patterns that are not used to sandwich a character string in the compression target data 400. In that case, instead of preparing as many storage areas as the square of the number of delimiter types, the information processing apparatus 100 only needs to prepare storage areas for the patterns actually used to sandwich character strings in the compression target data 400. The description now moves on to FIG. 7.

In FIG. 7, for each pattern of combination of two delimiters, the information processing apparatus 100 extracts the character strings sandwiched between the two delimiters of that pattern in the compression target data 400. The information processing apparatus 100 then stores those character strings in the storage area associated with the pattern, arranged so that the order in which they appeared in the compression target data 400 can be identified.

For example, in the storage area associated with the pattern combining the delimiters “$” and “:”, the information processing apparatus 100 stores the character strings “11”, “11”, and “11” sandwiched between those two delimiters, separated by “,” and in order of appearance. Similarly, in the storage area associated with the pattern combining “:” and “#”, it stores the character strings “03”, “03”, and “03”, separated by “,” and in order of appearance. No character string is sandwiched between the delimiters “$” and “$”, so the information processing apparatus 100 stores nothing in the storage area associated with that pattern. The description now moves on to FIG. 8.
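The classification in FIG. 7 can be sketched with a dictionary keyed by delimiter pair. This is an illustrative reading rather than the apparatus's actual table layout; the function `classify` and the regular expression are assumptions, and only pairs that actually occur receive an entry.

```python
import re
from collections import defaultdict

# Compression target data 400 in the "#" (space) and "$" (line feed) notation.
DATA = "11:03#load:#jerasure#load:#lrc#load:#isa$11:03#monmap$11:03#adding#auth#protocol:#none$"
# Assumption: ":#" is matched before the single-character delimiters.
DELIMITER = re.compile(r":#|[:#$]")

def classify(data: str):
    padded = "$" + data  # FIG. 6: a "$" is prepended to the head
    delimiters = DELIMITER.findall(padded)
    # The padded data starts and ends with a delimiter, so re.split returns
    # an empty string at both edges; drop them.
    strings = DELIMITER.split(padded)[1:-1]
    table = defaultdict(list)
    for left, right, text in zip(delimiters, delimiters[1:], strings):
        table[(left, right)].append(text)
    return dict(table)

table = classify(DATA)
print(",".join(table[("$", ":")]))  # 11,11,11
print(",".join(table[(":", "#")]))  # 03,03,03
print(("$", "$") in table)          # False
```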

In FIG. 8, the information processing apparatus 100 deletes the predetermined delimiter that it prepended to the compression target data 400. In the example of FIG. 8, it deletes the prepended control character “$” indicating a line feed. If the information processing apparatus 100 also appended a predetermined delimiter to the end of the compression target data 400, it deletes that delimiter as well.

The information processing apparatus 100 then generates first data 800. The first data 800 is data in which the delimiters are arranged so that the order in which they appeared in the compression target data 400 can be identified. For example, the information processing apparatus 100 transforms the compression target data 400 into the first data 800 by deleting the characters other than delimiters and inserting a predetermined character “a” between adjacent delimiters.

In the example of FIG. 8, the information processing apparatus 100 deletes the characters other than delimiters from the compression target data 400, transforming it into “:#:##:##:#$:#$:###:#$”. The information processing apparatus 100 then inserts the predetermined character “a” between the delimiters, producing the first data 800 “a:a#a:#a#a:#a#a:#a$a:a#a$a:a#a#a#a:#a$”.
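The two transformations of FIG. 8 can be reproduced in a few lines. The regular expression is an assumption about how the delimiters are recognized; the marker “a” and the resulting strings follow the example.

```python
import re

# Compression target data 400 in the "#" (space) and "$" (line feed) notation.
DATA = "11:03#load:#jerasure#load:#lrc#load:#isa$11:03#monmap$11:03#adding#auth#protocol:#none$"

# Assumption: ":#" is matched before the single-character delimiters.
delimiters = re.findall(r":#|[:#$]", DATA)

# Step 1: keep only the delimiters.
stripped = "".join(delimiters)
# Step 2: mark the start of every delimiter with "a" so that ":#" can be
# told apart from ":" followed by "#".
first_data = "".join("a" + d for d in delimiters)

print(stripped)    # :#:##:##:#$:#$:###:#$
print(first_data)  # a:a#a:#a#a:#a#a:#a$a:a#a$a:a#a#a#a:#a$
```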

The case where the information processing apparatus 100 deletes the predetermined delimiter prepended to the compression target data 400 has been described here, but the processing is not limited to this. For example, the information processing apparatus 100 may leave the prepended delimiter in place and have it deleted when the compression target data 400 is restored. The description now moves on to FIG. 9.

In FIG. 9, the information processing apparatus 100 creates data 900 in which dots are arranged, one for each pattern of combination of two delimiters. The information processing apparatus 100 then inserts, immediately before the dot corresponding to each pattern, the character strings sandwiched between the two delimiters of that pattern, separated by “,”.

In the example of FIG. 9, the information processing apparatus 100 creates the data “................”, in which 16 dots are arranged, one for each of the 16 patterns of combinations of two delimiters. The dots correspond, from the first dot onward, to the patterns in the order in which they were identified. In other words, the first dot corresponds to the pattern combining the delimiters “$” and “$”, which was identified first, and the second dot corresponds to the pattern combining “$” and “:”, which was identified second.

The information processing apparatus 100 then inserts, immediately before the dot corresponding to the pattern combining the delimiters “$” and “:”, the character strings “11”, “11”, and “11” sandwiched between those two delimiters, separated by “,” and in order of appearance. The data of 16 dots “................” is thus transformed into “.11,11,11...............”.

Similarly, the information processing apparatus 100 inserts, immediately before the dot corresponding to the pattern combining “:” and “#”, the character strings “03”, “03”, and “03”, separated by “,” and in order of appearance. No character string is sandwiched between the delimiters “$” and “$”, so nothing is inserted before the dot corresponding to that pattern. The description now moves on to FIG. 10.

In FIG. 10, the information processing apparatus 100 takes the data obtained by inserting, for every pattern, the character strings sandwiched between its two delimiters, separated by “,”, as second data 1000. The second data 1000 is data in which, for each pattern of combination of two delimiters, the character strings sandwiched between the two delimiters of that pattern are arranged so that the order in which they appeared in the compression target data 400 can be identified.

In the example of FIG. 10, the second data 1000 is “.11,11,11.....03,03,03..monmap..adding,auth.load,load,load,protocol.isa,none..jerasure,lrc.”.
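Putting FIGS. 6 to 10 together, the second data can be sketched as one comma-separated group per pattern, each closed by a dot. The function `second_data` and the regular expression are assumptions. The sketch emits exactly one dot for each of the 16 patterns, so its output ends with a dot for the final, empty pattern.

```python
import re
from itertools import product

# Compression target data 400 in the "#" (space) and "$" (line feed) notation.
DATA = "11:03#load:#jerasure#load:#lrc#load:#isa$11:03#monmap$11:03#adding#auth#protocol:#none$"
# Assumption: ":#" is matched before the single-character delimiters.
DELIMITER = re.compile(r":#|[:#$]")

def second_data(data: str) -> str:
    padded = "$" + data  # FIG. 6: a "$" is prepended to the head
    delimiters = DELIMITER.findall(padded)
    # The padded data starts and ends with a delimiter; drop the empty edges.
    strings = DELIMITER.split(padded)[1:-1]
    kinds = list(dict.fromkeys(delimiters))  # order of first appearance
    # One bucket per pattern, created in the order given by the combination rule.
    buckets = {pattern: [] for pattern in product(kinds, repeat=2)}
    for left, right, text in zip(delimiters, delimiters[1:], strings):
        buckets[(left, right)].append(text)
    # Each pattern contributes its strings joined by "," followed by a dot.
    return "".join(",".join(bucket) + "." for bucket in buckets.values())

result = second_data(DATA)
print(result.count("."))  # 16
print(result[:24])        # .11,11,11.....03,03,03..
```

Identical strings such as “load” end up adjacent, which is the property the slide dictionary compression benefits from.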

When restoring the compression target data 400, the information processing apparatus 100 can use the predetermined combination rule to identify, from the first data 800, the same patterns of combinations of two delimiters as those identified in FIG. 6, in the same order. The second data 1000 therefore only needs to allow the order in which the character strings of each combination type appeared in the compression target data 400 to be identified, relying on the combination types that can be identified from the first data 800. Consequently, the second data 1000 does not include information indicating the patterns themselves. Alternatively, the second data 1000 may include such information. The description now moves on to FIG. 11.

In FIG. 11, the information processing apparatus 100 concatenates the first data 800 and the second data 1000 and compresses them using the LZMA compression algorithm, thereby generating a compressed file 1100. For example, when appending the second data 1000 to the end of the first data 800, the information processing apparatus 100 adds metadata indicating the boundary between the first data 800 and the second data 1000 to the head of the first data 800. The information processing apparatus 100 then compresses the concatenated first data 800 and second data 1000 together using the LZMA compression algorithm. The metadata need not be compressed.
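The concatenation step of FIG. 11 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `pack` and the 4-byte big-endian length header used as boundary metadata are assumptions, since the description does not specify the metadata format. The header is compressed along with the data here, although the description notes it need not be. Python's `lzma.compress` writes an `.xz` container (LZMA2) by default.

```python
import lzma
import struct


def pack(first: str, second: str) -> bytes:
    """Concatenate first and second data behind a boundary header, then compress."""
    first_bytes = first.encode("utf-8")
    second_bytes = second.encode("utf-8")
    # Hypothetical boundary metadata: byte length of the first data,
    # stored as a 4-byte big-endian unsigned integer at the head.
    header = struct.pack(">I", len(first_bytes))
    return lzma.compress(header + first_bytes + second_bytes)


# Tiny usage example with made-up inputs.
blob = pack("$a:a#", ".11.")
```

Putting the length at the head lets the decoder find the boundary without scanning for a separator character that might also occur in the data.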

(Example of restoring the compression target data 400)
Next, an example in which the information processing apparatus 100 restores the compression target data 400 from the compressed file 1100 generated in FIG. 11 will be described with reference to FIGS. 12 to 23.

FIGS. 12 to 23 are explanatory diagrams illustrating an example of restoring the compression target data 400. In FIG. 12, the information processing apparatus 100 decodes the compressed file 1100 using the decoding algorithm corresponding to the LZMA compression algorithm, and obtains the first data 800 and the second data 1000.

For example, the information processing apparatus 100 decodes the compressed file 1100 using the decoding algorithm corresponding to the LZMA compression algorithm. The information processing apparatus 100 then identifies the boundary between the first data 800 and the second data 1000 based on the metadata at the head, and splits the decoded data into the first data 800 and the second data 1000. The description now moves on to FIG. 13.
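A minimal sketch of this decoding and splitting step is shown below. It assumes, purely for illustration, that the boundary metadata is a 4-byte big-endian length of the first data placed at the head of the compressed payload; the description does not fix this format, and `unpack` is a hypothetical name.

```python
import lzma
import struct


def unpack(blob: bytes) -> tuple[str, str]:
    """Decode the compressed file and split it into (first data, second data)."""
    raw = lzma.decompress(blob)
    # Assumed metadata layout: first 4 bytes hold the byte length of the first data.
    (first_len,) = struct.unpack(">I", raw[:4])
    first = raw[4:4 + first_len].decode("utf-8")
    second = raw[4 + first_len:].decode("utf-8")
    return first, second
```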

In FIG. 13, the information processing apparatus 100 prepares a restoration table 1300 based on the acquired first data 800. The restoration table 1300 is a table used to classify, for each pattern of a combination of two delimiters appearing in the first data 800, the character strings sandwiched between the two delimiters of that pattern. The restoration table 1300 has, for each such pattern, a storage area corresponding to that pattern. In the following steps, the restoration table 1300 is built up so that its stored contents become the same as those of the compression table 500.

For example, the information processing apparatus 100 counts the number of delimiter types, 4, based on the first data 800, and prepares a restoration table 1300 having 16 storage areas, that is, the square of the number of delimiter types. The description now moves on to FIG. 14.

In FIG. 14, when the head of the first data 800 is not a delimiter, the information processing apparatus 100 adds a predetermined delimiter to the head of the first data 800. In the example of FIG. 14, the information processing apparatus 100 adds the control character “$”, which indicates a line feed, to the head of the first data 800. Likewise, when the end of the first data 800 is not a delimiter, the information processing apparatus 100 may add a predetermined delimiter to the end of the first data 800.

Next, based on the first data 800, the information processing apparatus 100 identifies the patterns of combinations of two delimiters in accordance with a predetermined combination rule. The predetermined combination rule is, for example, a rule that selects delimiters in order of first appearance and pairs each selected delimiter with every delimiter, again in order of first appearance. This rule is the same as the combination rule the information processing apparatus 100 used to identify the patterns of combinations of two delimiters in FIG. 6.

In the example of FIG. 14, the information processing apparatus 100 selects the delimiter “$”, which appears first, and identifies the pattern that pairs the selected “$” with the first-appearing delimiter “$”. Similarly, the information processing apparatus 100 identifies the patterns that pair the selected “$” with the second-appearing delimiter “:”, the third-appearing delimiter “#”, and the last-appearing delimiter “:#”. In the same way, the information processing apparatus 100 selects each of the second-appearing delimiter “:”, the third-appearing delimiter “#”, and the last-appearing delimiter “:#” in turn, and identifies the corresponding patterns of combinations of two delimiters.

The information processing apparatus 100 then stores each identified pattern in one of the prepared storage areas, so that each storage area is associated with one pattern. In the example of FIG. 14, the information processing apparatus 100 associates one prepared storage area with the pattern combining the delimiters “$” and “$”. Likewise, it associates further storage areas with the pattern combining “$” and “:”, the pattern combining “$” and “#”, the pattern combining “$” and “:#”, and so on.
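The table preparation of FIGS. 13 and 14 can be sketched as below. The first data string is not quoted in this section; the value of `FIRST` is reconstructed from the FIG. 22 result by putting the predetermined character “a” back in place of each character string, so treat it as an assumption. Whether the stored first data begins with “$” is also not shown here, so the sketch adds it when missing. `prepare_table` is a hypothetical name, and the sketch assumes no delimiter contains the placeholder character.

```python
from itertools import product

PLACEHOLDER = "a"  # predetermined character standing in for each string
LEADING = "$"      # delimiter added at the head, as in the FIG. 14 example


def prepare_table(first: str) -> dict:
    """Return an empty table with one slot per ordered pair of delimiter types."""
    if first.startswith(PLACEHOLDER):
        first = LEADING + first
    delimiters = first.split(PLACEHOLDER)
    # dict.fromkeys keeps only the first occurrence, preserving order of appearance.
    kinds = list(dict.fromkeys(delimiters))
    # product() pairs the first type with every type, then the second, and so on,
    # which matches the combination rule described above.
    return {pattern: [] for pattern in product(kinds, repeat=2)}


FIRST = "a:a#a:#a#a:#a#a:#a$a:a#a$a:a#a#a#a:#a$"  # reconstructed, see lead-in
table = prepare_table(FIRST)
print(len(table))  # 16 slots for 4 delimiter types
```

Because both the compressing side and the restoring side derive the slot order from the delimiters alone, the second data does not have to name the patterns.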

The case described here is one in which the information processing apparatus 100 also identifies those patterns of two-delimiter combinations that are never used to sandwich the predetermined character “a” in the first data 800, but the embodiment is not limited to this. For example, the information processing apparatus 100 need not identify patterns that are not used to sandwich the predetermined character “a” in the first data 800. In that case, instead of preparing as many storage areas as the square of the number of delimiter types, the information processing apparatus 100 only has to prepare storage areas for the patterns that are actually used to sandwich the predetermined character “a” in the first data 800. The description now moves on to FIG. 15.

In FIG. 15, the information processing apparatus 100 refers to the second data 1000 and, for each pattern of a combination of two delimiters, extracts the character strings arranged in association with that pattern. Then, for each pattern, the information processing apparatus 100 stores those character strings in the storage area associated with that pattern.

For example, the information processing apparatus 100 stores the character strings “11,11,11”, which are arranged in the second data 1000 in association with the pattern combining the delimiters “$” and “:”, in the storage area associated with that pattern. Similarly, it stores the character strings “03,03,03”, which are arranged in association with the pattern combining the delimiters “:” and “#”, in the storage area associated with that pattern. For the storage area associated with the pattern combining the delimiters “$” and “$”, the second data 1000 contains no character strings arranged in association with that pattern, so nothing is stored. The description now moves on to FIG. 16.
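The classification of FIG. 15 can be sketched as follows. The sketch assumes, based only on the shape of the FIG. 10 example string, that “.” separates the per-pattern fields and “,” separates strings within a field; it would not work if a character string itself contained either character. `fill_table` and `DELIMS` are illustrative names.

```python
from itertools import product

# Delimiter types in order of first appearance, as in the FIG. 14 example.
DELIMS = ["$", ":", "#", ":#"]

SECOND = (".11,11,11.....03,03,03..monmap..adding,auth"
          ".load,load,load,protocol.isa,none..jerasure,lrc.")


def fill_table(second: str, delims=DELIMS) -> dict:
    """Map each (front delimiter, back delimiter) pattern to its list of strings."""
    fields = second.split(".")
    patterns = list(product(delims, repeat=2))
    if len(fields) != len(patterns):
        raise ValueError("field count does not match the number of patterns")
    # An empty field means no string was recorded for that pattern.
    return {p: (f.split(",") if f else []) for p, f in zip(patterns, fields)}


table = fill_table(SECOND)
print(table[("$", ":")])  # ['11', '11', '11']
print(table[("$", "$")])  # []
```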

In FIG. 16, the information processing apparatus 100 identifies a location in the first data 800 that is sandwiched between the two delimiters of a pattern and that is the insertion destination for one of the character strings stored in the storage area corresponding to that pattern.

In the example of FIG. 16, the information processing apparatus 100 searches the first data 800 for locations containing the predetermined character “a”. Specifically, the information processing apparatus 100 identifies the location of the first “a” in the first data 800, which is sandwiched between the delimiters “$” and “:”. The description now moves on to FIG. 17.

In FIG. 17, the information processing apparatus 100 identifies the pattern of the two delimiters that sandwich the identified “a”. Having identified the pattern combining the delimiters “$” and “:”, the information processing apparatus 100 reads the leading character string “11” from the character strings “11,11,11” stored in the storage area associated with that pattern. The information processing apparatus 100 then inserts the read character string “11” into the first data 800 in place of the “a” at the identified location. The description now moves on to FIG. 18.

In FIG. 18, the information processing apparatus 100 deletes the leading character string “11”, which has already been read, from the character strings “11,11,11” stored in the storage area associated with the pattern combining the delimiters “$” and “:”. The description now moves on to FIG. 19.

In FIG. 19, the information processing apparatus 100 similarly identifies the location of the second “a” in the first data 800, which is sandwiched between the delimiters “:” and “#”, and identifies the pattern of the two delimiters that sandwich it. Having identified the pattern combining the delimiters “:” and “#”, the information processing apparatus 100 reads the leading character string “03” from the character strings “03,03,03” stored in the storage area associated with that pattern. The information processing apparatus 100 then inserts the read character string “03” into the first data 800 in place of the “a” at the identified location.

The information processing apparatus 100 deletes the leading character string “03”, which has already been read, from the character strings “03,03,03” stored in the storage area associated with the pattern combining the delimiters “:” and “#”. The description now moves on to FIG. 20.

In FIG. 20, the information processing apparatus 100 similarly identifies the location of the third “a” in the first data 800, which is sandwiched between the delimiters “#” and “:#”, and identifies the pattern of the two delimiters that sandwich it. The information processing apparatus 100 reads the leading character string “load” from the character strings “load,load,load,protocol” stored in the storage area associated with the pattern combining the delimiters “#” and “:#”. The information processing apparatus 100 then inserts the read character string “load” into the first data 800 in place of the “a” at the identified location.

The information processing apparatus 100 deletes the leading character string “load”, which has already been read, from the character strings “load,load,load,protocol” stored in the storage area associated with the pattern combining the delimiters “#” and “:#”. The description now moves on to FIG. 21.

In FIG. 21, the information processing apparatus 100 similarly identifies the location of the fourth “a” in the first data 800, which is sandwiched between the delimiters “:#” and “#”, and identifies the pattern of the two delimiters that sandwich it. The information processing apparatus 100 reads the leading character string “jerasure” from the character strings “jerasure,lrc” stored in the storage area associated with the pattern combining the delimiters “:#” and “#”. The information processing apparatus 100 then inserts the read character string “jerasure” into the first data 800 in place of the “a” at the identified location.

The information processing apparatus 100 deletes the leading character string “jerasure”, which has already been read, from the character strings “jerasure,lrc” stored in the storage area associated with the pattern combining the delimiters “:#” and “#”. The description now moves on to FIG. 22.

In FIG. 22, the information processing apparatus 100 continues to replace each “a” in the first data 800 in the same way. The information processing apparatus 100 thereby transforms the first data 800 into “$11:03#load:#jerasure#load:#lrc#load:#isa$11:03#monmap$11:03#adding#auth#protocol:#none$”. The description now moves on to FIG. 23.

In FIG. 23, the information processing apparatus 100 deletes the control character indicating a line feed that was added to the head of the first data 800, and thereby restores the compression target data 400. In this way, the information processing apparatus 100 can recover the compression target data 400 even from a compressed file 1100 that was compressed with improved compression efficiency.
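The walkthrough of FIGS. 13 to 23 can be condensed into the sketch below. It is an illustration under stated assumptions, not the patented implementation: `FIRST` is reconstructed from the FIG. 22 result rather than quoted from a figure, “.” is assumed to separate per-pattern fields in the second data, and no delimiter or character string is assumed to contain the separator characters. `restore` is a hypothetical name.

```python
from collections import deque
from itertools import product


def restore(first: str, second: str, placeholder: str = "a", leading: str = "$") -> str:
    """Rebuild the original text from first data (delimiters) and second data (strings)."""
    added = first.startswith(placeholder)
    if added:  # FIG. 14: add a delimiter when the head is not one
        first = leading + first
    delimiters = first.split(placeholder)
    kinds = list(dict.fromkeys(delimiters))  # types in order of first appearance
    # FIG. 15: one queue of strings per (front, back) delimiter pattern.
    table = {p: deque(f.split(",") if f else [])
             for p, f in zip(product(kinds, repeat=2), second.split("."))}
    out = [delimiters[0]]
    for front, back in zip(delimiters, delimiters[1:]):
        # FIGS. 16 to 22: take the next unread string for this pattern.
        out.append(table[(front, back)].popleft())
        out.append(back)
    text = "".join(out)
    return text[len(leading):] if added else text  # FIG. 23


FIRST = "a:a#a:#a#a:#a#a:#a$a:a#a$a:a#a#a#a:#a$"
SECOND = (".11,11,11.....03,03,03..monmap..adding,auth"
          ".load,load,load,protocol.isa,none..jerasure,lrc.")
print(restore(FIRST, SECOND))
```

Using a queue per pattern replaces the explicit "read the leading string, then delete it" pair of steps with a single `popleft` call.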

The case described here is one in which the information processing apparatus 100 that compresses the compression target data 400 and the information processing apparatus 100 that restores it are the same apparatus, but the embodiment is not limited to this. For example, the apparatus that compresses the compression target data 400 and the apparatus that restores it may be different apparatuses. In that case, both apparatuses use the same combination rule when identifying the patterns of combinations of two delimiters.

(Example of the compression processing procedure)
Next, an example of the compression processing procedure will be described with reference to FIG. 24.

FIG. 24 is a flowchart illustrating an example of the compression processing procedure. In FIG. 24, the information processing apparatus 100 accepts input of a text file that serves as the compression target data 400 (step S2401).

Next, the information processing apparatus 100 splits the text file into delimiters and character strings other than delimiters (step S2402). The information processing apparatus 100 then records all the delimiters in order, with the predetermined character placed between each pair of adjacent delimiters (step S2403).

Next, the information processing apparatus 100 sorts all the character strings by the pattern of the two delimiters that appear immediately before and after each string (step S2404). The information processing apparatus 100 then records all the sorted character strings in such a way that the pattern of the two surrounding delimiters can be identified (step S2405).

Next, the information processing apparatus 100 compresses the recorded delimiters and the recorded character strings using LZMA (step S2406). The information processing apparatus 100 then outputs the compressed data obtained by the compression (step S2407) and ends the compression processing. In this way, the information processing apparatus 100 can compress the compression target data 400 in a form from which it can be restored.
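Steps S2402 to S2405 can be sketched as below. Several things are assumptions made for illustration: the delimiter set is the one from the FIG. 14 example, “$” stands in for a line feed, “.” separates per-pattern fields in the second data, the input ends with a delimiter, and no two delimiters are adjacent. The sketch always adds “$” at the head and keeps it in the first data, which is one possible convention. `transform` is a hypothetical name. The two returned strings would then be passed to an LZMA encoder such as Python's `lzma.compress` for step S2406.

```python
import re
from itertools import product


def transform(text, delims=("$", ":", "#", ":#"), placeholder="a"):
    """Split text into first data (delimiters) and second data (grouped strings)."""
    # Try longer delimiters first so ":#" is not read as ":" followed by "#".
    alt = "|".join(re.escape(d) for d in sorted(delims, key=len, reverse=True))
    parts = re.split(f"({alt})", "$" + text)            # S2402
    seps, strings = parts[1::2], parts[2::2]
    if strings[-1] != "":
        raise ValueError("this sketch assumes the text ends with a delimiter")
    strings = strings[:-1]
    first = placeholder.join(seps)                      # S2403
    kinds = list(dict.fromkeys(seps))                   # types in order of appearance
    buckets = {p: [] for p in product(kinds, repeat=2)}
    for front, s, back in zip(seps, strings, seps[1:]): # S2404
        buckets[(front, back)].append(s)
    second = ".".join(",".join(b) for b in buckets.values())  # S2405
    return first, second


TEXT = "11:03#load:#jerasure#load:#lrc#load:#isa$11:03#monmap$11:03#adding#auth#protocol:#none$"
first, second = transform(TEXT)
print(first)
print(second)
```

Grouping by delimiter pair places repeated values such as “load” next to each other in the second data, which is the property the sliding-dictionary compressor is meant to exploit.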

(Example of the restoration processing procedure)
Next, an example of the restoration processing procedure will be described with reference to FIG. 25.

FIG. 25 is a flowchart illustrating an example of the restoration processing procedure. In FIG. 25, the information processing apparatus 100 prepares a character string array del[], a character string array data[][][], an integer array ID[], an integer i, and an integer array j[][] (step S2501).

The character string array del[] is an array in which the delimiters that have been read are stored in order. The index of del[] is a serial number assigned to each delimiter read, and indicates the position of that delimiter counted from the head of the restored data. For example, del[1] holds the first delimiter from the head of the restored data.

The character string array data[][][] is an array in which the character strings that have been read are stored in order. The first index of data[][][] is the number assigned to the type of the delimiter that appears immediately before the character string. The second index of data[][][] is the number assigned to the type of the delimiter that appears immediately after the character string.

The third index of data[][][] is a serial number assigned in order to the character strings sandwiched between the combination of delimiter types given by the first and second indices. For example, if the number assigned to the delimiter type “$” is 1, data[1][1][1] holds the first character string, counted from the head of the restored data, among the character strings sandwiched between two “$” delimiters.

The integer array ID[] is an array that stores the numbers assigned to the delimiter types. The index of ID[] is the serial number assigned to a delimiter, and indicates the position of that delimiter counted from the head of the restored data. For example, ID[1] holds the number assigned to the type of the first delimiter from the head of the restored data.

The integer i is a number indicating which delimiter, counted from the head of the restored data, is currently designated. The initial value of i is 1.

The integer array j[][] is an array that stores, for each combination of delimiters, a number indicating which of the character strings sandwiched between that combination is currently designated. The first index of j[][] is the number assigned to the type of the front delimiter of the combination. The second index of j[][] is the number assigned to the type of the rear delimiter of the combination.

Each element of j[][] has an initial value of 1. For example, if the number assigned to the delimiter type “$” is 1, j[1][1] holds the number indicating which of the character strings sandwiched between two “$” delimiters is currently designated.

Next, the information processing apparatus 100 decodes the compressed data using LZMA and obtains restored data in which all the delimiters and all the character strings are arranged (step S2502). The information processing apparatus 100 then reads all the delimiters into del[] in order (step S2503).

Next, the information processing apparatus 100 classifies all the character strings by the pattern of the two delimiters that appear immediately before and after each string, and reads them into data[][][] in order (step S2504). The information processing apparatus 100 then records del[i] and adds 1 to i (step S2505).

Next, the information processing apparatus 100 determines whether i exceeds the number of delimiters (step S2506). If i exceeds the number of delimiters (step S2506: Yes), the information processing apparatus 100 ends the restoration processing.

If, on the other hand, i does not exceed the number of delimiters (step S2506: No), the information processing apparatus 100 records data[ID[del[i-1]]][ID[del[i]]][j[ID[del[i-1]]][ID[del[i]]]] and adds 1 to j[ID[del[i-1]]][ID[del[i]]] (step S2507). The information processing apparatus 100 then returns to step S2505. In this way, the information processing apparatus 100 can restore the compression target data 400.
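The loop of steps S2505 to S2507 can be sketched as follows. The sketch departs from the flowchart's notation in three labelled ways: `del` is a reserved word in Python, so the array is named `dels`; indices start at 0 rather than 1; and `ID` is modelled as a dictionary from delimiter string to type number, which is how the expression ID[del[i]] in step S2507 uses it. The sample inputs are made up for illustration.

```python
def restore_flow(dels, data, ID):
    """Interleave delimiters and strings following steps S2505 to S2507."""
    n = len(ID)
    j = [[0] * n for _ in range(n)]  # next unread string per delimiter pair
    out = []
    i = 0
    while True:
        out.append(dels[i])          # S2505: record the delimiter
        i += 1
        if i >= len(dels):           # S2506: no delimiter left
            break
        front, back = ID[dels[i - 1]], ID[dels[i]]
        out.append(data[front][back][j[front][back]])  # S2507: record the string
        j[front][back] += 1
    return "".join(out)


# Made-up example: three delimiter types and one string per used pair.
ID = {"$": 0, ":": 1, "#": 2}
dels = ["$", ":", "#", "$"]
data = [[[] for _ in range(3)] for _ in range(3)]
data[0][1] = ["11"]
data[1][2] = ["03"]
data[2][0] = ["monmap"]
print(restore_flow(dels, data, ID))  # $11:03#monmap$
```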

As described above, the information processing apparatus 100 can extract, from the compression target data 400, predetermined delimiters and the character strings sandwiched between two predetermined delimiters. The information processing apparatus 100 can generate the first data 800, in which the delimiters are arranged so that the order in which they appeared in the compression target data 400 can be identified. The information processing apparatus 100 can also generate the second data 1000, in which, for each type of combination of two delimiters, the character strings sandwiched between the two delimiters of that type are arranged so that the order in which they appeared in the compression target data 400 can be identified. The information processing apparatus 100 can then compress the generated first data 800 and second data 1000 using a sliding dictionary method. The information processing apparatus 100 can thus generate first data 800 and second data 1000 from which the compression target data 400 can be recovered, and can make identical character strings tend to appear consecutively at relatively close positions in the second data 1000. As a result, the information processing apparatus 100 tends to need fewer bits to indicate a start position and finds identical character strings more easily, which makes the data easier to compress and improves compression efficiency.

The information processing apparatus 100 can also accept a designation of the predetermined delimiters. The information processing apparatus 100 can then extract, from the compression target data 400, the designated delimiters and the character strings sandwiched between two designated delimiters. This makes it easier to use delimiters that suit the format of the compression target data 400, so that identical character strings tend to gather relatively close together in the second data 1000, which improves compression efficiency.

When the number of delimiter types appearing in the compression target data 400 is larger than a threshold, the information processing apparatus 100 can divide the compression target data 400 into pieces of partial data in each of which the number of delimiter types is at most the threshold. The information processing apparatus 100 can generate first data 800 in which the delimiters are arranged so that the order in which they appeared in the partial data can be identified. The information processing apparatus 100 can also generate second data 1000 in which, for each type of combination of two delimiters, the character strings sandwiched between the two delimiters of that type are arranged so that the order in which they appeared in the partial data can be identified. This limits the size of the storage area used to classify the character strings by type of delimiter combination, which saves storage and allows efficient processing.

Further, the information processing apparatus 100 can generate the first data 800 in which the delimiters alternate with a predetermined character, in the order in which the delimiters appeared in the compression target data 400. The information processing apparatus 100 can generate the second data 1000 in which, for each type of combination of two delimiters, the character strings sandwiched between the two delimiters of that combination alternate with a predetermined character, in the order in which the character strings appeared in the compression target data 400. As a result, even when a delimiter consists of two or more characters, the first data 800 allows the boundary between one delimiter and the next to be determined. Likewise, the second data 1000 allows the boundary between one character string and the next to be determined.
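
The effect of alternating items with a predetermined character can be seen in a short sketch. The choice of the NUL character as `SEP` and the helper names are assumptions for illustration; the sketch also assumes that `SEP` never occurs inside a delimiter or a character string.

```python
SEP = "\x00"  # hypothetical "predetermined character"; assumed absent from the data

def join_with_separator(items):
    """Arrange items in order, each followed by the predetermined character."""
    return "".join(item + SEP for item in items)

def split_on_separator(stream):
    """Recover the items; the last split element is the empty remainder."""
    return stream.split(SEP)[:-1]

# Delimiters of different lengths remain distinguishable after joining.
first_data = join_with_separator([": ", "\r\n", ","])
recovered = split_on_separator(first_data)
```

Without the separator, a sequence such as `": "` followed by `","` could not be told apart from other ways of cutting the same characters.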

Further, the information processing apparatus 100 can perform compression using the LZ77 or LZMA compression algorithm. The information processing apparatus 100 can thus use LZ77 or LZMA, which are well suited to compressing the second data 1000, and can reduce the size of the compressed data.
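
As a rough illustration of this compression step, the sketch below packs the two pieces of data into one buffer and compresses it with Python's standard `lzma` module, whose default output is the .xz container with an LZMA2 filter. The 4-byte length header and the names `pack` and `unpack` are assumptions of this sketch and do not describe the compressed file of the embodiment.

```python
import lzma
import struct

def pack(first, second):
    """Compress first and second data (bytes) into a single blob."""
    header = struct.pack(">I", len(first))  # length of the first data, big-endian
    return lzma.compress(header + first + second)

def unpack(blob):
    """Inverse of pack: return (first, second) as bytes."""
    raw = lzma.decompress(blob)
    (n,) = struct.unpack(">I", raw[:4])
    return raw[4:4 + n], raw[4 + n:]

blob = pack(b",\n,\n", b"id\x001\x00name\x00alice\x00")
first, second = unpack(blob)
```

Dictionary-based compressors of this family replace repeated substrings with references to earlier occurrences, which is why gathering identical character strings close together in the second data is expected to help.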

Further, the information processing apparatus 100 can allow the delimiters to include character strings of two or more characters. This makes it easier to increase the number of delimiter types, and therefore the number of types of combinations of two delimiters. Character strings sandwiched between two delimiters of the same combination type are then more likely to have the same attribute, so identical character strings tend to gather relatively close together in the second data 1000, which improves compression efficiency.

Alternatively, the information processing apparatus 100 can fix the number of characters of each delimiter at one, so that the delimiters do not include character strings of two or more characters. This keeps the number of delimiter types, and hence the number of types of combinations of two delimiters, from growing. The size of the storage area used to classify character strings by combination type is therefore limited, which saves storage and allows efficient processing. In addition, because every delimiter is exactly one character long, the boundaries between delimiters need not be marked explicitly in the first data 800, which reduces the size of the first data 800.

Further, the information processing apparatus 100 can allow the delimiters to include control characters. When classifying character strings by type of combination of two delimiters, the information processing apparatus 100 can then also classify character strings sandwiched between two delimiters of a combination that includes a control character. As a result, identical character strings tend to gather relatively close together in the second data 1000, which improves compression efficiency.

Further, the information processing apparatus 100 can generate the second data 1000 by treating the compression target data 400 as if a control character indicating a line feed appeared at its beginning. When classifying character strings by type of combination of two delimiters, the information processing apparatus 100 can then also classify the first character string of the compression target data 400.
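
The classification by delimiter pair, including the virtual line feed at the beginning, might look like the following sketch. The function name `classify` and the dictionary keyed by `(previous delimiter, next delimiter)` are illustrative assumptions; `strings[i]` is taken to be the text that precedes `delims[i]`.

```python
from collections import defaultdict

def classify(delims, strings, head="\n"):
    """Group each string by the pair of delimiters that sandwich it.

    `head` stands in for the line feed assumed to precede the data,
    so the very first string also has a left-hand delimiter.
    """
    groups = defaultdict(list)
    prev = head
    for delim, text in zip(delims, strings):
        groups[(prev, delim)].append(text)  # order of appearance is preserved
        prev = delim
    return dict(groups)

# Delimiters and strings of "id,name\n1,alice\n"
groups = classify([",", "\n", ",", "\n"], ["id", "name", "1", "alice"])
```

In this example the first column of every row, including the first row, falls into the group for the pair (line feed, comma).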

Further, the information processing apparatus 100 can decode compressed data obtained by compressing the first data 800 and the second data 1000 using the slide dictionary method. Referring to the second data 1000 obtained by the decoding, the information processing apparatus 100 can sequentially insert, between each two delimiters in the first data 800 obtained by the decoding, the character string associated with the type of combination of those two delimiters. The information processing apparatus 100 can thereby restore the original compression target data 400 from the efficiently compressed data.
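
The restoration step can be sketched as follows, assuming the decoded first data has already been split into a list of delimiters and the decoded second data into per-pair lists of character strings. The function name `restore`, the dictionary layout, and the assumptions that the data ends with a delimiter and begins after a virtual line feed are all illustrative.

```python
def restore(delims, groups, head="\n"):
    """Rebuild the original text from the delimiter sequence and grouped strings.

    groups maps (previous delimiter, next delimiter) to the strings that
    appeared between that pair, in order of appearance.
    """
    cursors = {pair: iter(texts) for pair, texts in groups.items()}
    out = []
    prev = head  # virtual line feed before the first string
    for delim in delims:
        out.append(next(cursors[(prev, delim)]))  # next unused string for this pair
        out.append(delim)
        prev = delim
    return "".join(out)

text = restore(
    [",", "\n", ",", "\n"],
    {("\n", ","): ["id", "1"], (",", "\n"): ["name", "alice"]},
)
```

Because each group keeps its strings in order of appearance, taking them one at a time per pair is enough to put every string back in its original position.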

The compression method described in this embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The compression program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The compression program may also be distributed via a network such as the Internet.

The following supplementary notes are further disclosed regarding the embodiment described above.

(Supplementary note 1) A compression program that causes a computer to execute a process comprising:
extracting, from compression target data, predetermined delimiters and character strings each sandwiched between two of the delimiters;
generating first data in which the delimiters are arranged so that the order in which the delimiters appeared in the compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, the character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
compressing the generated first data and second data using a slide dictionary method.

(Supplementary note 2) The compression program according to supplementary note 1, further causing the computer to execute a process of accepting a designation of the predetermined delimiters,
wherein the extracting extracts, from the compression target data, the designated predetermined delimiters and the character strings each sandwiched between two of the delimiters.

(Supplementary note 3) The compression program according to supplementary note 1 or 2, further causing the computer to execute a process of dividing, when the number of types of the delimiters appearing in the compression target data is larger than a threshold, the compression target data into pieces of partial data in each of which the number of types of the delimiters appearing is equal to or less than the threshold,
wherein the generating generates first data in which the delimiters are arranged so that the order in which the delimiters appeared in the partial data can be identified, and second data in which, for each type of combination of two of the delimiters appearing in the partial data, the character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the partial data can be identified.

(Supplementary note 4) The compression program according to any one of supplementary notes 1 to 3, wherein the generating generates the first data in which the delimiters alternate with a predetermined character in the order in which the delimiters appeared in the compression target data, and the second data in which, for each type of combination of two of the delimiters, the character strings sandwiched between the two delimiters of that combination alternate with a predetermined character in the order in which the character strings appeared in the compression target data.

(Supplementary note 5) The compression program according to any one of supplementary notes 1 to 4, wherein the compression uses the LZ77 or LZMA compression algorithm.

(Supplementary note 6) The compression program according to any one of supplementary notes 1 to 5, wherein the delimiters include a character string of two or more characters.

(Supplementary note 7) The compression program according to any one of supplementary notes 1 to 5, wherein the number of characters of each delimiter is one.

(Supplementary note 8) The compression program according to any one of supplementary notes 1 to 7, wherein the delimiters include a control character.

(Supplementary note 9) The compression program according to any one of supplementary notes 1 to 8, wherein the generating generates the second data by treating the compression target data as if a control character indicating a line feed appeared at its beginning.

(Supplementary note 10) A compression method in which a computer executes a process comprising:
extracting, from compression target data, predetermined delimiters and character strings each sandwiched between two of the delimiters;
generating first data in which the delimiters are arranged so that the order in which the delimiters appeared in the compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, the character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
compressing the generated first data and second data using a slide dictionary method.

(Supplementary note 11) An information processing apparatus comprising a control unit configured to:
extract, from compression target data, predetermined delimiters and character strings each sandwiched between two of the delimiters;
generate first data in which the delimiters are arranged so that the order in which the delimiters appeared in the compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, the character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
compress the generated first data and second data using a slide dictionary method.

(Supplementary note 12) A restoration program that causes a computer to execute a process comprising:
decoding compressed data obtained by compressing, using a slide dictionary method, first data in which predetermined delimiters are arranged so that the order in which the delimiters appeared in compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
generating the compression target data by referring to the second data obtained by the decoding and sequentially inserting, between each two of the delimiters in the first data obtained by the decoding, the character string associated with the type of combination of those two delimiters.

(Supplementary note 13) A restoration method in which a computer executes a process comprising:
decoding compressed data obtained by compressing, using a slide dictionary method, first data in which predetermined delimiters are arranged so that the order in which the delimiters appeared in compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
generating the compression target data by referring to the second data obtained by the decoding and sequentially inserting, between each two of the delimiters in the first data obtained by the decoding, the character string associated with the type of combination of those two delimiters.

(Supplementary note 14) An information processing apparatus comprising a control unit configured to:
decode compressed data obtained by compressing, using a slide dictionary method, first data in which predetermined delimiters are arranged so that the order in which the delimiters appeared in compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
generate the compression target data by referring to the second data obtained by the decoding and sequentially inserting, between each two of the delimiters in the first data obtained by the decoding, the character string associated with the type of combination of those two delimiters.

DESCRIPTION OF SYMBOLS
100 Information processing apparatus
110, 400 Compression target data
121, 122, 800, 900, 1000 Data
130 Compressed data
200 Bus
201 CPU
202 Memory
203 Network I/F
204 Disk drive
205 Disk
206 Recording medium I/F
207 Recording medium
210 Network
301 Extraction unit
302 Generation unit
303 Compression unit
304 Decoding unit
305 Restoration unit
500 Compression table
1100 Compressed file
1300 Restoration table

Claims

A compression program that causes a computer to execute a process comprising:
extracting, from compression target data, predetermined delimiters and character strings each sandwiched between two of the delimiters;
generating first data in which the delimiters are arranged so that the order in which the delimiters appeared in the compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, the character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
compressing the generated first data and second data using a slide dictionary method.

The compression program according to claim 1, further causing the computer to execute a process of accepting a designation of the predetermined delimiters,
wherein the extracting extracts, from the compression target data, the designated predetermined delimiters and the character strings each sandwiched between two of the delimiters.

The compression program, further causing the computer to execute a process of dividing, when the number of types of the delimiters appearing in the compression target data is larger than a threshold, the compression target data into pieces of partial data in each of which the number of types of the delimiters appearing is equal to or less than the threshold,
wherein the generating generates first data in which the delimiters are arranged so that the order in which the delimiters appeared in the partial data can be identified, and second data in which, for each type of combination of two of the delimiters appearing in the partial data, the character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the partial data can be identified.

The compression program according to claim 1, wherein the delimiter includes a character string having two or more characters.

The compression program according to any one of claims 1 to 4, wherein the number of characters of each delimiter is one.

A compression method in which a computer executes a process comprising:
extracting, from compression target data, predetermined delimiters and character strings each sandwiched between two of the delimiters;
generating first data in which the delimiters are arranged so that the order in which the delimiters appeared in the compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, the character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
compressing the generated first data and second data using a slide dictionary method.

An information processing apparatus comprising a control unit configured to:
extract, from compression target data, predetermined delimiters and character strings each sandwiched between two of the delimiters;
generate first data in which the delimiters are arranged so that the order in which the delimiters appeared in the compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, the character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
compress the generated first data and second data using a slide dictionary method.

A restoration program that causes a computer to execute a process comprising:
decoding compressed data obtained by compressing, using a slide dictionary method, first data in which predetermined delimiters are arranged so that the order in which the delimiters appeared in compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
generating the compression target data by referring to the second data obtained by the decoding and sequentially inserting, between each two of the delimiters in the first data obtained by the decoding, the character string associated with the type of combination of those two delimiters.

A restoration method in which a computer executes a process comprising:
decoding compressed data obtained by compressing, using a slide dictionary method, first data in which predetermined delimiters are arranged so that the order in which the delimiters appeared in compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
generating the compression target data by referring to the second data obtained by the decoding and sequentially inserting, between each two of the delimiters in the first data obtained by the decoding, the character string associated with the type of combination of those two delimiters.

An information processing apparatus comprising a control unit configured to:
decode compressed data obtained by compressing, using a slide dictionary method, first data in which predetermined delimiters are arranged so that the order in which the delimiters appeared in compression target data can be identified, and second data in which, for each type of combination of two of the delimiters, character strings sandwiched between the two delimiters of that combination are arranged so that the order in which the character strings appeared in the compression target data can be identified; and
generate the compression target data by referring to the second data obtained by the decoding and sequentially inserting, between each two of the delimiters in the first data obtained by the decoding, the character string associated with the type of combination of those two delimiters.