JPWO2011105463A1

JPWO2011105463A1 - Data compression apparatus, data compression method, and program storage medium

Info

Publication number: JPWO2011105463A1
Application number: JP2012501836A
Authority: JP
Inventors: 知生海老山; 光樹祐成; 照之今井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2010-02-23
Filing date: 2011-02-17
Publication date: 2013-06-20
Also published as: WO2011105463A1

Abstract

Provided is a data compression apparatus and the like capable of maintaining an efficient data compression ratio. The data compression apparatus 600 includes a compression unit 601, a monitoring unit 602, a timing control unit 603, and a dictionary generation unit 604. The compression unit 601 compresses data to be compressed based on dictionary data given in advance. The monitoring unit 602 calculates the use frequency of the dictionary data when the compression unit 601 compresses the data. The timing control unit 603 issues an instruction to update the dictionary data when the calculated use frequency falls below a preset threshold. Upon receiving the instruction, the dictionary generation unit 604 newly generates the dictionary data using the data compressed by the compression unit 601.

Description

The present invention relates to a data compression apparatus, a data compression method, and a program storage medium.

Japanese Patent Laid-Open No. 2001-044850 (Patent Document 1) discloses a data compression apparatus that compresses data using a dictionary-based compression method. When compressing data, this apparatus repeatedly updates the dictionary until an optimal dictionary for data compression has been generated, and then compresses the data using that dictionary.
FIG. 24 is a block diagram showing a configuration example of that data compression apparatus. The data compression apparatus 12 includes a dictionary generation/update unit 13, a dictionary holding unit 15, a dictionary output unit 16, a data compression unit 17, and a compressed data output unit 18. The dictionary generation/update unit 13 has a function of generating the dictionary 14 based on the entire data string 11 to be compressed and a function of updating the dictionary 14. The dictionary holding unit 15 holds the dictionary 14. The dictionary output unit 16 outputs the dictionary 14. The data compression unit 17 compresses the entire data string 11 based on the dictionary 14. The compressed data output unit 18 outputs the compressed data 19 produced by the data compression unit 17. The dictionary 14 here is data indicating combinations of the data in the entire data string 11 and registration numbers. The data compression apparatus 12 compresses data by replacing the entire data string 11 with registration numbers based on the dictionary 14.
When replacing the entire data string 11 with registration numbers, the data compression apparatus 12 can raise the data compression ratio by preferentially replacing frequently occurring data with registration numbers of the dictionary 14. However, a dictionary 14 generated in this way may contain a very large number of entries. The data compression apparatus 12 therefore updates the dictionary by deleting rarely used entries from the dictionary 14, and judges that the dictionary 14 has become optimal once the number of entries reaches a predetermined number. The data compression apparatus 12 raises the data compression ratio by compressing the entire data string 11 based on the dictionary 14 obtained in this way.

Patent Document 1: JP 2001-044850 A

However, the data compression apparatus 12 does not update the dictionary 14 while data is being input continuously. The data compression ratio achieved by the data compression apparatus 12 may therefore drop in cases such as the following. For example, the input to the data compression apparatus 12 may shift from character string data containing many occurrences of "A", "B", and "C" to character string data containing almost none. Before the shift, the data compression apparatus 12 compresses data using a dictionary 14 that contains many entries involving "A", "B", and "C". Once the trend of the input data changes and "A", "B", and "C" hardly appear in it, the data compression apparatus 12 uses the dictionary 14 less often. As a result, the data compression ratio achieved by the data compression apparatus 12 drops.
The present invention has been made to solve the above problem. That is, a main object of the present invention is to provide a data compression apparatus, a data compression method, and a program storage medium that can maintain efficient data compression.

The data compression apparatus of the present invention includes:
compression means for compressing data to be compressed based on dictionary data given in advance;
monitoring means for calculating the use frequency of the dictionary data when the compression means compresses the data;
timing control means for issuing an instruction to update the dictionary data when the calculated use frequency falls below a preset threshold; and
dictionary generation means for newly generating the dictionary data, upon receiving the instruction, based on the data compressed by the compression means.
The data compression method of the present invention includes:
compressing data to be compressed based on dictionary data given in advance;
calculating the use frequency of the dictionary data when the data is compressed;
issuing an instruction to update the dictionary data when the calculated use frequency falls below a preset threshold; and
newly generating the dictionary data, upon receiving the instruction, based on the compressed data.
The program storage medium of the present invention stores a computer program that causes a data compression apparatus to execute:
a process of compressing data to be compressed based on dictionary data given in advance;
a process of calculating the use frequency of the dictionary data when the data is compressed;
a process of issuing an instruction to update the dictionary data when the calculated use frequency falls below a preset threshold; and
a process of newly generating the dictionary data, upon receiving the instruction, based on the compressed data.
The main object of the present invention described above is also achieved by a data compression method corresponding to the data compression apparatus configured as above. It is further achieved by a program storage medium storing a computer program that implements the data compression apparatus and the data compression method on a computer.

According to the present invention, efficient data compression can be maintained.

FIG. 1 is a block diagram showing the configuration of the data compression apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of data compression by a dictionary-based compression method.
FIG. 3 is a diagram showing the result of counting occurrences of adjacent character pairs in the data D001 in FIG. 2.
FIG. 4 is a diagram showing the result of counting occurrences of adjacent character pairs in the data D002 in FIG. 2.
FIG. 5 is a diagram showing the result of counting occurrences of adjacent character pairs in the data D003 in FIG. 2.
FIG. 6 is a diagram showing an example of dictionary data.
FIG. 7A is a diagram showing data in which the dictionary data used for compression is paired with the number of times it was used.
FIG. 7B is a diagram showing an example of the total size of the compressed data.
FIG. 8 is a diagram showing an example of data indicating the use frequency of dictionary data.
FIG. 9A is a diagram showing an example of compressed data.
FIG. 9B is a diagram showing an example of dictionary data.
FIG. 10 is a diagram illustrating an example of a method of decompressing data compressed by the dictionary-based compression method.
FIG. 11 is a diagram, continuing from FIG. 10, illustrating an example of a method of decompressing data compressed by the dictionary-based compression method.
FIG. 12 is a diagram, continuing further, illustrating an example of a method of decompressing data compressed by the dictionary-based compression method.
FIG. 13 is a flowchart showing an operation example of the data compression apparatus according to the first embodiment of the present invention.
FIG. 14 is a block diagram showing the configuration of the data compression apparatus according to the second embodiment of the present invention.
FIG. 15 is a flowchart showing an operation example of the data compression apparatus according to the second embodiment of the present invention.
FIG. 16 is a flowchart showing a specific operation example of the adjustment unit of the data compression apparatus according to the second embodiment of the present invention.
FIG. 17 is a block diagram showing the configuration of the data compression apparatus according to the third embodiment of the present invention.
FIG. 18 is a flowchart showing an operation example of the data compression apparatus according to the third embodiment of the present invention.
FIG. 19 is a block diagram showing the configuration of the data compression apparatus according to the fourth embodiment of the present invention.
FIG. 20 is a flowchart showing an operation example of the data compression apparatus according to the fourth embodiment of the present invention.
FIG. 21 is a block diagram showing the configuration of the data compression apparatus according to the fifth embodiment of the present invention.
FIG. 22 is a flowchart showing an operation example of the data compression apparatus according to the fifth embodiment of the present invention.
FIG. 23 is a diagram illustrating another embodiment.
FIG. 24 is a block diagram showing the configuration of the data compression apparatus described in Patent Document 1.

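Embodiments of the present invention are described below with reference to the drawings.
(First Embodiment)
FIG. 1 is a block diagram showing the configuration of a data compression apparatus (data compression system) according to a first embodiment of the present invention. The data compression apparatus 100 includes a control device 101, a first storage unit 103, a second storage unit 104, and a third storage unit 108. The first storage unit 103, the second storage unit 104, and the third storage unit 108 are implemented by a storage device. The storage device has a storage medium that stores data (for example, a hard disk drive or RAM (random access memory)). In the first embodiment, the first storage unit 103 functions as a compressed data storage unit (compressed data storage means) that stores compressed data, described later. The second storage unit 104 functions as a dictionary data storage unit (dictionary data storage means) that stores dictionary data, described later. The third storage unit 108 functions as a decompressed data storage unit (decompressed data storage means) that stores decompressed data, described later.
The control device 101 is a computer including, for example, a CPU (Central Processing Unit). The control device 101 governs the overall operation of the data compression apparatus 100 by executing, as appropriate, various computer programs stored in a computer-readable storage medium.
In the first embodiment, the control device 101 has a function of compressing data generated by a data generation unit 110 using a dictionary-based compression method. The control device 101 also has a function of generating the dictionary data referred to when compressing data and a function of updating the dictionary data. That is, by operating according to a computer program, the control device 101 implements the following functional blocks: a compression unit 102, a monitoring unit (dictionary use frequency monitoring means) 105, a timing control unit (dictionary update timing means) 106, a decompression unit (compressed data decompression means) 107, and a dictionary generation unit (dictionary construction means) 109. The data generated by the data generation unit 110 may be of any type, but is assumed here to be character string data for ease of explanation.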
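The compression unit (compression means) 102 has a function of compressing data by a dictionary-based compression method (data compression function) based on the dictionary data, described later, stored in the second storage unit 104. The data to be compressed is data input from the data generation unit 110 or decompressed data, described later, stored in the third storage unit 108. There are various dictionary-based compression methods, such as LZW (Lempel-Ziv-Welch) and LZ77 (Lempel-Ziv 77), and the type adopted by the compression unit 102 is not particularly limited. In the first embodiment, however, for ease of explanation, the compression unit 102 compresses data by a dictionary-based compression method called BPE (Byte Pair Encoding).
The BPE method is an algorithm that compresses the target data by repeatedly replacing frequently occurring 2-byte data with 1-byte data. FIG. 2 illustrates an example of data compression by the BPE method. The compression unit 102 compresses data by the BPE method as follows.
For example, the second storage unit 104 holds a plurality of dictionary data entries such as those shown in FIG. 6. Each dictionary data entry represents the relationship between a character string before replacement and the character that replaces it. One entry shown in FIG. 6 relates the character string "AB" before replacement to the replacement character "G". Another entry relates the character string "DE" to the replacement character "H". Yet another entry relates the character string "GC" to the replacement character "I". Such dictionary data is generated by the dictionary generation unit 109 as described later.
When, for example, the data D001, a character string such as that shown in FIG. 2, is input, the compression unit 102 refers to the dictionary data in the second storage unit 104 and compresses it into the data D002. That is, based on the dictionary data, the compression unit 102 replaces each adjacent pair "AB" in the data D001 with the single character "G", creating the data D002 as a compressed form of the data D001. The compression unit 102 then replaces each adjacent pair "DE" in the data D002 with the single character "H", creating the data D003 as a compressed form of the data D002. Next, it replaces each adjacent pair "GC" in the data D003 with the single character "I", creating the data D004 as a compressed form of the data D003. The compression unit 102 then attempts to compress the data D004 in the same way. However, the data D004 contains none of the character strings "AB", "DE", and "GC" found in the dictionary data, so the compression unit 102 cannot compress it further and ends the data compression process.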
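The replacement steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the sample string `D001` is hypothetical (the actual string of FIG. 2 is not reproduced in the text), and the function name `compress` and the list-of-tuples dictionary layout are assumptions made for this sketch. Only the three dictionary entries come from the description of FIG. 6.

```python
# Hypothetical sample input; the real contents of FIG. 2 are not given in the text.
D001 = "ABDABCDEABCFABDEBDE"

# Dictionary data as described for FIG. 6:
# (character string before replacement, replacement character).
DICTIONARY = [("AB", "G"), ("DE", "H"), ("GC", "I")]


def compress(data, dictionary):
    """Replace dictionary pairs with single characters until no entry matches."""
    changed = True
    while changed:
        changed = False
        for pair, symbol in dictionary:
            if pair in data:
                # Each replacement shortens the data, so the loop terminates.
                data = data.replace(pair, symbol)
                changed = True
    return data
```

With this sample, `compress(D001, DICTIONARY)` passes through intermediate strings that play the roles of D002 and D003 before stopping at a string that contains none of "AB", "DE", or "GC".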
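In addition to the data compression function described above, the compression unit 102 has a function of writing the finally obtained data D004 to the first storage unit 103. In the initial period of operation after the data compression apparatus 100 starts running, no dictionary data is stored in the second storage unit 104. The compression unit 102 therefore performs no compression and writes the data input from the data generation unit 110 to the first storage unit 103 as is.
Furthermore, as described later, when the dictionary data stored in the second storage unit 104 has been replaced with new dictionary data, the compression unit 102 compresses all the decompressed data stored in the third storage unit 108 based on the new dictionary data. Once it has finished compressing all the decompressed data stored in the third storage unit 108, the compression unit 102 deletes all of that decompressed data from the third storage unit 108.
The compression unit 102 also has a function of creating data on how the dictionary data was used in the data compression process (usage status data) and data indicating the total size of the uncompressed data handled in that process (total size data). After temporarily holding these data, the compression unit 102 outputs them to the monitoring unit 105.
FIG. 7A shows an example of the usage status data. FIG. 7B shows an example of the total size data. In the example of FIG. 7A, the usage status data pairs each dictionary data entry with the number of times it was used. According to the usage status data of FIG. 7A, the dictionary data that replaces "AB" with "G" was used 10 times, the dictionary data that replaces "DE" with "H" was used 20 times, and the dictionary data that replaces "GC" with "I" was used 5 times.
According to FIG. 7B, the total size data is 153,000 bytes. The compression unit 102 may output the data to the monitoring unit 105 each time it compresses data input from the data generation unit 110, or at preset time intervals.
In the initial period of operation after the data compression apparatus 100 starts running, the compression unit 102 does not perform the data compression process, as described above, and therefore outputs only the total size of the data input from the data generation unit 110 to the monitoring unit 105.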
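The dictionary generation unit (dictionary generation means) 109 has a function of generating dictionary data and a function of storing the generated dictionary data in the second storage unit 104. The procedure by which the dictionary generation unit 109 generates dictionary data is set according to the type of dictionary-based compression method adopted by the compression unit 102. Since the first embodiment adopts the BPE method, the dictionary generation unit 109 generates dictionary data by a procedure suited to the BPE method.
For example, upon receiving an instruction to start operating (operation instruction) from the decompression unit 107 described later, the dictionary generation unit 109 generates dictionary data as follows, based on all the data stored in the third storage unit 108. In the initial period of operation after the data compression apparatus 100 starts running, as described later, the data input from the data generation unit 110 is stored unchanged in the third storage unit 108 by way of the compression unit 102, the first storage unit 103, and the decompression unit 107.
That is, the dictionary generation unit 109 first counts the occurrences of each pair of adjacent characters (character string) in the input data D001, a character string such as that shown in FIG. 2. In the data D001, the first adjacent pair to appear is "AB", so the dictionary generation unit 109 sets the count for "AB" to 1. The next adjacent pair is "BD", so the count for "BD" is set to 1. The next adjacent pair is "DA", so the count for "DA" is set to 1. The next adjacent pair is "AB" again. Since "AB" has already appeared, the dictionary generation unit 109 increments its count by 1 to 2. The dictionary generation unit 109 then repeats the same processing (counting process).
The table in FIG. 3 shows the number of occurrences (count values) of adjacent character pairs in the data D001, obtained by the dictionary generation unit 109 through the processing described above. According to the table in FIG. 3, the most frequent adjacent pair in the data D001 is "AB", which occurs 4 times.
Next, based on the result of the counting process, the dictionary generation unit 109 assigns a different single character (for example, "G") as the replacement character for the most frequent adjacent pair "AB". The replacement character is a character that does not appear in the input data D001, because if a character contained in the input data D001 were used as the replacement character, the compressed data could not be restored to its original form. The replacement character may also be a digit or a symbol.
The dictionary generation unit 109 then creates, as dictionary data, data associating the adjacent pair "AB" with the replacement character "G".
The dictionary generation unit 109 further replaces "AB" in the input data D001 with the replacement character "G", thereby creating the data D002.
Next, the dictionary generation unit 109 counts the occurrences of adjacent character pairs in the data D002 in the same way as for the input data D001, obtaining the information shown in FIG. 4. The table in FIG. 4 shows the number of occurrences (count values) of adjacent pairs in the data D002. According to this table, the most frequent adjacent pair in the data D002 is "DE", which occurs 3 times. The dictionary generation unit 109 therefore assigns a replacement character ("H") for the most frequent pair "DE", as before, and creates, as dictionary data, data associating the pair "DE" with the replacement character "H". It also creates the data D003 by replacing "DE" in the data D002 with "H".
The dictionary generation unit 109 likewise counts the occurrences of adjacent character pairs (character strings) in the data D003. The table in FIG. 5 shows the number of occurrences (count values) of adjacent pairs in the data D003. According to this table, the most frequent adjacent pair in the data D003 is "GC", which occurs 2 times. The dictionary generation unit 109 therefore assigns a replacement character ("I") for the most frequent pair "GC", as before, and creates, as dictionary data, data associating the pair "GC" with the replacement character "I". It also creates the data D004 by replacing "GC" in the data D003 with "I".
The dictionary generation unit 109 performs the same processing on the data D004. That is, it counts the occurrences of adjacent pairs in the data D004 and thereby detects that every adjacent pair in the data D004 occurs only once. The dictionary generation unit 109 then ends the process of counting adjacent pairs and assigning replacement characters, that is, the process of generating dictionary data, and stores the newly generated dictionary data (see FIG. 6) in the second storage unit 104. If dictionary data is already stored in the second storage unit 104, the dictionary generation unit 109 deletes the old dictionary data before storing the new dictionary data in the second storage unit 104.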
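The dictionary generation procedure described above (count adjacent pairs, replace the most frequent pair with an unused character, stop once every pair occurs only once) can be sketched as follows. This is a sketch under stated assumptions: the input string is a hypothetical sample rather than the actual D001 of FIG. 2, and `CANDIDATE_SYMBOLS`, `count_pairs`, and `build_dictionary` are names invented for this illustration. The sample was chosen so that the result matches the three entries described for FIG. 6.

```python
from collections import Counter

# Characters available as replacement characters (an assumption of this sketch).
CANDIDATE_SYMBOLS = "GHIJKLMNOPQRSTUVWXYZ"


def count_pairs(data):
    """Count every pair of adjacent characters, as in the tables of FIGS. 3 to 5."""
    return Counter(data[i:i + 2] for i in range(len(data) - 1))


def build_dictionary(data):
    """Return (dictionary entries, fully replaced data) for the input string."""
    dictionary = []
    while True:
        counts = count_pairs(data)
        if not counts:
            break
        pair, count = max(counts.items(), key=lambda item: item[1])
        if count <= 1:
            break  # every adjacent pair occurs only once: generation ends
        used = {symbol for _, symbol in dictionary}
        # The replacement character must not occur in the data or be reused.
        symbol = next(
            (s for s in CANDIDATE_SYMBOLS if s not in data and s not in used),
            None,
        )
        if symbol is None:
            break  # no unused replacement character is left
        dictionary.append((pair, symbol))
        data = data.replace(pair, symbol)
    return dictionary, data
```

For the hypothetical sample "ABDABCDEABCFABDEBDE", the pair "AB" occurs 4 times, "DE" then occurs 3 times, and "GC" then occurs 2 times, mirroring the counts described for FIGS. 3 to 5.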
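After it has finished storing the new dictionary data in the second storage unit 104, the dictionary generation unit 109 notifies the compression unit 102 that the new dictionary data has been stored.
The monitoring unit (monitoring means) 105 has a function of calculating the use frequency of the dictionary data based on the usage status data and the total size data received from the compression unit 102. For example, the monitoring unit 105 calculates the use frequency of the dictionary data using the following equation (1).
Use frequency = (number of times the dictionary data is used) ÷ (total size of the data) ... (1)
The monitoring unit 105 further has a function of outputting data indicating the calculated use frequency (use frequency data) to the timing control unit 106. FIG. 8 shows an example of the use frequency data that the monitoring unit 105 outputs to the timing control unit 106. The use frequency data shown in FIG. 8 indicates the relationship between each dictionary data entry and its use frequency.
In the initial period of operation after the data compression apparatus 100 starts running, as described above, the only data that the compression unit 102 outputs to the monitoring unit 105 is the total size data. The monitoring unit 105 therefore does not calculate the use frequency and outputs the total size data unchanged to the timing control unit 106.
The timing control unit (timing control means) 106 has a function of outputting an instruction to start operating to the decompression unit 107 when it detects, based on the use frequency data received from the monitoring unit 105, that the use frequency of the dictionary data has dropped. Specifically, when the sum of the use frequencies in the received use frequency data (total value) becomes smaller than a preset threshold, the timing control unit 106 determines that the use frequency of the dictionary data has dropped and instructs the decompression unit 107 to start operating. The threshold used here may be made configurable by the user.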
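As a worked illustration of equation (1) and the threshold check, the following sketch uses the counts described for FIG. 7A and the total size described for FIG. 7B. The function names and the value of `THRESHOLD` are hypothetical; the text leaves the threshold to be preset or chosen by the user.

```python
def use_frequencies(usage_counts, total_size):
    """Equation (1): use frequency = number of uses / total data size."""
    return {entry: count / total_size for entry, count in usage_counts.items()}


def needs_update(frequencies, threshold):
    """True when the summed use frequency is smaller than the threshold."""
    return sum(frequencies.values()) < threshold


# Values described for FIG. 7A (uses per entry) and FIG. 7B (total size).
usage = {("AB", "G"): 10, ("DE", "H"): 20, ("GC", "I"): 5}
total_size = 153000  # bytes

frequencies = use_frequencies(usage, total_size)
THRESHOLD = 0.001  # hypothetical value, not taken from the source
update_now = needs_update(frequencies, THRESHOLD)
```

Here the summed use frequency is 35 / 153000, so with the hypothetical threshold of 0.001 the sketch would signal an update. A real implementation would also need to handle the initial period, in which no usage data exists.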
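In the initial period of operation after the data compression apparatus 100 starts running, the timing control unit 106 receives total size data rather than use frequency data from the monitoring unit 105. In this case, the timing control unit 106 instructs the decompression unit 107 to start operating when, for example, it detects that the size indicated by the total size data received from the monitoring unit 105 is equal to or greater than a preset value. Alternatively, the timing control unit 106 may instruct the decompression unit 107 to start operating when the number of times it has received total size data from the monitoring unit 105 reaches a preset number, or after a preset time has elapsed since it first received total size data from the monitoring unit 105.
The decompression unit (decompression means) 107 has a function of decompressing the compressed data stored in the first storage unit 103 upon receiving an instruction to start operating from the timing control unit 106. The method by which the decompression unit 107 decompresses the compressed data corresponds to the compression method adopted by the compression unit 102. Since the first embodiment uses the BPE method, the decompression unit 107 decompresses data as follows, based on the dictionary data stored in the second storage unit 104.
As an example, suppose the decompression unit 107 decompresses the compressed data T012 shown in FIG. 9A while referring to the dictionary data shown in FIG. 9B. The compressed data T012 in FIG. 9A corresponds to the data D004 in FIG. 2, and the dictionary data in FIG. 9B corresponds to the dictionary data in FIG. 6. FIGS. 10 to 12 illustrate the flow of the decompression.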
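The decompression unit 107 first prepares a buffer 111 of suitable size (see FIG. 10). It then places the first data item "G" of the compressed data T012 at the head of the buffer 111, as shown at B001 in FIG. 10. According to the dictionary data, the head data item "G" of the buffer 111 corresponds to "AB". The decompression unit 107 therefore replaces "G" in the buffer 111 with "AB", as shown at B002 in FIG. 10. Next, no two-character string corresponding to the data item "A" now at the head of the buffer 111 is registered in the dictionary data. In such a case, the decompression unit 107 takes the data item "A" out of the buffer 111 as output data. As a result, the head of the buffer 111 becomes "B", as shown at B003 in FIG. 10, and the output data becomes "A", as shown at T101 in FIG. 10.
No two-character string corresponding to the data item "B" at the head of the buffer 111 is registered in the dictionary data. The decompression unit 107 therefore takes "B" out of the buffer 111 as output data in the same way. The output data thus becomes "AB", as shown at T102 in FIG. 11.
Taking out "B" leaves the buffer 111 empty. The decompression unit 107 therefore places the second data item "D" of the compressed data T012 in the buffer 111, as shown at B004 in FIG. 11. No two-character string corresponding to "D" is registered in the dictionary data, so the decompression unit 107 takes "D" out of the buffer 111 as output data in the same way. The output data thus becomes "ABD", as shown at T103 in FIG. 11.
Taking out "D" leaves the buffer 111 empty again. The decompression unit 107 therefore places the third data item "I" of the compressed data T012 in the buffer 111, as shown at B005 in FIG. 11. According to the dictionary data, "I" corresponds to "GC". The decompression unit 107 therefore replaces "I" in the buffer 111 with "GC", as shown at B006 in FIG. 11.
Further, in the same way, the decompression unit 107 replaces the head data item "G" of the buffer 111 with "AB" based on the dictionary data, as shown at B007 in FIG. 12. The head of the buffer 111 thus becomes "A". No two-character string corresponding to "A" is registered in the dictionary data, so the decompression unit 107 takes "A" out of the buffer 111 as output data in the same way. The output data thus becomes "ABDA", as shown at T104 in FIG. 12.
Taking out "A" makes "B" the head of the buffer 111, as shown at B008 in FIG. 12. No two-character string corresponding to "B" is registered in the dictionary data, so the decompression unit 107 takes "B" out of the buffer 111 as output data in the same way. The output data thus becomes "ABDAB", as shown at T105 in FIG. 12.
Taking out "B" makes "C" the head of the buffer 111, as shown at B009 in FIG. 12. The decompression unit 107 repeats the processing described above to decompress the data T012. In other words, the data taken out of the buffer 111 (the output data) is the decompressed data.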
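The buffer-based decompression walked through above can be sketched as follows. The function name `expand` and the use of a `collections.deque` as the buffer 111 are assumptions of this sketch, and the compressed string in the usage note is a hypothetical sample, since the full contents of T012 are not given in the text (only its first three items "G", "D", "I" are).

```python
from collections import deque

# Dictionary data as described for FIG. 9B (same entries as FIG. 6).
DICTIONARY = [("AB", "G"), ("DE", "H"), ("GC", "I")]


def expand(compressed, dictionary):
    """Expand the head of a working buffer until it is a plain character, then emit it."""
    reverse = {symbol: pair for pair, symbol in dictionary}
    buffer = deque()
    output = []
    for item in compressed:
        buffer.append(item)  # load the next compressed item into the empty buffer
        while buffer:
            head = buffer.popleft()
            if head in reverse:
                # Put the two original characters back at the head of the buffer.
                buffer.extendleft(reversed(reverse[head]))
            else:
                output.append(head)  # not registered: take it out as output data
    return "".join(output)
```

For the prefix "GDI", the output grows as "A", "AB", "ABD", "ABDA", "ABDAB", "ABDABC", which follows the sequence T101 to T105 described above.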
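The decompression unit 107 has a function of storing the data decompressed as described above (decompressed data) in the third storage unit 108. It also has a function of deleting all the compressed data stored in the first storage unit 103 after it has finished decompressing all of that data. Further, after it has finished deleting the compressed data in the first storage unit 103 and storing the decompressed data in the third storage unit 108, the decompression unit 107 outputs an instruction to start operating to the dictionary generation unit 109. Upon receiving this instruction, the dictionary generation unit 109 generates new dictionary data using the decompressed data stored in the third storage unit 108, as described above.
In the initial period of operation after the data compression apparatus 100 starts running, the second storage unit 104 holds no dictionary data, as described above. Unless dictionary data is stored in the second storage unit 104, the compression unit 102 does not compress data, so the data stored in the first storage unit 103 is uncompressed data. Likewise, the decompression unit 107 cannot decompress data unless dictionary data is stored in the second storage unit 104. The decompression unit 107 therefore has a function of moving the uncompressed data stored in the first storage unit 103 to the third storage unit 108 as is.
In the first embodiment, the data stored in the third storage unit 108 when the dictionary generation unit 109 starts generating dictionary data is data stored during the initial period after the data compression apparatus 100 started running, or since dictionary data was last generated. Because the dictionary generation unit 109 generates dictionary data from such data, it can generate the dictionary data that gives the highest data compression ratio at the time of generation.
In practice, generating dictionary data takes time, so data may arrive from the data generation unit 110 while dictionary data is being generated. If the compression unit 102 compressed that input data based on the old dictionary data stored in the second storage unit 104, the following problem would arise: compressed data produced with the old dictionary data while the new dictionary data was being generated would be inconsistent with compressed data produced with the new dictionary data after its generation. In the first embodiment, therefore, the compression unit 102 does not compress data while dictionary data is being generated. Specifically, the compression unit 102 has a buffer (not shown). While the dictionary generation unit 109 is operating, the compression unit 102 accumulates the data input from the data generation unit 110 in this buffer. Then, after the dictionary generation unit 109 has stored the newly generated dictionary data in the second storage unit 104, as described above, the compression unit 102 compresses the data accumulated in the buffer while referring to the new dictionary data.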
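An operation example of the data compression apparatus 100 of the first embodiment (data compression method) is described below with reference to the flowchart of FIG. 13. The flowchart of FIG. 13 shows the processing procedure of a computer program executed by the control device (CPU) 101 of the data compression apparatus 100. The computer program is stored in a storage medium (for example, a nonvolatile storage medium) of a storage device, such as a memory or hard disk, included in the data compression apparatus 100. The computer program may also be stored on a portable storage medium such as a compact disc (CD) or memory card and then transferred from that portable storage medium to the storage medium of the storage device of the data compression apparatus 100.
First, when data is input from the data generation unit 110 (step S010), the compression unit 102 compresses the input data based on the dictionary data stored in the second storage unit 104 (step S020). The compression unit 102 also stores the resulting compressed data in the first storage unit 103. Next, the monitoring unit 105 calculates the dictionary use frequency (step S030) based on the usage status data (data indicating the dictionary data used for compression and the number of uses) and the total size data (data indicating the total size of the data input from the data generation unit 110 to the compression unit 102) output from the compression unit 102. The monitoring unit 105 then outputs the calculation result to the timing control unit 106.
Next, the timing control unit 106 determines whether the sum of the use frequencies input from the monitoring unit 105 (total value) has dropped to or below the set threshold (step S040). If the total value is not at or below the threshold (NO in step S040), the timing control unit 106 does nothing, and the data compression apparatus 100 repeats the operations from step S020. If the total value is at or below the threshold (YES in step S040), the timing control unit 106 instructs the decompression unit 107 to start operating.
The decompression unit 107 then decompresses the compressed data stored in the first storage unit 103 while referring to the dictionary data stored in the second storage unit 104, and stores the resulting decompressed data in the third storage unit 108 (step S050). The decompression unit 107 then deletes the compressed data stored in the first storage unit 103 and instructs the dictionary generation unit 109 to start operating.
The dictionary generation unit 109 then generates dictionary data from the decompressed data stored in the third storage unit 108 according to a predetermined type of dictionary-based compression method (the BPE method in the first embodiment), and stores the newly generated dictionary data in the second storage unit 104, updating it (step S060). That is, the dictionary generation unit 109 generates new dictionary data, deletes the old dictionary data previously stored in the second storage unit 104, and stores the newly generated dictionary data in the second storage unit 104. After storing (updating) the new dictionary data in the second storage unit 104, the dictionary generation unit 109 instructs the compression unit 102 to compress the decompressed data.
The compression unit 102 then compresses the decompressed data stored in the third storage unit 108 while referring to the new dictionary data stored in the second storage unit 104 (step S070), and stores the resulting compressed data in the first storage unit 103. The data compression apparatus 100 then repeats the operations from step S020.
When the data compression apparatus 100 is in its initial period of operation (that is, when no dictionary data is stored in the second storage unit 104), it operates as follows.
For example, when data is input from the data generation unit 110, the compression unit 102 stores the data in the first storage unit 103 as is and outputs data indicating the total size of that data (total size data) to the monitoring unit 105. The monitoring unit 105 outputs the total size data unchanged to the timing control unit 106. The timing control unit 106 instructs the decompression unit 107 to start operating at a timing based on, for example, the total size of the data input from the data generation unit 110, the number of times total size data has been input, or the time elapsed since total size data was first input. The decompression unit 107 then stores the data held in the first storage unit 103 in the third storage unit 108 as is. The dictionary generation unit 109 generates dictionary data using the data stored in the third storage unit 108 and stores the newly generated dictionary data in the second storage unit 104. The dictionary generation unit 109 then instructs the compression unit 102 to start operating. The compression unit 102 compresses the data stored in the third storage unit 108 and stores the resulting compressed data in the first storage unit 103. Once dictionary data has been generated and stored in the second storage unit 104 in this way, the data compression apparatus 100 repeats the operations shown in the flowchart of FIG. 13.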
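The overall flow of steps S020 to S070, together with the initial period, might be sketched as below. This is a simplified illustration under several assumptions that are not in the source: the class and method names are invented, lowercase letters are reserved as replacement characters and assumed absent from the input, the use frequency is computed per input chunk, the three storage units are modeled as in-memory lists, and the initial dictionary is triggered by a total-size condition (one of the three triggers the text mentions).

```python
from collections import Counter

SYMBOLS = "abcdefghijklmnopqrstuvwxyz"  # assumed never to occur in the input


def build_dictionary(data):
    """Generate BPE-style dictionary entries from uncompressed data."""
    dictionary = []
    while len(dictionary) < len(SYMBOLS):
        counts = Counter(data[i:i + 2] for i in range(len(data) - 1))
        if not counts:
            break
        pair, count = max(counts.items(), key=lambda item: item[1])
        if count <= 1:
            break
        symbol = SYMBOLS[len(dictionary)]
        dictionary.append((pair, symbol))
        data = data.replace(pair, symbol)
    return dictionary


def compress(data, dictionary):
    """Return (compressed data, number of times dictionary entries were used)."""
    uses = 0
    for pair, symbol in dictionary:
        uses += data.count(pair)
        data = data.replace(pair, symbol)
    return data, uses


def expand(data, dictionary):
    """Undo the replacements in reverse order of generation."""
    for pair, symbol in reversed(dictionary):
        data = data.replace(symbol, pair)
    return data


class AdaptiveCompressor:
    def __init__(self, threshold, initial_size):
        self.threshold = threshold        # threshold on the use frequency
        self.initial_size = initial_size  # raw size that triggers the first dictionary
        self.dictionary = []              # plays the role of the second storage unit
        self.stored = []                  # plays the role of the first storage unit
        self.rebuilds = 0

    def feed(self, chunk):
        if not chunk:
            return
        if not self.dictionary:
            self.stored.append(chunk)  # initial period: store uncompressed
            if sum(len(c) for c in self.stored) >= self.initial_size:
                self._rebuild()
            return
        compressed, uses = compress(chunk, self.dictionary)  # S020
        self.stored.append(compressed)
        if uses / len(chunk) <= self.threshold:              # S030, S040
            self._rebuild()

    def _rebuild(self):
        plain = [expand(c, self.dictionary) for c in self.stored]       # S050
        self.dictionary = build_dictionary("".join(plain))              # S060
        self.stored = [compress(p, self.dictionary)[0] for p in plain]  # S070
        self.rebuilds += 1

    def restore(self):
        return "".join(expand(c, self.dictionary) for c in self.stored)
```

Feeding chunks dominated by "AB" and then a chunk dominated by "XY" drives the use frequency of the old dictionary to zero, which triggers a rebuild that adds an "XY" entry. The buffering of input during dictionary generation described earlier is omitted here because the sketch is single-threaded.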
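As described above, according to the first embodiment, the data compression apparatus 100 has the monitoring unit 105 and the timing control unit 106. When the compression unit 102 compresses data input from the data generation unit 110, the monitoring unit 105 monitors how much the compression unit 102 uses the dictionary data stored in the second storage unit 104. When, based on the monitoring result of the monitoring unit 105, the use frequency of the dictionary data has dropped (in other words, the effectiveness of the dictionary data stored in the second storage unit 104 has declined), the timing control unit 106 issues an instruction for the dictionary generation unit 109 to generate new dictionary data. The new dictionary data that the dictionary generation unit 109 generates in response is based on all the decompressed data accumulated in the third storage unit 108. The new dictionary data is therefore the dictionary data with the highest use frequency (that is, the greatest compression effect) at the time it is generated.
Thus, when the use frequency (effectiveness) of the dictionary data drops, the data compression apparatus 100 of the first embodiment can generate dictionary data with improved effectiveness and update the dictionary data used for compression to that new dictionary data. Consequently, when the trend of the data input from the data generation unit 110 changes, the data compression apparatus 100 can generate new dictionary data based on the data that follows the change and compress data based on that new dictionary data. The data compression apparatus 100 can therefore keep compressing data efficiently even when the trend of the data input from the data generation unit 110 changes. In other words, the data compression apparatus 100 can maintain an efficient compression ratio.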
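(Second Embodiment)
A second embodiment of the present invention is described below with reference to the drawings.
FIG. 14 is a block diagram showing the configuration of the data compression apparatus of the second embodiment. In the description of the second embodiment, components similar to those of the first embodiment are given the same reference numerals, and duplicate description is omitted.
The data compression apparatus (data compression system) 200 of the second embodiment has, in addition to the configuration of the data compression apparatus 100 of the first embodiment, a resource monitoring unit (resource monitoring means) 201 and an adjustment unit (parameter adjustment means) 202. The data compression apparatus 200 also has a timing control unit (dictionary update timing control means) 203 in place of the timing control unit 106 described in the first embodiment.
The data compression apparatus 100 of the first embodiment described above can maintain an efficient compression ratio by updating the dictionary data when its use frequency drops. This allows the data compression apparatus 100 to limit its use of storage media such as memory and disks (for example, the storage media constituting the third storage unit 108 and the like). On the other hand, updating (generating) the dictionary data places a heavy load on the data compression apparatus 100, so updating the dictionary data frequently is undesirable. Consequently, when memory or disk capacity is ample, it may be better to give priority to reducing the load on the data compression apparatus 100 over raising the compression ratio.
In view of this, the data compression apparatus 200 of the second embodiment has the resource monitoring unit 201 and the adjustment unit 202, as mentioned above.
The resource monitoring unit 201 has a function of monitoring (detecting) the free capacity of a resource (for example, the storage media constituting the third storage unit 108 and the like) and outputting the monitoring result to the adjustment unit 202. The resource monitoring unit 201 outputs the monitoring result to the adjustment unit 202 at, for example, predetermined time intervals.
The adjustment unit (adjustment means) 202 has a function of adjusting the threshold used by the timing control unit 203 based on the monitoring result received from the resource monitoring unit 201. If the monitoring result indicates that the resource has ample free capacity, a low effectiveness (use frequency) of the dictionary data will not immediately strain the resource. The adjustment unit 202 therefore adjusts (sets) the threshold to a small value so that the dictionary data is updated less often. Conversely, if the monitoring result indicates that the resource has little free capacity, a drop in the effectiveness of the dictionary data will quickly strain the resource. The adjustment unit 202 therefore adjusts (sets) the threshold to a large value so that the dictionary data is updated more often.
More specifically, for example, the adjustment unit 202 adjusts (sets) the threshold to 0.5 when the free capacity (remaining capacity) of the resource is 80% or more. When the free capacity is less than 80% but 50% or more, the adjustment unit 202 adjusts the threshold to 0.7. When the free capacity is less than 50%, the adjustment unit 202 sets the threshold to 1.0. In this way, the adjustment unit 202 may set the threshold in steps according to the free capacity of the resource. Alternatively, the adjustment unit 202 may set the threshold continuously according to the free capacity of the resource. The adjustment unit 202 may use any method of setting the threshold, provided that the threshold is set according to the free capacity of the resource.
The timing control unit 203 has a function of determining when to update the dictionary data, in the same way as the timing control unit 106 of the first embodiment, using the threshold set as described above. Apart from using the threshold set by the adjustment unit 202, the timing control unit 203 has the same functions as the timing control unit 106.
As described above, the data compression apparatus 200 of the second embodiment can change the timing at which the dictionary data is updated according to the free capacity of the resource. When resources are ample, the data compression apparatus 200 can reduce the number of times the dictionary data is updated (number of dictionary updates) and thereby reduce its own load. When resources are scarce, the data compression apparatus 200 can increase the number of dictionary updates and thereby limit its use (consumption) of the resource.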
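Next, an operation example of the resource monitoring unit 201 and the adjustment unit 202 is described with reference to FIGS. 15 and 16. FIG. 15 is a flowchart showing an operation example of the resource monitoring unit 201 and the adjustment unit 202. FIG. 16 is a flowchart showing a more specific operation example of the adjustment unit 202. The operations shown in the flowcharts of FIGS. 15 and 16 are executed asynchronously with the operations shown in the flowchart of FIG. 13.
Like the flowchart of FIG. 13, the flowcharts of FIGS. 15 and 16 show the processing procedure of a computer program executed by the control device (CPU) 101 of the data compression apparatus 200. As before, the computer program is stored in a storage medium (for example, a nonvolatile storage medium) of a storage device, such as a memory or hard disk, included in the data compression apparatus 200. The computer program may also be stored on a portable storage medium such as a compact disc (CD) or memory card and then transferred from that portable storage medium to the storage medium of the storage device of the data compression apparatus 200.
The resource monitoring unit 201 monitors, for example at preset time intervals, the free capacity of resources such as the memory and disks of the computer included in the data compression apparatus 200, and outputs the monitoring result to the adjustment unit 202 (step S110 in FIG. 15). Based on the monitoring result, the adjustment unit 202 then sets the threshold used by the timing control unit 203 to a small value when the free capacity of the resource is large, and to a large value when the free capacity of the resource is small (step S120).
An operation example of setting the threshold is now described with reference to FIG. 16. The operation of step S110 shown in FIG. 16 is the same as that of step S110 shown in FIG. 15.
For example, based on the monitoring result received from the resource monitoring unit 201, the adjustment unit 202 determines whether the free capacity of the resource is 80% or more (step S121). If it is, the adjustment unit 202 judges that the resource has ample free capacity and sets the threshold used by the timing control unit 203 to 0.5 (step S123).
If the adjustment unit 202 determines in step S121 that the free capacity of the resource is not 80% or more, it determines whether the free capacity is 50% or more (step S122). If it is, the adjustment unit 202 sets the threshold to 0.7 (step S124).
If the adjustment unit 202 determines in step S122 that the free capacity of the resource is not 50% or more, it judges that the resource has little free capacity and sets the threshold to 1.0 (step S125). In this way, in the second embodiment, the adjustment unit 202 adjusts (sets) the threshold according to the free capacity of the resource.
The flowchart (operation example) shown in FIG. 16 is only one example of a method of adjusting the threshold, and the method of adjusting the threshold is not limited to the one described above.
For example, the threshold need not be adjusted in three steps (80% or more, less than 80% but 50% or more, and less than 50%); more steps may be used. The adjusted threshold values are likewise not limited to 0.5, 0.7, and 1.0.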
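The three-step adjustment of steps S121 to S125 can be sketched as below. The function names are hypothetical. Using `shutil.disk_usage` to measure the free capacity of a disk is an assumption of this sketch: the source only says that the free capacity of memory or disk resources is monitored, not how.

```python
import shutil


def threshold_for_free_ratio(free_ratio):
    """Map the free-capacity ratio (0.0 to 1.0) to the threshold of steps S121 to S125."""
    if free_ratio >= 0.8:   # S121: ample free capacity
        return 0.5          # S123: small threshold, fewer dictionary updates
    if free_ratio >= 0.5:   # S122
        return 0.7          # S124
    return 1.0              # S125: little free capacity, more dictionary updates


def current_threshold(path="."):
    """Resource monitoring (S110) for the disk holding `path`, then adjustment (S120)."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return threshold_for_free_ratio(usage.free / usage.total)
```

Keeping the mapping separate from the measurement makes it straightforward to swap in more steps or a continuous mapping, both of which the text allows.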
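(Third Embodiment)
A third embodiment of the present invention is described below with reference to the drawings.
FIG. 17 is a block diagram showing the configuration of the data compression apparatus of the third embodiment. In the description of the third embodiment, components similar to those of the first embodiment are given the same reference numerals, and duplicate description is omitted.
The data compression apparatus (data compression system) 300 of the third embodiment has, in addition to the configuration of the data compression apparatus 100 of the first embodiment, a cost estimation unit (dictionary construction cost estimation means) 301 and a compression ratio estimation unit (compression ratio estimation means) 302. The data compression apparatus 300 also has a timing control unit (dictionary update timing control means) 303, described later, in place of the timing control unit 106 of the data compression apparatus 100 of the first embodiment.
As also noted for the second embodiment, the data compression apparatus 100 of the first embodiment can maintain an efficient compression ratio by updating the dictionary data when its use frequency drops. This allows the data compression apparatus 100 to limit its use of storage media such as memory and disks. On the other hand, updating the dictionary data places a heavy load on the data compression apparatus 100, so updating the dictionary data frequently is undesirable. Consequently, when memory or disk capacity is ample, it may be better to give priority to reducing the load on the data compression apparatus 100 over raising the compression ratio.
In view of this, the data compression apparatus 300 of the third embodiment has the cost estimation unit 301 and the compression ratio estimation unit 302, and changes the timing at which the dictionary data is updated by a method different from that of the second embodiment.
The cost estimation unit (cost estimation means) 301 has a function of estimating the cost of newly generating dictionary data. That is, the cost estimation unit 301 starts operating upon receiving an instruction to start operating from the timing control unit (timing control means) 303. It receives this instruction at the moment the timing control unit 303 judges, by the same operation as the timing control unit 106 of the first embodiment, that the use frequency of the dictionary data has dropped. The cost estimation unit 301 estimates the cost of newly generating dictionary data based on the amount of compressed data stored in the first storage unit 103 and the amount of dictionary data stored in the second storage unit 104.
As described in the first embodiment, to generate a dictionary, the decompression unit 107 first decompresses all the compressed data stored in the first storage unit 103 while referring to the dictionary data stored in the second storage unit 104, and stores the result in the third storage unit 108. The dictionary generation unit 109 then repeats, for all the decompressed data stored in the third storage unit 108, the operation of counting the occurrences of adjacent character pairs and the operation of replacing the most frequent adjacent pair with a different single character. These operations continue until every adjacent pair in the data occurs only once. The cost of generating a dictionary is therefore proportional to the amount of data stored in the first storage unit 103 and the amount of dictionary data stored in the second storage unit 104.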
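The cost estimation unit 301 therefore calculates the cost of generating dictionary data using, for example, the following equation (2).
C = W1 × D1 + W2 × D2 ... (2)
In equation (2), C is the cost of generating dictionary data, D1 is the amount of compressed data stored in the first storage unit 103, and D2 is the amount of dictionary data stored in the second storage unit 104. W1 and W2 are weighting constants, which are set to suitable values.
Once it has calculated the cost C, the cost estimation unit 301 outputs the result to the timing control unit 303.
Equation (2) is one example of a formula for calculating the cost. The cost estimation unit 301 may use any method that calculates the cost from the amount of compressed data stored in the first storage unit 103 and the amount of dictionary data stored in the second storage unit 104.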
圧縮率見積り部３０２は、新規の辞書データに更新した以降の圧縮率を見積もる（推定する）機能を有している。すなわち、圧縮率見積り部３０２は、タイミング制御部３０３から、動作を開始する指示を受けると、動作を開始する。その指示を受けるタイミングは、タイミング制御部３０３が、前記同様に辞書データの使用頻度が低下したと判断したタイミングである。圧縮率見積り部３０２は、第１記憶部１０３に記憶されている圧縮データの一部を、第２記憶部１０４に記憶されている辞書データを用いて展開する。
その圧縮率見積り部３０２が展開するデータ量は、予め定めた固定のデータ量であってもよいし、あるいは、第１記憶部１０３に記憶されている全データ量の１０％に相当するデータ量というような、第１記憶部１０３の全データ量に比例するデータ量であってもよい。
そして、圧縮率見積り部３０２は、展開したデータを用いて新たな辞書データを生成し、当該辞書データを生成する際に用いた展開データをその新規の辞書データを利用して圧縮する。そして、圧縮率見積り部３０２は、その圧縮したデータの圧縮率を算出する。ここでの圧縮率とは、圧縮前のデータ量に対して圧縮後のデータ量がどれぐらい小さくなっているかを示す指標である。例えば、圧縮前のデータ量が１００ＭＢであり、圧縮後のデータ量が５０ＭＢである場合には、圧縮率は５０％となる。圧縮率見積り部３０２は、圧縮率を算出した後に、その圧縮率を示すデータを、タイミング制御部３０３に出力する。
この第３実施形態では、上記のように、圧縮率見積り部３０２は、第１記憶部１０３に記憶されているデータの全てではなく、一部のデータのみを展開している。さらに、圧縮率見積り部３０２は、その展開したデータに基づいて、辞書データを生成し、さらに、圧縮率を算出している（辞書データの評価を行っている）。これにより、圧縮率見積もり部３０２は、データ圧縮装置３００に大きな負荷をかけず、圧縮効果（圧縮率）を見積もる（算出する）ことができる。
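The sample-based estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the sample is assumed to be already decompressed, the replacement symbols in `free_symbols` are assumed not to occur in the sample, and the size of the new dictionary itself is ignored when computing the ratio. All names are hypothetical.

```python
from collections import Counter


def build_dictionary(text: str, free_symbols: str):
    """Greedy pair replacement; returns (rules, compressed_text)."""
    rules = []
    for symbol in free_symbols:
        pair_counts = Counter(zip(text, text[1:]))
        if not pair_counts:
            break
        (left, right), count = pair_counts.most_common(1)[0]
        if count < 2:  # every adjacent pair occurs only once: stop
            break
        text = text.replace(left + right, symbol)
        rules.append((left + right, symbol))
    return rules, text


def estimate_ratio(sample: str, free_symbols: str) -> float:
    """Compression ratio = size after compression / size before compression."""
    if not sample:
        raise ValueError("sample must not be empty")
    _, compressed = build_dictionary(sample, free_symbols)
    return len(compressed) / len(sample)


# Hypothetical sample standing in for the partially decompressed data.
ratio = estimate_ratio("ABABABAB", free_symbols="XY")
print(ratio)
```

Only the sample is processed, so the work grows with the sample size rather than with the full contents of the first storage unit.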
タイミング制御部３０３は、前記同様にして辞書データの使用頻度が低下したと判断した場合には、動作を開始する指示を、第１実施形態とは異なり、展開部１０７ではなく、コスト見積り部３０１及び圧縮率見積り部３０２に対して出力する。
また、タイミング制御部３０３は、コスト見積り部３０１により算出されたコストと、圧縮率見積り部３０２により算出された圧縮率とに基づいて、辞書データを更新（生成）するか否かを判断する。
その判断は、例えば、前記コストをスコア化したデータと、圧縮率をスコア化したデータとを用いて行うことができる。ここで、下式（３）は、上記コストをスコア化したデータを算出する数式である。また、式（４）は、上記圧縮率をスコア化したデータを算出する数式である。
Ｓ１＝Ｗ３×Ｃ・・・・・（３）
Ｓ２＝Ｗ４×Ｒ・・・・・（４）
式（３）において、Ｓ１はコストをスコア化したデータを示す。Ｗ３は重み定数を示す。Ｃは式（２）により算出されるコストを示す。また、式（４）において、Ｓ２は圧縮率をスコア化したデータを示す。Ｗ４は重み定数を示す。Ｒは圧縮率見積もり部３０２が算出した圧縮率を示す。なお、式（３）と式（４）を用いて前記スコア化したデータを算出する手法は、前記スコア化したデータを算出する手法の一例である。前記スコア化したデータを算出する手法は上記手法に限定されない。
タイミング制御部３０３は、データ（圧縮率をスコア化したデータ）Ｓ２の方が、データ（コストをスコア化したデータ）Ｓ１より大きな値である場合には、新規の辞書データを生成する（辞書データを更新する）タイミングであると判断する。
換言すれば、データ（コストをスコア化したデータ）Ｓ１がデータ（圧縮率をスコア化したデータ）Ｓ２よりも大きい値である場合は、新規に辞書データを生成する（辞書データを更新する）タイミングではない。これにより、データ圧縮装置３００は、圧縮率が小さいにも拘わらず、辞書データを更新するという負荷の大きい動作（処理）を行わなくなる。このため、データ圧縮装置３００は、当該装置の負荷を軽減し、無駄を省くことができる。
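The comparison of the two scores can be sketched as below. The function name and the weight values are assumptions chosen for illustration; the source leaves W3 and W4 open and notes that equations (3) and (4) are only one possible scoring method. The sketch passes R through unchanged and simply reproduces the S2 > S1 test.

```python
def should_rebuild_dictionary(cost: float, ratio: float,
                              w3: float, w4: float) -> bool:
    """Return True when the compression ratio score exceeds the cost score."""
    s1 = w3 * cost   # equation (3): scored cost
    s2 = w4 * ratio  # equation (4): scored compression ratio
    return s2 > s1


# Hypothetical inputs: cost C from equation (2) and an estimated ratio R.
print(should_rebuild_dictionary(58_000_000, 0.5, w3=1e-8, w4=1.0))
print(should_rebuild_dictionary(58_000_000, 0.5, w3=5e-9, w4=1.0))
```

Since C is measured in data-size units and R is a fraction, the weights also act as unit conversions, so in practice they would need to be tuned together.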
以下に、第３実施形態のデータ圧縮装置３００の動作例を、図１８のフローチャートを参照して説明する。なお、図１８において、図１３に示した動作と同様の動作を示す部分には図１３と同一符号を付し、その重複説明を省略する。
図１８のフローチャートは、図１３のフローチャートと同様に、データ圧縮装置３００における制御装置（ＣＰＵ）１０１が実行するコンピュータプログラムの処理手順を示す。そのコンピュータプログラムは、前記同様に、データ圧縮装置３００が有するメモリやハードディスク等の記憶装置の記憶媒体（例えば、不揮発性の記憶媒体）に格納されている。また、当該コンピュータプログラムは、例えば、コンパクトディスク（ＣＤ）やメモリカード等の可搬タイプの記憶媒体に格納された後に、当該可搬タイプの記憶媒体からデータ圧縮装置３００の記憶装置の記憶媒体に格納される場合がある。
図１８のステップＳ０４３において、タイミング制御部３０３は、監視部１０５から受け取った使用頻度の合計値が、設定された閾値以下に低下したか否かを判定する。そして、タイミング制御部３０３は、使用頻度の合計値が閾値以下でなければ（ステップＳ０４３のＮＯ）、何もしない。そして、データ圧縮装置３００はステップＳ０２０以降の動作を繰り返す。これに対して、タイミング制御部３０３は、使用頻度の合計値が閾値以下である場合には（ステップＳ０４３のＹＥＳ）、コスト見積り部３０１及び圧縮率見積り部３０２に、それぞれ、動作を開始する指示を出す。
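Upon receiving the instruction from the timing control unit 303, the cost estimation unit 301 calculates the cost, and the compression ratio estimation unit 302 calculates the compression ratio (step S210). That is, in step S210, the cost estimation unit 301 estimates (calculates) the cost C, for example by using equation (2), based on the amount of compressed data stored in the first storage unit 103 and the amount of dictionary data stored in the second storage unit 104. The cost estimation unit 301 then outputs the cost C to the timing control unit 303. Meanwhile, the compression ratio estimation unit 302 decompresses part of the compressed data stored in the first storage unit 103 using the dictionary data stored in the second storage unit 104. The compression ratio estimation unit 302 then generates new dictionary data from the decompressed data and uses the new dictionary data to compress the decompressed data from which that dictionary data was generated. Further, the compression ratio estimation unit 302 calculates the compression ratio from the compressed data and the data before compression (compression ratio = (amount of data after compression) ÷ (amount of data before compression)). The compression ratio estimation unit 302 then outputs the calculated compression ratio to the timing control unit 303.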
続いて、タイミング制御部３０３は、コスト見積り部３０１から受け取ったコストＣを、式（３）を利用して、スコア化する。つまり、タイミング制御部３０３は、スコア化したデータＳ１を生成する。また、タイミング制御部３０３は、式（４）を利用して、圧縮率見積り部３０２から受け取った圧縮率Ｒをスコア化する。つまり、タイミング制御部３０３は、スコア化したデータＳ２を生成する。そして、タイミング制御部３０３は、データＳ１とデータＳ２を比較して、圧縮率の方がコストより大きいか否かを判定する（ステップＳ２２０）。タイミング制御部３０３は、圧縮率がコスト以下であるときには（ステップＳ２２０のＮＯ）、何もしない。そして、データ圧縮装置３００は、ステップＳ０２０以降の動作を繰り返す。一方、タイミング制御部３０３は、圧縮率がコストよりも大きい場合には（ステップＳ２２０のＹＥＳ）、展開部１０７に動作を開始する指示を出す。そして、データ圧縮装置３００は、前記したステップＳ０５０，Ｓ０６０，Ｓ０７０の動作を行った後に、ステップＳ０２０以降の動作を繰り返す。
この第３実施形態のデータ圧縮装置３００は、上記のように辞書データを更新するタイミングを変更できるため、圧縮効果が小さいにも拘わらず、負荷の大きい動作（辞書データを更新する動作）を行うことがなくなる。これにより、データ圧縮装置３００は、第１実施形態の効果に加えて、装置の負荷を軽減でき、かつ、無駄を省くことができるという効果が得られる。
（第４実施形態）
以下に、本発明に係る第４実施形態を、図面を参照して説明する。
図１９は、第４実施形態のデータ圧縮装置の構成を示すブロック図である。なお、第４実施形態の説明において、第１実施形態と同様な構成部分には同一符号を付し、その重複説明を省略する。
第４実施形態のデータ圧縮装置（データ圧縮システム）４００は、第１実施形態の構成に加えて、さらに、削除部（辞書データ削除手段）４０３を有する。また、データ圧縮装置４００は、第１実施形態で示した展開部１０７に代えて、部分展開部（圧縮データ一部展開手段）４０２を有し、また、第１実施形態で示したタイミング制御部１０６に代えて、タイミング制御部（辞書更新タイミング制御手段）４０１を有する。
前述した第１実施形態のデータ圧縮装置１００は、辞書データの使用頻度が低下してきた場合に辞書データを更新することによって、常に圧縮率の高い辞書データを有することができる。しかしながら、データ圧縮装置１００は、辞書データを更新する際には、毎回、全ての辞書データを作り直し、第２記憶部１０４に格納されている辞書データを一新する。このために、次のような問題が発生する。つまり、辞書データ全体の使用頻度が下がってきていたとしても、一部の辞書データに関しては、使用頻度がそれほど下がっていない、もしくは使用頻度が上がっている場合もある。このような場合に、全ての辞書データを作り直すことは無駄である。また、辞書データを更新する動作は負荷が高い。これらのことを考慮すると、データ圧縮装置は、辞書データを更新するタイミングの度に、全ての辞書データを作り直すのではなく、一部の辞書データを作り直すだけで済む場合には、その一部の辞書データだけを作り直すことが好ましい。そのように、一部の辞書データだけを作り直す場合には、データ圧縮装置１００は、負荷を軽減でき、かつ、辞書データを更新する処理速度を上げることができる。
そこで、第４実施形態のデータ圧縮装置４００は、辞書データを更新するタイミングにおいて、使用頻度が予め定めた設定値以下に低下した辞書データのみを削除し、次に述べるように新規の辞書データを生成する。
例えば、タイミング制御部４０１は、前記同様に、辞書データを更新すると判断した場合に、部分展開部４０２に動作を開始する指示を出す。このとき、タイミング制御部４０１は、図８に示したような使用頻度データに基づき、使用頻度が設定値以下に低下した辞書データを抽出する。そして、タイミング制御部４０１は、その抽出した辞書データを部分展開部４０２と削除部４０３に出力する。なお、上記のように辞書データを抽出する際に用いる設定値は、固定値でもよいし、稼働中に任意に変更できるようにしておいてもよい。
部分展開部（部分展開手段）４０２は、第１記憶部１０３に記憶されている圧縮データの一部を展開する機能を有する。すなわち、部分展開部４０２は、タイミング制御部４０１から受け取った辞書データを用いて、第１記憶部１０３に記憶されている圧縮データを展開し、展開した展開データを第３記憶部１０８に格納する。なお、タイミング制御部４０１が部分展開部４０２に出力する辞書データは、基本的には第２記憶部１０４に記憶されている辞書データの全てではなく一部である。このため、部分展開部４０２は、第１記憶部１０３に記憶されている一部の圧縮データだけを展開することになる。
また、タイミング制御部４０１が部分展開部４０２に出力する辞書データは、前記の如く、使用頻度が低下している辞書データである。このため、部分展開部４０２は、第１記憶部１０３に記憶されている圧縮データのうち、圧縮率があまり高くない圧縮データだけを展開し、圧縮率が高い圧縮データは展開しない。
部分展開部４０２は、上記のような圧縮データを展開する動作（処理）を終えると、第１記憶部１０３に記憶されている圧縮データを全て削除する。また、部分展開部４０２は、展開したデータを第３記憶部１０８に格納し終えると、辞書生成部１０９に動作を開始する指示を出力する。
削除部（削除手段）４０３は、タイミング制御部４０１から受け取った辞書データに該当する辞書データを第２記憶部１０４から削除する機能を有する。
辞書生成部１０９は、部分展開部４０２から前記指示を受け取ると、第３記憶部１０８に記憶されている展開データに基づいて、前記同様に、辞書データを生成する。そして、辞書生成部１０９は、新規に生成した辞書データを、第２記憶部１０４に格納（追記）する。辞書生成部１０９における上記機能（動作）以外の機能は、第１実施形態と同様である。
上記のように、第４実施形態のデータ圧縮装置４００は、使用頻度が低下している辞書データだけを削除し、一部の辞書データだけを更新できる。このため、データ圧縮装置４００は、全ての辞書データを作り直す場合に比べて、装置の負荷を軽減でき、また、辞書データを更新する処理速度を上げることができる。
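The partial update of the fourth embodiment can be sketched as a split of the dictionary into entries that are kept and entries that are deleted and rebuilt. The rule representation (a list of pair and symbol tuples), the usage figures, and the cutoff value are all hypothetical; the source only specifies that entries whose usage frequency is at or below a set value are extracted.

```python
def split_by_usage(rules, usage, cutoff):
    """Return (kept_rules, stale_rules); stale means usage frequency <= cutoff."""
    kept, stale = [], []
    for pair, symbol in rules:
        if usage.get(pair, 0.0) <= cutoff:
            stale.append((pair, symbol))
        else:
            kept.append((pair, symbol))
    return kept, stale


# Hypothetical dictionary and per-entry usage frequencies.
rules = [("AB", "G"), ("DE", "H"), ("GC", "I")]
usage = {"AB": 0.30, "DE": 0.02, "GC": 0.01}

kept, stale = split_by_usage(rules, usage, cutoff=0.05)

# Entries rebuilt from the partially decompressed data are appended to the
# entries that were kept; the rule below is a placeholder.
new_rules = [("XY", "J")]
updated = kept + new_rules
print(updated)
```

Only data compressed with the stale entries needs to be decompressed and recompressed, which is where the load reduction described above comes from.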
以下に、第４実施形態のデータ圧縮システム４００の動作例（データ圧縮方法）を図２０のフローチャートを参照して説明する。なお、図２０において、図１３に示した動作と同様の動作を示す部分には図１３と同一符号を付し、その重複説明を省略する。
図２０のフローチャートは、図１３のフローチャートと同様に、データ圧縮装置４００における制御装置（ＣＰＵ）１０１が実行するコンピュータプログラムの処理手順を示す。そのコンピュータプログラムは、前記同様に、データ圧縮装置４００が有するメモリやハードディスク等の記憶装置の記憶媒体（例えば、不揮発性の記憶媒体）に格納されている。また、当該コンピュータプログラムは、例えば、コンパクトディスク（ＣＤ）やメモリカード等の可搬タイプの記憶媒体に格納された後に、当該可搬タイプの記憶媒体からデータ圧縮装置４００の記憶装置の記憶媒体に格納される場合がある。
図２０のステップＳ０４５において、タイミング制御部４０１は、監視部１０５から受け取った使用頻度の合計値が、設定された閾値以下に低下したか否かを判定する。そして、タイミング制御部４０１は、使用頻度の合計値が閾値以下でなければ（ステップＳ０４５のＮＯ）、何もしない。そして、データ圧縮装置４００は、ステップＳ０２０以降の動作を繰り返す。一方、タイミング制御部４０１は、使用頻度の合計値が閾値以下のときには（ステップＳ０４５のＹＥＳ）、監視部１０５から受け取った使用頻度データに基づき、使用頻度が設定値以下に低下した辞書データを抽出する。そして、タイミング制御部４０１は、部分展開部４０２に、その抽出した辞書データを出力すると共に、動作を開始する指示を出す。また、タイミング制御部４０１は、削除部４０３にも、抽出した辞書データを出力すると共に、動作を開始する指示を出す。
続いて、部分展開部４０２は、タイミング制御部４０１から前記指示を受け取ると、圧縮データの一部を展開する（ステップＳ３１０）。つまり、このステップＳ３１０では、部分展開部４０２は、第１記憶部１０３に記憶されている圧縮データのうち、使用頻度が設定値以下に低下した辞書データで圧縮されている圧縮データだけを、タイミング制御部４０１から受け取った辞書データを用いて展開する。そして、部分展開部４０２は、展開したデータを第３記憶部１０８に格納する。
また、ステップＳ３１０において、部分展開部４０２は、第１記憶部１０３に記憶されている圧縮データを展開し終えると、第１記憶部１０３に記憶されている圧縮データを全て削除する。また、部分展開部４０２は、展開したデータを第３記憶部１０８に格納し終えると、辞書生成部１０９に、動作を開始する指示を出す。
一方、削除部４０３は、タイミング制御部４０１から受け取った辞書データに該当するデータを第２記憶部１０４から削除する（ステップＳ３２０）。続いて、辞書生成部１０９は、辞書データを生成して第２記憶部１０４の辞書データを更新する（ステップＳ０６５）。このステップＳ０６５では、辞書生成部１０９は、第３記憶部１０８に記憶されている展開データに基づいて、辞書データを生成する。そして、辞書生成部１０９は、辞書データを生成し終えると、生成した新規の辞書データを第２記憶部１０４に格納（追記）する。
続いて、圧縮部１０２が、第３記憶部１０８に記憶されている展開データを圧縮する（ステップＳ０７５）。このステップＳ０７５では、圧縮部１０２は、新規に生成された辞書データを含む辞書データを参照しながら、展開データを圧縮する。圧縮部１０２は、圧縮したデータ（圧縮データ）を第１記憶部１０３に格納する。
以上のように、第４実施形態のデータ圧縮装置４００は、全ての辞書データを作り直すのではなく、使用頻度が下がっている辞書データだけを削除し、辞書データを更新できる。このため、データ圧縮装置４００は、全ての辞書データを１から作り直す場合に比べて、辞書データを生成する負荷を軽減でき、かつ、辞書データを更新する処理に要する時間を短縮できる（処理速度を上げることができる）という効果が得られる。
（第５実施形態）
以下に、本発明に係る第５実施形態を、図面を参照して説明する。
図２１は、本発明に係る第５実施形態のデータ圧縮装置の構成を示すブロック図である。なお、第５実施形態の説明において、第１実施形態と同一構成部分には同一符号を付し、その重複説明を省略する。
第５実施形態のデータ圧縮装置（データ圧縮システム）５００を構成する制御装置１０１は、圧縮部１０２と、切り替え部（圧縮アルゴリズム切り替え手段）５０１と、圧縮コスト見積り部（圧縮コスト見積り手段）５０２と、圧縮率見積り部（圧縮率見積り手段）５０３と、更新部（切り替え情報更新手段）５０４とを備える。
前述した各実施形態では、圧縮部１０２は、何れの場合にも、辞書式圧縮アルゴリズム（辞書式圧縮法）によりデータを圧縮している。これに対して、第５実施形態では、圧縮部１０２は、辞書式圧縮アルゴリズム以外の圧縮アルゴリズムでもデータを圧縮できる機能を備えている。また、圧縮部１０２は、辞書式圧縮アルゴリズム以外の圧縮アルゴリズムによりデータを圧縮している場合には、圧縮前のデータに対する圧縮後のデータの圧縮率を求める機能を備えている。
圧縮部１０２は、辞書式圧縮アルゴリズム以外の圧縮アルゴリズムによりデータを圧縮している場合には、総容量データ（データ発生部１１０から受け取ったデータの総容量を示すデータ）と、データ発生部１１０からデータを受け取ってからの経過時間を示すデータと、圧縮率を示すデータとを切り替え部５０１に出力する。なお、切り替え部５０１に出力する圧縮率は、圧縮部１０２が前回、切り替え部５０１に圧縮率を出力した以降にデータ発生部１１０から受け取ったデータを圧縮部１０２が圧縮した場合の圧縮率である。
圧縮部１０２は、辞書式圧縮アルゴリズムによりデータを圧縮している場合には、前述した使用状況データ（辞書データがどの程度使用されているかを示すデータ）と、総容量データ（データ発生部１１０から受け取ったデータの総容量を示すデータ）と、データ発生部１１０からデータを受け取ってからの経過時間を示すデータ（経過時間データ）とを、切り替え部５０１に出力する。
切り替え部（切り替え手段）５０１は、圧縮部１０２から受け取った各種データに基づいて、単位時間当たりのデータの流量を、例えば次式（５）を用いて計算する。
データの流量＝｛（今回の入力データの総容量）−（前回の入力データの総容量）｝÷｛（今回の経過時間）−（前回の経過時間）｝・・・・・（５）
切り替え部５０１は、上記のような計算を行うために、圧縮部１０２から受け取った総容量データと経過時間データを記憶しておく。そして、切り替え部５０１は、圧縮部１０２から新たに総容量データと経過時間データを受け取ると、古い総容量データと経過時間データを削除する。また、切り替え部５０１は、圧縮部１０２から初めて総容量データと経過時間データを受け取った場合には、データの流量を次式（６）を利用して計算する。
データの流量＝（今回の入力データの総容量）÷（今回の経過時間）・・・・・（６）
そして、切り替え部５０１は、算出したデータの流量が、所定の閾値Ｔ１以下に低くなった場合、あるいは、閾値Ｔ１よりも大きい所定の閾値Ｔ２以上に高くなった場合には、圧縮コスト見積り部５０２と圧縮率見積り部５０３に、動作を開始する指示を出す。
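Equations (5) and (6) and the threshold test can be sketched as a small stateful helper. The class name, method names, and threshold values are assumptions for illustration; only the formulas and the T1 < T2 relationship come from the text.

```python
class FlowMonitor:
    """Tracks the data flow rate per unit time between successive reports."""

    def __init__(self, t1: float, t2: float):
        if not t1 < t2:
            raise ValueError("T1 must be smaller than T2")
        self.t1, self.t2 = t1, t2
        self._previous = None  # (total_bytes, elapsed_seconds) of the last report

    def update(self, total_bytes: float, elapsed_seconds: float) -> float:
        if self._previous is None:
            # Equation (6): first report, no earlier values to subtract.
            rate = total_bytes / elapsed_seconds
        else:
            # Equation (5): difference in totals over difference in elapsed time.
            prev_total, prev_elapsed = self._previous
            rate = (total_bytes - prev_total) / (elapsed_seconds - prev_elapsed)
        # Keep only the latest report; older values are discarded.
        self._previous = (total_bytes, elapsed_seconds)
        return rate

    def needs_review(self, rate: float) -> bool:
        """True when the rate is at or below T1, or at or above T2."""
        return rate <= self.t1 or rate >= self.t2


monitor = FlowMonitor(t1=10.0, t2=100.0)
first = monitor.update(500, 10)
second = monitor.update(2500, 20)
print(first, second, monitor.needs_review(second))
```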
圧縮コスト見積り部（圧縮コスト見積り手段）５０２は、データを圧縮する際に要するコストを見積もる（推定する）機能を有する。すなわち、圧縮コスト見積り部５０２は、切り替え部５０１から指示を受けると、動作を開始する。圧縮コスト見積り部５０２は、予め設定されている複数の圧縮アルゴリズムのそれぞれでもって、データを圧縮する。そして、圧縮コスト見積り部５０２は、そのデータを圧縮する処理に要する計算量（コスト）を算出する。具体的には、圧縮コスト見積り部５０２は、第１記憶部１０３に記憶されている圧縮データの一部を、第２記憶部１０４に記憶されている辞書データを利用して展開し、これにより、圧縮前のデータを生成する。なお、圧縮コスト見積り部５０２が展開するデータのデータ量は、第１記憶部１０３に記憶されている圧縮データ全体に対する予め設定した割合に対応するデータ量であってもよいし、予め定めた固定量であってもよい。
圧縮コスト見積り部５０２は、上記のように生成した圧縮前のデータに対して、予め設定されている複数の圧縮アルゴリズムのそれぞれについて、次式（７）を利用して、計算量ＡＣを算出する。
ＡＣ＝（圧縮対象のデータ量）÷（圧縮にかかった時間）・・・・・（７）
なお、式（７）において、「圧縮対象のデータ量」は、圧縮コスト見積り部５０２が上記のように展開して得られた圧縮前のデータのデータ量を示す。また、「圧縮にかかった時間」は、その圧縮対象のデータを圧縮するのに要する時間を示す。
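Equation (7) can be measured directly by timing each candidate algorithm on the same sample. The sketch below uses `zlib` and `bz2` from the Python standard library purely as stand-ins, since the source does not name the candidate algorithms; the sample data and the weight of 1.0 are likewise assumptions.

```python
import bz2
import time
import zlib


def measure_throughput(compress, data: bytes) -> float:
    """Equation (7): AC = amount of data to compress / time taken to compress."""
    start = time.perf_counter()
    compress(data)
    elapsed = time.perf_counter() - start
    return len(data) / max(elapsed, 1e-9)  # guard against a zero timer reading


# Stand-in candidates; the patent only refers to "compression algorithm 1, 2".
CANDIDATES = {"zlib": zlib.compress, "bz2": bz2.compress}
sample = b"ABCABABCDACEFDAS" * 4096

ac = {name: measure_throughput(fn, sample) for name, fn in CANDIDATES.items()}
# Scoring in the style of equations (8) and (9), with a placeholder weight of 1.0.
cost_scores = {name: 1.0 * value for name, value in ac.items()}
print(cost_scores)
```

Note that under equation (7) a larger AC means faster compression, so a high compression cost score corresponds to a low-load algorithm.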
そして、圧縮コスト見積り部５０２は、計算量ＡＣを、例えば式（８）あるいは式（９）を利用して、スコア化し、スコア化したデータを、切り替え部５０１に出力する。
ＳＡ１＝Ｗ１１×ＡＣ１・・・・・（８）
ＳＡ２＝Ｗ１２×ＡＣ２・・・・・（９）
式（８）において、ＳＡ１は圧縮アルゴリズム１を利用してデータを圧縮した場合に要するコストをスコア化したデータ（圧縮コストスコア）を示す。Ｗ１１は重み定数を示す。ＡＣ１は圧縮アルゴリズム１に対応する前記計算量ＡＣを示す。式（９）において、ＳＡ２は圧縮アルゴリズム２を利用してデータを圧縮した場合に要するコストをスコア化したデータ（圧縮コストスコア）を示す。Ｗ１２は重み定数を示す。ＡＣ２は圧縮アルゴリズム２に対応する前記計算量ＡＣを示す。
なお、スコア化したデータを式（８）又は式（９）を利用して得る手法は、データをスコア化する手法の一例にすぎない。圧縮コスト見積り部５０２がデータをスコア化する手法は、計算量ＡＣを用いて算出する手法であれば、特に限定されない。
圧縮率見積り部（圧縮率見積り手段）５０３は、新規の辞書データを生成した場合における圧縮率を見積もる（推定する）機能を有する。すなわち、圧縮率見積り部５０３は、切り替え部５０１から指示を受け取ると、動作を開始する。圧縮率見積り部５０３は、圧縮率を算出する。具体的には、例えば、圧縮率見積り部５０３は、圧縮コスト見積り部５０２と同様に、第１記憶部１０３に記憶されている圧縮データの一部を利用して、圧縮前のデータを生成する。そして、圧縮率見積り部５０３は、圧縮前のデータに対して、予め設定されている複数の圧縮アルゴリズムのそれぞれについて、圧縮前のデータ量に対して圧縮後のデータ量がどの程度小さくなっているかを示す指標である圧縮率を算出する。そして、圧縮率見積り部５０３は、その算出した圧縮率を例えば式（１０）や式（１１）を利用してスコア化し、スコア化したデータを切り替え部５０１に出力する。
ＳＢ１＝Ｗ１３×Ｒ１・・・・・（１０）
ＳＢ２＝Ｗ１４×Ｒ２・・・・・（１１）
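In equation (10), SB1 denotes the scored compression ratio (compression ratio score) of data compressed with compression algorithm 1. W13 denotes a weighting constant. R1 denotes the compression ratio corresponding to compression algorithm 1. In equation (11), SB2 denotes the scored compression ratio (compression ratio score) of data compressed with compression algorithm 2. W14 denotes a weighting constant. R2 denotes the compression ratio corresponding to compression algorithm 2.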
スコア化したデータを式（１０）又は式（１１）を利用して得る手法は、データをスコア化する手法の一例にすぎない。圧縮率見積り部５０３がデータをスコア化する手法は、圧縮率を用いて算出する手法であれば、特に限定されない。
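The switching unit 501 has a function of determining, based on the data received from the compression cost estimation unit 502 and the compression ratio estimation unit 503, whether to switch the compression algorithm and, if so, which compression algorithm to switch to.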
具体的には、辞書データの使用頻度が低下してきていることに起因して圧縮率が低下している場合には、より圧縮率の高いアルゴリズムに切り替えることが必要である。この場合には、切り替え部５０１は、現在の圧縮率スコアよりも高い圧縮率スコア、および、現在の圧縮コストスコアよりも高い圧縮コストスコアを有する圧縮アルゴリズムを選択する。切り替え候補の圧縮アルゴリズムが複数ある場合には、切り替え部５０１は、例えば、次式（１２）を利用して総合スコアを算出する。
ＳＣ１＝Ｗ１５×ＳＡ１＋Ｗ１６×ＳＢ１・・・・・（１２）
式（１２）において、ＳＣ１は圧縮アルゴリズム１に対応する総合スコアを示す。Ｗ１５、Ｗ１６は重み定数を示す。ＳＡ１は圧縮アルゴリズム１に対応する圧縮コストスコアを示す。ＳＢ１は圧縮アルゴリズム１に対応する圧縮率スコアを示す。重み定数Ｗ１５、Ｗ１６の値は予め設定される。当該重み定数Ｗ１５、Ｗ１６の値は、圧縮コスト（実施コスト）を重視するか、圧縮率を重視するかに応じて、適宜に調整される。つまり、圧縮率よりも圧縮コストを重視する場合には、重み定数Ｗ１５は大きく設定される。これに対して、圧縮コストよりも圧縮率を重視する場合には、重み定数Ｗ１６が大きく設定される。
一方、データの流量が閾値Ｔ１以下に低くなった場合には、データの流量が少なくなっているので、次のような圧縮アルゴリズムを採用することが好ましい。その圧縮アルゴリズムとは、圧縮に時間を要する高負荷な圧縮アルゴリズムであるが、圧縮率が高い圧縮アルゴリズムである。
切り替え部５０１は、そのような場合には、圧縮コストスコアが低くても、圧縮率スコアが高い圧縮アルゴリズムを選択する。具体的には、切り替え部５０１は、式（１２）と同様の数式を利用して、各圧縮アルゴリズムについてそれぞれ総合スコアを算出する。なお、その数式における重み定数Ｗ１６を大きく設定することによって、圧縮率スコアが圧縮コストスコアよりも重視される。
切り替え部５０１は、それら総合スコアを比較した結果に基づいて、どの圧縮アルゴリズムに切り替えるかを決定する。
また、データの流量が閾値Ｔ２以上に多くなった場合には、データの流量が大きくなっているので、次のような圧縮アルゴリズムを採用することが好ましい。その圧縮アルゴリズムとは、圧縮率が低くとも低負荷な圧縮アルゴリズムである。
そのような場合には、切り替え部５０１は、圧縮率スコアが低くても、圧縮コストスコアが高い圧縮アルゴリズムを選択する。具体的には、切り替え部５０１は、式（１２）と同様の数式を利用して、各圧縮アルゴリズムについてそれぞれ総合スコアを算出する。なお、その数式における重み定数Ｗ１５を大きく設定することによって、圧縮コストスコアが圧縮率スコアよりも重視される。
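The selection by overall score can be sketched as follows, with the weights shifted according to the flow regime. The candidate names, score values, and weights are hypothetical; the sketch only applies the form of equation (12) to every candidate and picks the highest total.

```python
def pick_algorithm(scores, w15: float, w16: float):
    """Apply SC = W15 * SA + W16 * SB to each candidate and return the best.

    scores maps an algorithm name to (SA, SB), i.e. its compression cost
    score and its compression ratio score.
    """
    totals = {name: w15 * sa + w16 * sb for name, (sa, sb) in scores.items()}
    best = max(totals, key=totals.get)
    return best, totals


# Hypothetical scores: "fast" is low-load, "dense" compresses better.
scores = {"fast": (0.9, 0.3), "dense": (0.2, 0.8)}

# Flow at or below T1: emphasise the compression ratio score (large W16).
low_flow_choice, _ = pick_algorithm(scores, w15=0.2, w16=0.8)
# Flow at or above T2: emphasise the compression cost score (large W15).
high_flow_choice, _ = pick_algorithm(scores, w15=0.8, w16=0.2)
print(low_flow_choice, high_flow_choice)
```

Keeping one formula and changing only the weights means the same candidate table can serve both regimes.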
切り替え部５０１は、上記のように選択した圧縮アルゴリズムを、圧縮部１０２で使用している新たな圧縮アルゴリズムとして設定する。つまり、切り替え部５０１は、データを圧縮する際に利用する圧縮アルゴリズムを切り替える。また、切り替え部５０１は、更新部５０４に向けて、動作を開始する指示を出すと共に、切り替え後の圧縮アルゴリズムを示す情報を出力する。
更新部（更新手段）５０４は、切り替え部５０１から指示を受けると動作を開始する。更新部５０４は、第１記憶部１０３に記憶されているデータのどこからどこまでがどの圧縮アルゴリズムで圧縮されたデータであるかを示す情報を更新する。また、更新部５０４は、第２記憶部１０４に記憶されている辞書データのどこからどこまでがどの圧縮アルゴリズムで使用している辞書データであるのかという情報を更新する。
具体的には、更新部５０４が切り替え手段５０１から受け取った圧縮アルゴリズムの情報は、少なくとも圧縮アルゴリズム名の情報を含む。そして、更新部５０４は、第１記憶部１０３及び第２記憶部１０４のそれぞれに、どのデータがどの圧縮アルゴリズムに関連するデータであるのかが分かるように、圧縮アルゴリズムの情報を記憶（付加）する。
例えば、第１記憶部１０３に記憶されているデータは、以下のように、圧縮アルゴリズム毎に分類される。
［第１記憶部１０３］
圧縮アルゴリズムＡ
ＡＢＣＡＢＡＢＣ
ＤＡＣＥＦＤＡＳ
ＡＸＭＣＡＬＤＡ
圧縮アルゴリズムＢ
ＡＮＳＫＡＮＣＸ
ＣＩＫＡＤＬＡＬ
圧縮アルゴリズムＣ
ＤＫＬＡＢＮＫＣ
ＡＬＫＤＫＡＪＤ
ＳＬＤＫＪＡＬＬ
第２記憶部１０４に関しても同様である。
なお、圧縮部１０２が最初に採用する圧縮アルゴリズムは予め設定されている。また、この第５実施形態においては、圧縮部１０２は、データを圧縮する機能に加えて、辞書データを生成し、かつ、生成後の辞書データを第２記憶部１０４に登録する機能をも有する。
以下に、第５実施形態のデータ圧縮装置５００の動作例（データ圧縮方法）を、図２２のフローチャートを参照して説明する。なお、図２２において、図１８に示した動作と同様の動作を示す部分には図１８と同一符号を付し、その重複説明を省略する。
図２２のフローチャートは、図１３のフローチャートと同様に、データ圧縮装置５００における制御装置（ＣＰＵ）１０１が実行するコンピュータプログラムの処理手順を示す。そのコンピュータプログラムは、前記同様に、データ圧縮装置５００が有するメモリやハードディスク等の記憶装置の記憶媒体（例えば、不揮発性の記憶媒体）に格納されている。また、当該コンピュータプログラムは、例えば、コンパクトディスク（ＣＤ）やメモリカード等の可搬タイプの記憶媒体に格納された後に、当該可搬タイプの記憶媒体からデータ圧縮装置５００の記憶装置の記憶媒体に格納される場合がある。
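When the switching unit 501 determines in step S040 shown in FIG. 22 that the usage frequency of the dictionary data has not decreased, the switching unit 501 calculates the data flow rate (step S410). That is, the switching unit 501 calculates the data flow rate from the total capacity data and the elapsed time data received from the compression unit 102, using equation (5) (or equation (6) when such data is received for the first time). The switching unit 501 then determines whether the calculated data flow rate has fallen to the threshold T1 or below, or has risen to the threshold T2 or above. When the switching unit 501 determines that the data flow rate is at or below the threshold T1 or at or above the threshold T2, it instructs the compression cost estimation unit 502 and the compression ratio estimation unit 503 to start operating.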
これにより、圧縮コスト見積り部５０２と圧縮率見積り部５０３が動作を開始する（ステップＳ４２０）。つまり、圧縮コスト見積り部５０２は、予め設定されている複数の圧縮アルゴリズムのそれぞれについて、前述したように、圧縮コストスコアを算出する。また、圧縮率見積り部５０３は、予め設定されている複数の圧縮アルゴリズムのそれぞれについて、前述したように、圧縮率スコアを算出する。
切り替え部５０１は、算出された圧縮コストスコアと圧縮率スコアに基づいて、圧縮アルゴリズムを切り替えるか否か（現在使用している圧縮アルゴリズムよりも適切な圧縮アルゴリズムが有るか否か）を判定する（ステップＳ４３０）。
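When the switching unit 501 determines that the compression algorithm is to be switched, it selects the compression algorithm to switch to, as described above. The switching unit 501 then switches the compression algorithm used for compressing data to the selected compression algorithm (step S440). In addition, the switching unit 501 stores (appends) compression algorithm information in each of the first storage unit 103 and the second storage unit 104 so that it is clear which data relates to which compression algorithm.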
このように、第５実施形態によれば、切り替え部５０１は、データの流量が閾値Ｔ１以下である場合には、圧縮率を重視して、圧縮アルゴリズムを選択する。また、切り替え部５０１は、データの流量が閾値Ｔ２以上である場合には、圧縮コストを重視して、圧縮アルゴリズムを選択する。このように、切り替え部５０１は、圧縮コストと圧縮率に基づいて圧縮アルゴリズムを選択する。これにより、切り替え部５０１は、辞書式圧縮アルゴリズム以外の圧縮アルゴリズムをも含む複数の圧縮アルゴリズムの中から、有効度の高い圧縮アルゴリズムを選択できる。つまり、切り替え部５０１は、適切なタイミングで、有効度の高い圧縮アルゴリズムに切り替えることができる。このため、第５実施形態のデータ圧縮装置５００は、効率の良いデータ圧縮を維持できる。
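The present invention is not limited to the embodiments described above and may take various forms. For example, the data compression apparatus may have the configuration shown in FIG. 23. That is, this data compression apparatus 600 includes a compression unit (compression means) 601, a monitoring unit (monitoring means) 602, a timing control unit (timing control means) 603, and a dictionary generation unit (dictionary generation means) 604. The compression unit 601 has a function of compressing data to be compressed based on dictionary data given in advance. The monitoring unit 602 has a function of calculating the usage frequency of the dictionary data when the compression unit 601 compresses the data. The timing control unit 603 has a function of issuing an instruction to update the dictionary data when the calculated usage frequency falls below a preset threshold. In response to the instruction, the dictionary generation unit 604 has a function of newly generating the dictionary data using the data compressed by the compression unit 601.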
このデータ圧縮装置６００においても、前記各実施形態と同様に、効率の良いデータ圧縮率を維持できる。
以上、実施形態を参照して本願発明を説明したが、本願発明は上記実施形態に限定されるものではない。本願発明の構成や詳細には、本願発明のスコープ内で当業者が理解し得る様々な変更をすることができる。
なお、この出願は、２０１０年２月２３日に出願された日本出願の特願２０１０−０３６８０６を基礎とする優先権を主張し、その開示の全てをここに取り込む。
さらに、上記の実施形態の一部又は全部は、以下の付記のようにも記載されうるが、以下には限られない。
（付記１）
逐次的に入力される圧縮対象のデータを、辞書データ記憶手段に記憶されている辞書データを参照しながら圧縮して、圧縮データを出力する圧縮ステップと、
前記圧縮ステップで前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を計算する辞書使用頻度監視ステップと、
前記辞書使用頻度監視ステップによる前記使用頻度の計算結果が設定された閾値より低下したときに、辞書の更新を指示する辞書更新タイミング制御ステップと、
前記圧縮ステップで得られた前記圧縮データを、圧縮前のデータに展開するデータ展開ステップと、
前記辞書更新タイミング制御ステップによる前記辞書の更新の指示に従って、前記データ展開ステップで得られた前記圧縮前のデータから新たな辞書データを構築して、前記辞書データ記憶手段に記憶されていた辞書データを、構築した前記新たな辞書データに置き換える辞書構築ステップと
を含むデータ圧縮方法。
（付記２）
前記辞書使用頻度監視ステップは、前記圧縮ステップで前記圧縮対象のデータを圧縮する際に用いた前記辞書データと、その辞書データ毎の使用回数と、前記圧縮対象のデータの総量とを入力として受け、各辞書データ毎に前記使用回数を前記圧縮対象のデータの総量で除算して、前記使用頻度の計算結果を得る付記１記載のデータ圧縮方法。
（付記３）
前記辞書更新タイミング制御ステップは、前記辞書使用頻度監視ステップにより得られた全ての前記辞書データの使用頻度の計算結果の合計値が前記閾値よりも小さいときに、前記辞書の更新を指示する付記１又は２記載のデータ圧縮方法。
（付記４）
逐次的に入力される圧縮対象のデータを、辞書データ記憶手段に記憶されている辞書データを参照しながら圧縮して、圧縮データを出力する圧縮ステップと、
前記圧縮ステップで前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を計算する辞書使用頻度監視ステップと、
前記辞書使用頻度監視ステップによる前記使用頻度の計算結果が閾値より低下したときに、辞書の更新を指示する辞書更新タイミング制御ステップと、
前記圧縮ステップで得られる前記圧縮データを、圧縮前のデータに展開するデータ展開ステップと、
使用するリソースの空き容量を出力するリソース監視ステップと、
前記リソース監視ステップで出力される前記リソースの空き容量に応じて、前記辞書更新タイミング制御ステップで使用する前記閾値の値を可変設定するパラメータ調整ステップと、
前記辞書更新タイミング制御ステップによる前記辞書の更新の指示に従って、前記データ展開ステップで得られる前記圧縮前のデータから新たな辞書データを構築して、前記辞書データ記憶手段に記憶されていた辞書データを、構築した前記新たな辞書データに置き換える辞書構築ステップと
を含むデータ圧縮方法。
（付記５）
前記パラメータ調整ステップは、前記リソース監視ステップからの前記リソースの空き容量が大きいほど前記辞書更新タイミング制御ステップで使用する前記閾値を小さな値に可変設定し、前記リソースの空き容量が小さいほど前記閾値を大きな値に可変設定する付記４記載のデータ圧縮方法。
（付記６）
逐次的に入力される圧縮対象のデータを、辞書データ記憶手段に記憶されている辞書データを参照しながら圧縮して、圧縮データを出力する圧縮ステップと、
前記圧縮ステップで前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を計算する辞書使用頻度監視ステップと、
前記辞書使用頻度監視ステップによる前記使用頻度の計算結果が閾値より低下したときに、前記圧縮データのデータ量と前記辞書データ記憶手段に記憶されている前記辞書データのデータ量とに応じた辞書構築コストを見積る辞書構築コスト見積りステップと、
前記辞書使用頻度監視ステップによる前記使用頻度の計算結果が閾値より低下したときに、前記辞書データを更新した際の圧縮率を見積る圧縮率見積りステップと、
前記辞書コストと前記圧縮率とを比較し、その比較結果に応じて新たな辞書の構築を行うか否かを決定する辞書更新タイミング制御ステップと、
前記圧縮ステップから出力される前記圧縮データを、圧縮前のデータに展開するデータ展開ステップと、
前記辞書更新タイミング制御ステップにより前記新たな辞書を構築する決定がなされたときに、前記データ展開ステップで得られる前記圧縮前のデータから新たな辞書データを構築して、前記辞書データ記憶手段に記憶されていた辞書データを、構築した前記新たな辞書データに置き換える辞書構築ステップと
を含むデータ圧縮方法。
（付記７）
前記圧縮率見積りステップは、前記圧縮データの一部を前記辞書データ記憶手段に記憶されている前記辞書データを用いて展開した圧縮前のデータと、その圧縮前のデータを用いて構築した新たな辞書を用いて圧縮した圧縮後のデータとの比率を前記圧縮率として見積る付記６記載のデータ圧縮方法。
（付記８）
前記辞書更新タイミング制御ステップは、前記辞書コストをスコア化した第１の値と、前記圧縮率をスコア化した第２の値とを比較し、前記第２の値の方が前記第１の値よりも大きいときにのみ、前記新たな辞書の構築を行うことを決定する付記６又は７記載のデータ圧縮方法。
（付記９）
前記データ展開ステップは、
前記圧縮手段から出力される前記圧縮データを記憶する圧縮データ記憶ステップと、
前記辞書更新タイミング制御ステップによる前記辞書の更新の指示に従って、前記圧縮データ記憶ステップで記憶された前記圧縮データを、前記辞書データ記憶手段に記憶されている前記辞書データを参照しながら展開して圧縮前のデータである展開データを得る圧縮データ展開ステップと、
前記圧縮データ展開ステップにより得られた前記展開データを記憶する展開データ記憶ステップと
を備え、前記辞書構築ステップは、前記圧縮データ展開ステップにより前記圧縮データ記憶ステップで記憶された全ての前記圧縮データの展開終了により動作を開始し、前記展開データ記憶で記憶された前記圧縮前のデータである前記展開データに基づき、前記新たな辞書データを構築する付記１乃至８のうち、いずれか一項に記載のデータ圧縮方法。
（付記１０）
逐次的に入力される圧縮対象のデータを、辞書データ記憶手段に記憶されている辞書データを参照しながら圧縮して、圧縮データを出力する圧縮ステップと、
前記圧縮ステップで前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を各辞書データ毎に計算する辞書使用頻度監視ステップと、
前記辞書使用頻度監視ステップにより計算された各辞書データ毎の前記使用頻度のうち、設定された閾値より低下している使用頻度に対応した辞書データを抽出して出力する辞書更新タイミング制御ステップと、
前記辞書更新タイミング制御ステップにより得られた前記辞書データを用いて、前記圧縮で得られる前記圧縮データの一部を、圧縮前のデータに展開するデータ一部展開ステップと、
前記辞書更新タイミング制御ステップにより得られた前記辞書データを、前記辞書データ記憶手段に記憶されている複数の辞書データの中から削除する辞書データ削除ステップと、
前記データ一部展開ステップから出力される一部の前記圧縮前のデータから新たな辞書データを構築して、この新たな辞書データを前記辞書データ記憶手段に追記する辞書構築ステップと
を含むデータ圧縮方法。
（付記１１）
前記データ一部展開ステップは、
前記圧縮ステップで得られた前記圧縮データを記憶する圧縮データ記憶ステップと、
前記辞書更新タイミング制御ステップで得られた前記辞書データを用いて、前記圧縮データ記憶ステップで記憶された前記圧縮データの一部を展開して一部の圧縮前のデータを得る圧縮データ一部展開ステップと、
前記圧縮データ一部展開ステップにより得られた前記一部の圧縮前のデータを記憶する展開データ記憶ステップと
を備え、
前記辞書構築ステップは、前記圧縮データ一部展開ステップにより前記圧縮データ記憶ステップで記憶された前記圧縮データの一部の展開終了により動作を開始し、前記展開データ記憶ステップで記憶された前記一部の圧縮前のデータに基づき、前記新たな辞書データを構築する付記１０記載のデータ圧縮方法。
（付記１２）
逐次的に入力される圧縮対象のデータを、指定された圧縮アルゴリズムに従い圧縮して、圧縮データを出力する圧縮ステップと、
前記圧縮ステップによる圧縮前のデータに対して、予め設定されている複数の圧縮アルゴリズムのそれぞれにより圧縮して、圧縮前のデータ量と圧縮にかかった時間の比である圧縮コストを、前記複数の圧縮アルゴリズムのそれぞれについて見積る圧縮コスト見積りステップと、
前記圧縮ステップによる圧縮前のデータに対して、予め設定されている前記複数の圧縮アルゴリズムのそれぞれにより圧縮したときの圧縮率を算出する圧縮率見積りステップと、
前記圧縮ステップで前記圧縮対象のデータを圧縮する際に参照する辞書データの使用頻度が閾値より低下したときに、前記圧縮コスト見積りステップで得られた前記圧縮コストと前記圧縮率見積りステップで得られた前記圧縮率とに基づいて、前記複数の圧縮アルゴリズムのうち現在使用している圧縮アルゴリズムよりも圧縮率及び圧縮コストが高い圧縮アルゴリズムを前記圧縮ステップで用いる圧縮アルゴリズムに選択する圧縮アルゴリズム切り替えステップと
を含むデータ圧縮方法。
（付記１３）
前記圧縮アルゴリズム切り替えステップは、前記圧縮ステップにより圧縮される前記圧縮対象のデータのデータ流量を算出し、そのデータ流量が第１の閾値以下、又は前記第１の閾値より大なる第２の閾値以上となったときに、前記圧縮コスト見積りステップで得られた前記圧縮コストと前記圧縮率見積りステップで得られた前記圧縮率とに基づいて、前記データ流量が前記第１の閾値以下のときには前記複数の圧縮アルゴリズムのうち現在使用している圧縮アルゴリズムよりも少なくとも前記圧縮率の高い圧縮アルゴリズムを前記圧縮ステップで用いる圧縮アルゴリズムに選択し、前記データ流量が前記第２の閾値以上のときには前記複数の圧縮アルゴリズムのうち現在使用している圧縮アルゴリズムよりも少なくとも前記圧縮コストの高い圧縮アルゴリズムを前記圧縮ステップで用いる圧縮アルゴリズムに選択する付記１２記載のデータ圧縮方法。
（付記１４）
圧縮のための辞書データを記憶する辞書データ記憶手段と、
逐次的に入力される圧縮対象のデータを、前記辞書データ記憶手段に記憶されている前記辞書データを参照しながら圧縮して、圧縮データを出力する圧縮手段と、
前記圧縮手段から出力される前記圧縮データを、圧縮前のデータに展開するデータ展開手段と、
前記圧縮手段が前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を計算する辞書使用頻度監視手段と、
前記辞書使用頻度監視手段による前記使用頻度の計算結果が設定された閾値より低下したときに、辞書の更新を指示する辞書更新タイミング制御手段と、
前記辞書更新タイミング制御手段から出力された前記辞書の更新の指示に従って、前記データ展開手段から出力される前記圧縮前のデータから新たな辞書データを構築して、前記辞書データ記憶手段に記憶されていた辞書データを、構築した前記新たな辞書データに置き換える辞書構築手段と
を有するデータ圧縮システム。
（付記１５）
前記辞書使用頻度監視手段は、前記圧縮手段が前記圧縮対象のデータを圧縮する際に用いた前記辞書データと、その辞書データ毎の使用回数と、前記圧縮対象のデータの総量とを前記圧縮手段から入力として受け、各辞書データ毎に前記使用回数を前記圧縮対象のデータの総量で除算して、前記使用頻度の計算結果を得る付記１４記載のデータ圧縮システム。
（付記１６）
前記辞書更新タイミング制御手段は、前記辞書使用頻度監視手段から入力された全ての前記辞書データの使用頻度の計算結果の合計値が前記閾値よりも小さいときに、前記辞書の更新を指示する付記１４又は１５記載のデータ圧縮システム。
（付記１７）
圧縮のための辞書データを記憶する辞書データ記憶手段と、
逐次的に入力される圧縮対象のデータを、前記辞書データ記憶手段に記憶されている前記辞書データを参照しながら圧縮して、圧縮データを出力する圧縮手段と、
前記圧縮手段から出力される前記圧縮データを、圧縮前のデータに展開するデータ展開手段と、
前記圧縮手段が前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を計算する辞書使用頻度監視手段と、
前記辞書使用頻度監視手段による前記使用頻度の計算結果が閾値より低下したときに、辞書の更新を指示する辞書更新タイミング制御手段と、
使用するリソースの空き容量を出力するリソース監視手段と、
前記リソース監視手段からの前記リソースの空き容量に応じて、前記辞書更新タイミング制御手段で使用する前記閾値の値を可変設定するパラメータ調整手段と、
前記辞書更新タイミング制御手段から出力された前記辞書の更新の指示に従って、前記データ展開手段から出力される前記圧縮前のデータから新たな辞書データを構築して、前記辞書データ記憶手段に記憶されていた辞書データを、構築した前記新たな辞書データに置き換える辞書構築手段と
を有するデータ圧縮システム。
（付記１８）
前記パラメータ調整手段は、前記リソース監視手段からの前記リソースの空き容量が大きいほど前記辞書更新タイミング制御手段で使用する前記閾値を小さな値に可変設定し、前記リソースの空き容量が小さいほど前記閾値を大きな値に可変設定する付記１７記載のデータ圧縮システム。
（付記１９）
圧縮のための辞書データを記憶する辞書データ記憶手段と、
逐次的に入力される圧縮対象のデータを、前記辞書データ記憶手段に記憶されている前記辞書データを参照しながら圧縮して、圧縮データを出力する圧縮手段と、
前記圧縮手段から出力される前記圧縮データを、圧縮前のデータに展開するデータ展開手段と、
前記圧縮手段が前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を計算する辞書使用頻度監視手段と、
前記圧縮データのデータ量と前記辞書データ記憶手段に記憶されている前記辞書データのデータ量とに応じた辞書構築コストを見積もる辞書構築コスト見積り手段と、
前記辞書データを更新した際の圧縮率を見積もる圧縮率見積り手段と、
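a dictionary update timing control means that, when the usage frequency calculated by the dictionary usage frequency monitoring means falls below a threshold, instructs each of the dictionary construction cost estimation means and the compression ratio estimation means to operate, compares the dictionary construction cost thereby obtained from the dictionary construction cost estimation means with the compression ratio thereby obtained from the compression ratio estimation means, and determines, according to the comparison result, whether or not to construct a new dictionary; and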
前記辞書更新タイミング制御手段により前記新たな辞書を構築する決定がなされたときに、前記データ展開手段から出力される前記圧縮前のデータから新たな辞書データを構築して、前記辞書データ記憶手段に記憶されていた辞書データを、構築した前記新たな辞書データに置き換える辞書構築手段と
を有するデータ圧縮システム。
（付記２０）
前記圧縮率見積り手段は、前記圧縮データの一部を前記辞書データ記憶手段に記憶されている前記辞書データを用いて展開した圧縮前のデータと、その圧縮前のデータを用いて構築した新たな辞書を用いて圧縮した圧縮後のデータとの比率を前記圧縮率として見積もる付記１９記載のデータ圧縮システム。
（付記２１）
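The data compression system according to appendix 19 or 20, wherein the dictionary update timing control means compares a first value obtained by scoring the dictionary construction cost from the dictionary construction cost estimation means with a second value obtained by scoring the compression ratio from the compression ratio estimation means, and decides to construct the new dictionary only when the second value is larger than the first value.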
（付記２２）
前記データ展開手段は、
前記圧縮手段から出力される前記圧縮データを記憶する圧縮データ記憶手段と、
前記辞書更新タイミング制御手段から出力された前記辞書の更新の指示に従って、前記圧縮データ記憶手段に記憶されている前記圧縮データを、前記辞書データ記憶手段に記憶されている前記辞書データを参照しながら展開して圧縮前のデータである展開データを得る圧縮データ展開手段と、
前記圧縮データ展開手段により得られた前記展開データを記憶する展開データ記憶手段と
を備え、前記辞書構築手段は、前記圧縮データ展開手段により前記圧縮データ記憶手段に記憶されている全ての前記圧縮データの展開終了により動作を開始し、前記展開データ記憶手段に記憶されている前記圧縮前のデータである前記展開データに基づき、前記新たな辞書データを構築する付記１４乃至２１のうち、いずれか一項記載のデータ圧縮システム。
（付記２３）
圧縮のための複数の辞書データを記憶する辞書データ記憶手段と、
逐次的に入力される圧縮対象のデータを、前記辞書データ記憶手段に記憶されている前記辞書データを参照しながら圧縮して、圧縮データを出力する圧縮手段と、
前記圧縮手段が前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を各辞書データ毎に計算する辞書使用頻度監視手段と、
前記辞書使用頻度監視手段による各辞書データ毎の前記使用頻度のうち、設定された閾値より低下している使用頻度に対応した辞書データを抽出して出力する辞書更新タイミング制御手段と、
前記辞書更新タイミング制御手段から出力された前記辞書データを用いて、前記圧縮手段から出力される前記圧縮データの一部を、圧縮前のデータに展開するデータ一部展開手段と、
前記辞書更新タイミング制御手段から出力された前記辞書データを、前記辞書データ記憶手段に記憶されている複数の辞書データの中から削除する辞書データ削除手段と、
前記データ一部展開手段から出力される一部の前記圧縮前のデータから新たな辞書データを構築して、この新たな辞書データを前記辞書データ記憶手段に追記する辞書構築手段と
を有するデータ圧縮システム。
（付記２４）
前記データ一部展開手段は、
前記圧縮手段から出力される前記圧縮データを記憶する圧縮データ記憶手段と、
前記辞書更新タイミング制御手段から出力された前記辞書データを用いて、前記圧縮データ記憶手段に記憶されている前記圧縮データの一部を展開して一部の圧縮前のデータを得る圧縮データ一部展開手段と、
前記圧縮データ一部展開手段により得られた前記一部の圧縮前のデータを記憶する展開データ記憶手段と
を備え、
前記辞書構築手段は、前記圧縮データ一部展開手段により前記圧縮データ記憶手段に記憶されている前記圧縮データの一部の展開終了により動作を開始し、前記展開データ記憶手段に記憶されている前記一部の圧縮前のデータに基づき、前記新たな辞書データを構築する付記２３記載のデータ圧縮システム。
（付記２５）
圧縮アルゴリズムが辞書式のときに用いられる辞書データを記憶する辞書データ記憶手段と、
逐次的に入力される圧縮対象のデータを、指定された圧縮アルゴリズムに従い圧縮して、圧縮データを出力する圧縮手段と、
前記圧縮手段の圧縮前のデータに対して、予め設定されている複数の圧縮アルゴリズムのそれぞれにより圧縮して、圧縮前のデータ量と圧縮にかかった時間の比である圧縮コストを、前記複数の圧縮アルゴリズムのそれぞれについて見積もる圧縮コスト見積り手段と、
前記圧縮手段の圧縮前のデータに対して、予め設定されている前記複数の圧縮アルゴリズムのそれぞれにより圧縮したときの圧縮率を算出する圧縮率見積り手段と、
前記圧縮手段が前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度が閾値より低下したときに、前記圧縮コスト見積り手段と前記圧縮率見積り手段に動作を指示することにより得られた、前記圧縮コスト見積り手段からの前記圧縮コストと前記圧縮率見積り手段からの前記圧縮率とに基づいて、前記複数の圧縮アルゴリズムのうち現在使用している圧縮アルゴリズムよりも圧縮率及び圧縮コストが高い圧縮アルゴリズムを前記圧縮手段で用いる圧縮アルゴリズムに選択する圧縮アルゴリズム切り替え手段と
を有するデータ圧縮システム。
（付記２６）
前記圧縮アルゴリズム切り替え手段は、前記圧縮手段に入力される前記圧縮対象のデータのデータ流量を算出し、そのデータ流量が第１の閾値以下、又は前記第１の閾値より大なる第２の閾値以上となったときに、前記圧縮コスト見積り手段と前記圧縮率見積り手段に動作を指示することにより得られた、前記圧縮コスト見積り手段からの前記圧縮コストと前記圧縮率見積り手段からの前記圧縮率とに基づいて、前記データ流量が前記第１の閾値以下のときには前記複数の圧縮アルゴリズムのうち現在使用している圧縮アルゴリズムよりも少なくとも前記圧縮率の高い圧縮アルゴリズムを前記圧縮手段で用いる圧縮アルゴリズムに選択し、前記データ流量が前記第２の閾値以上のときには前記複数の圧縮アルゴリズムのうち現在使用している圧縮アルゴリズムよりも少なくとも前記圧縮コストの高い圧縮アルゴリズムを前記圧縮手段で用いる圧縮アルゴリズムに選択する付記２５記載のデータ圧縮システム。
（付記２７）
データ圧縮を行うコンピュータに、
逐次的に入力される圧縮対象のデータを、辞書データ記憶手段に記憶されている辞書データを参照しながら圧縮して、圧縮データを出力する圧縮ステップと、
前記圧縮ステップで前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を計算する辞書使用頻度監視ステップと、
前記辞書使用頻度監視ステップによる前記使用頻度の計算結果が設定された閾値より低下したときに、辞書の更新を指示する辞書更新タイミング制御ステップと、
前記圧縮ステップで得られた前記圧縮データを、圧縮前のデータに展開するデータ展開ステップと、
前記辞書更新タイミング制御ステップによる前記辞書の更新の指示に従って、前記データ展開ステップで得られた前記圧縮前のデータから新たな辞書データを構築して、前記辞書データ記憶手段に記憶されていた辞書データを、構築した前記新たな辞書データに置き換える辞書構築ステップと
を実行させるデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記２８）
前記辞書使用頻度監視ステップは、前記圧縮ステップで前記圧縮対象のデータを圧縮する際に用いた前記辞書データと、その辞書データ毎の使用回数と、前記圧縮対象のデータの総量とを入力として受け、各辞書データ毎に前記使用回数を前記圧縮対象のデータの総量で除算して、前記使用頻度の計算結果を得る付記２７記載のデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記２９）
前記辞書更新タイミング制御ステップは、前記辞書使用頻度監視ステップにより得られた全ての前記辞書データの使用頻度の計算結果の合計値が前記閾値よりも小さいときに、前記辞書の更新を指示する付記２７又は２８記載のデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記３０）
データ圧縮を行うコンピュータに、
逐次的に入力される圧縮対象のデータを、辞書データ記憶手段に記憶されている辞書データを参照しながら圧縮して、圧縮データを出力する圧縮ステップと、
前記圧縮ステップで前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を計算する辞書使用頻度監視ステップと、
前記辞書使用頻度監視ステップによる前記使用頻度の計算結果が閾値より低下したときに、辞書の更新を指示する辞書更新タイミング制御ステップと、
前記圧縮ステップで得られる前記圧縮データを、圧縮前のデータに展開するデータ展開ステップと、
使用するリソースの空き容量を出力するリソース監視ステップと、
前記リソース監視ステップで出力される前記リソースの空き容量に応じて、前記辞書更新タイミング制御ステップで使用する前記閾値の値を可変設定するパラメータ調整ステップと、
前記辞書更新タイミング制御ステップによる前記辞書の更新の指示に従って、前記データ展開ステップで得られる前記圧縮前のデータから新たな辞書データを構築して、前記辞書データ記憶手段に記憶されていた辞書データを、構築した前記新たな辞書データに置き換える辞書構築ステップと
を実行させるデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記３１）
前記パラメータ調整ステップは、前記リソース監視ステップからの前記リソースの空き容量が大きいほど前記辞書更新タイミング制御ステップで使用する前記閾値を小さな値に可変設定し、前記リソースの空き容量が小さいほど前記閾値を大きな値に可変設定する付記３０記載のデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記３２）
データ圧縮を行うコンピュータに、
逐次的に入力される圧縮対象のデータを、辞書データ記憶手段に記憶されている辞書データを参照しながら圧縮して、圧縮データを出力する圧縮ステップと、
前記圧縮ステップで前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を計算する辞書使用頻度監視ステップと、
前記辞書使用頻度監視ステップによる前記使用頻度の計算結果が閾値より低下したときに、前記圧縮データのデータ量と前記辞書データ記憶手段に記憶されている前記辞書データのデータ量とに応じた辞書構築コストを見積る辞書構築コスト見積りステップと、
前記辞書使用頻度監視ステップによる前記使用頻度の計算結果が閾値より低下したときに、前記辞書データを更新した際の圧縮率を見積る圧縮率見積りステップと、
前記辞書コストと前記圧縮率とを比較し、その比較結果に応じて新たな辞書の構築を行うか否かを決定する辞書更新タイミング制御ステップと、
前記圧縮ステップから出力される前記圧縮データを、圧縮前のデータに展開するデータ展開ステップと、
前記辞書更新タイミング制御ステップにより前記新たな辞書を構築する決定がなされたときに、前記データ展開ステップで得られる前記圧縮前のデータから新たな辞書データを構築して、前記辞書データ記憶手段に記憶されていた辞書データを、構築した前記新たな辞書データに置き換える辞書構築ステップと
を実行させるデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記３３）
前記圧縮率見積りステップは、前記圧縮データの一部を前記辞書データ記憶手段に記憶されている前記辞書データを用いて展開した圧縮前のデータと、その圧縮前のデータを用いて構築した新たな辞書を用いて圧縮した圧縮後のデータとの比率を前記圧縮率として見積る付記３２記載のデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記３４）
前記辞書更新タイミング制御ステップは、前記辞書コストをスコア化した第１の値と、前記圧縮率をスコア化した第２の値とを比較し、前記第２の値の方が前記第１の値よりも大きいときにのみ、前記新たな辞書の構築を行うことを決定する付記３２又は３３記載のデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記３５）
前記データ展開ステップは、
前記圧縮手段から出力される前記圧縮データを記憶する圧縮データ記憶ステップと、
前記辞書更新タイミング制御ステップによる前記辞書の更新の指示に従って、前記圧縮データ記憶ステップで記憶された前記圧縮データを、前記辞書データ記憶手段に記憶されている前記辞書データを参照しながら展開して圧縮前のデータである展開データを得る圧縮データ展開ステップと、
前記圧縮データ展開ステップにより得られた前記展開データを記憶する展開データ記憶ステップと
を備え、前記辞書構築ステップは、前記圧縮データ展開ステップにより前記圧縮データ記憶ステップで記憶された全ての前記圧縮データの展開終了により動作を開始し、前記展開データ記憶で記憶された前記圧縮前のデータである前記展開データに基づき、前記新たな辞書データを構築する付記２７乃至３４のうち、いずれか一項記載のデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記３６）
データ圧縮を行うコンピュータに、
逐次的に入力される圧縮対象のデータを、辞書データ記憶手段に記憶されている辞書データを参照しながら圧縮して、圧縮データを出力する圧縮ステップと、
前記圧縮ステップで前記圧縮対象のデータを圧縮する際に参照する前記辞書データの使用頻度を各辞書データ毎に計算する辞書使用頻度監視ステップと、
前記辞書使用頻度監視ステップにより計算された各辞書データ毎の前記使用頻度のうち、設定された閾値より低下している使用頻度に対応した辞書データを抽出して出力する辞書更新タイミング制御ステップと、
前記辞書更新タイミング制御ステップにより得られた前記辞書データを用いて、前記圧縮で得られる前記圧縮データの一部を、圧縮前のデータに展開するデータ一部展開ステップと、
前記辞書更新タイミング制御ステップにより得られた前記辞書データを、前記辞書データ記憶手段に記憶されている複数の辞書データの中から削除する辞書データ削除ステップと、
前記データ一部展開ステップから出力される一部の前記圧縮前のデータから新たな辞書データを構築して、この新たな辞書データを前記辞書データ記憶手段に追記する辞書構築ステップと
を実行させるデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記３７）
前記データ一部展開ステップは、
前記圧縮ステップで得られた前記圧縮データを記憶する圧縮データ記憶ステップと、
前記辞書更新タイミング制御ステップで得られた前記辞書データを用いて、前記圧縮データ記憶ステップで記憶された前記圧縮データの一部を展開して一部の圧縮前のデータを得る圧縮データ一部展開ステップと、
前記圧縮データ一部展開ステップにより得られた前記一部の圧縮前のデータを記憶する展開データ記憶ステップと
を備え、
前記辞書構築ステップは、前記圧縮データ一部展開ステップにより前記圧縮データ記憶ステップで記憶された前記圧縮データの一部の展開終了により動作を開始し、前記展開データ記憶ステップで記憶された前記一部の圧縮前のデータに基づき、前記新たな辞書データを構築する付記３６記載のデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記３８）
データ圧縮を行うコンピュータに、
逐次的に入力される圧縮対象のデータを、指定された圧縮アルゴリズムに従い圧縮して、圧縮データを出力する圧縮ステップと、
前記圧縮ステップによる圧縮前のデータに対して、予め設定されている複数の圧縮アルゴリズムのそれぞれにより圧縮して、圧縮前のデータ量と圧縮にかかった時間の比である圧縮コストを、前記複数の圧縮アルゴリズムのそれぞれについて見積る圧縮コスト見積りステップと、
前記圧縮ステップによる圧縮前のデータに対して、予め設定されている前記複数の圧縮アルゴリズムのそれぞれにより圧縮したときの圧縮率を算出する圧縮率見積りステップと、
前記圧縮ステップで前記圧縮対象のデータを圧縮する際に参照する辞書データの使用頻度が閾値より低下したときに、前記圧縮コスト見積りステップで得られた前記圧縮コストと前記圧縮率見積りステップで得られた前記圧縮率とに基づいて、前記複数の圧縮アルゴリズムのうち現在使用している圧縮アルゴリズムよりも圧縮率及び圧縮コストが高い圧縮アルゴリズムを前記圧縮ステップで用いる圧縮アルゴリズムに選択する圧縮アルゴリズム切り替えステップと
を実行させるデータ圧縮プログラムを記憶するコンピュータプログラム記憶媒体。
（付記３９）
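The computer program storage medium storing the data compression program according to appendix 38, wherein the compression algorithm switching step calculates a data flow rate of the data to be compressed in the compression step and, when the data flow rate becomes equal to or lower than a first threshold or equal to or higher than a second threshold that is larger than the first threshold, selects the compression algorithm to be used in the compression step based on the compression cost obtained in the compression cost estimation step and the compression ratio obtained in the compression ratio estimation step, such that: when the data flow rate is equal to or lower than the first threshold, a compression algorithm having at least a higher compression ratio than the currently used compression algorithm is selected from among the plurality of compression algorithms; and when the data flow rate is equal to or higher than the second threshold, a compression algorithm having at least a higher compression cost than the currently used compression algorithm is selected from among the plurality of compression algorithms.
Embodiments according to the present invention will be described below with reference to the drawings.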
(First embodiment)
FIG. 1 is a block diagram showing the configuration of a data compression apparatus (data compression system) according to the first embodiment of the present invention. The data compression apparatus 100 includes a control device 101, a first storage unit 103, a second storage unit 104, and a third storage unit 108. The first storage unit 103, the second storage unit 104, and the third storage unit 108 are implemented by a storage device. The storage device has a storage medium (for example, a hard disk device or a random access memory (RAM)) that stores data. In the first embodiment, the first storage unit 103 functions as a compressed data storage unit (compressed data storage means) that stores compressed data, described later. The second storage unit 104 functions as a dictionary data storage unit (dictionary data storage means) that stores dictionary data, described later. The third storage unit 108 functions as a decompressed data storage unit (decompressed data storage means) that stores decompressed data, described later.
The control device 101 is a computer including a CPU (Central Processing Unit), for example. The control device 101 governs the overall operation of the data compression device 100 by appropriately executing various computer programs stored in a computer-readable storage medium.
In the first embodiment, the control device 101 has a function of compressing data generated by the data generation unit 110 using a dictionary-based compression method. Furthermore, the control device 101 has a function of generating the dictionary data referred to when compressing data and a function of updating the dictionary data. That is, the control device 101 realizes the following functional blocks by operating according to a computer program. The functional blocks are a compression unit 102, a monitoring unit (dictionary usage frequency monitoring means) 105, a timing control unit (dictionary update timing control means) 106, a decompression unit (compressed data decompression means) 107, and a dictionary generation unit (dictionary construction means) 109. The data generated by the data generation unit 110 may be any kind of data, but here it is assumed to be character string data to keep the explanation easy to follow.
The compression unit (compression means) 102 has a function (data compression function) of compressing data by a dictionary-based compression method using the dictionary data, described later, stored in the second storage unit 104. The data to be compressed is either data input from the data generation unit 110 or decompressed data, described later, stored in the third storage unit 108. There are various dictionary-based compression methods, such as LZW (Lempel-Ziv-Welch) and LZ77 (Lempel-Ziv 77). The type of dictionary-based compression method employed by the compression unit 102 is not particularly limited. In the first embodiment, however, to keep the explanation easy to follow, the compression unit 102 compresses data by a dictionary-based compression method called BPE (Byte Pair Encoding).
The BPE method is an algorithm for compressing target data by repeatedly replacing 2-byte data with high appearance frequency with 1-byte data. FIG. 2 is a diagram for explaining an example of data compression by the BPE method. The compression unit 102 compresses data by the BPE method as follows.
For example, the second storage unit 104 holds a plurality of dictionary data as shown in FIG. 6. Each dictionary data item represents the relationship between a character string before replacement and the character that replaces it. One of the dictionary data items shown in FIG. 6 represents the relationship between the character string “AB” before replacement and the replacement character “G”. Another represents the relationship between the character string “DE” before replacement and the replacement character “H”. Yet another represents the relationship between the character string “GC” before replacement and the replacement character “I”. Such dictionary data is generated by the dictionary generation unit 109 as described later.
For example, when data D001 which is a character string as shown in FIG. 2 is input, the compression unit 102 refers to the dictionary data in the second storage unit 104 and compresses the data as data D002. That is, the compression unit 102 replaces two adjacent characters “AB” in the data D001 with one character “G” based on the dictionary data. Thus, the compression unit 102 creates data D002 obtained by compressing the data D001. Further, the compression unit 102 replaces two adjacent characters “DE” in the data D002 with one character “H” based on the dictionary data. Accordingly, the compression unit 102 creates data D003 obtained by compressing the data D002. Further, the compression unit 102 replaces two adjacent characters “GC” in the data D003 with one character “I” based on the dictionary data. Thereby, the compression unit 102 creates data D004 obtained by compressing the data D003. Further, the compression unit 102 tries to compress the data D004 in the same manner as described above. However, the data D004 does not have any of the character strings “AB”, “DE”, and “GC” corresponding to the dictionary data. For this reason, the compression unit 102 cannot compress the data D004. Thus, the compression unit 102 ends the process of compressing data (data compression process).
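The replacement procedure described above can be sketched as follows. This is a minimal illustration, not the apparatus itself: the function name `bpe_compress` and the sample string are assumptions (the actual contents of the data D001 appear only in FIG. 2), while the three dictionary entries are the ones described for FIG. 6.

```python
def bpe_compress(data, dictionary):
    """Apply each (pair, symbol) entry in order until no registered pair remains."""
    changed = True
    while changed:
        changed = False
        for pair, symbol in dictionary:
            if pair in data:
                # Replace every occurrence of the two-character string
                # with its one-character replacement.
                data = data.replace(pair, symbol)
                changed = True
    return data


# Dictionary entries as described for FIG. 6.
DICTIONARY = [("AB", "G"), ("DE", "H"), ("GC", "I")]

# Hypothetical input, chosen only for illustration.
sample = "ABDABCDEABC"
compressed = bpe_compress(sample, DICTIONARY)
print(compressed)  # GDIHI
```

The loop stops when none of “AB”, “DE”, or “GC” remains in the data, which mirrors the end condition of the data compression process described above.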
In addition to the data compression function as described above, the compression unit 102 further has a function of writing the finally obtained data D004 into the first storage unit 103. It should be noted that dictionary data is not stored in the second storage unit 104 in the initial operation after the data compression apparatus 100 starts operation. For this reason, the compression unit 102 does not perform the compression operation, and writes the data input from the data generation unit 110 to the first storage unit 103 as it is.
Further, as will be described later, when the dictionary data stored in the second storage unit 104 is replaced with new dictionary data, the compression unit 102 compresses all the expanded data stored in the third storage unit 108 based on the new dictionary data. The compression unit 102 also has a function of deleting all the expanded data stored in the third storage unit 108 once that compression is complete.
Further, the compression unit 102 has a function of creating data on the usage status of the dictionary data in the data compression process (usage status data) and data indicating the total capacity of the uncompressed data processed in the data compression process (total capacity data). The compression unit 102 also has a function of temporarily storing these data and then outputting them to the monitoring unit 105.
FIG. 7A shows an example of usage status data. FIG. 7B shows an example of total capacity data. In the example of FIG. 7A, the usage status data is data in which dictionary data and the number of uses thereof are paired. According to the usage status data of FIG. 7A, dictionary data that replaces “AB” with “G” is used ten times. The dictionary data for replacing “DE” with “H” has been used 20 times. Furthermore, dictionary data for replacing “GC” with “I” is used five times.
Further, according to FIG. 7B, the total capacity data is 153000 bytes. Note that the timing at which data is output from the compression unit 102 to the monitoring unit 105 may be every time the compression unit 102 compresses the data input from the data generation unit 110, or may be every preset time interval.
In the initial operation after the data compression apparatus 100 starts operation, the compression unit 102 does not perform the data compression process, as described above, so it outputs only the total capacity of the data input from the data generation unit 110 to the monitoring unit 105.
The dictionary generation unit 109 has a function of generating dictionary data and a function of storing the generated dictionary data in the second storage unit 104. The procedure by which the dictionary generation unit 109 generates dictionary data depends on the dictionary-based compression method employed by the compression unit 102. Because the first embodiment employs the BPE method, the dictionary generation unit 109 generates dictionary data by a procedure that follows the BPE method.
For example, when it receives an instruction to start operation (an operation instruction) from the expansion unit 107 (described later), the dictionary generation unit 109 generates dictionary data from all the data stored in the third storage unit 108, as follows. In the initial operation after the data compression apparatus 100 starts operation, the data input from the data generation unit 110 passes unchanged through the compression unit 102, the first storage unit 103, and the expansion unit 107, as will be described later, and is stored in the third storage unit 108.
First, the dictionary generation unit 109 counts the number of appearances of each pair of adjacent characters (character string) in the input data D001, which is a character string. In the data D001, the first pair of adjacent characters is “AB”, so the dictionary generation unit 109 sets the count value of “AB” to “1”. The next pair is “BD”, so the dictionary generation unit 109 sets the count value of “BD” to “1”. The next pair is “DA”, so the dictionary generation unit 109 sets the count value of “DA” to “1”. The next pair is “AB” again. Since “AB” has already appeared, the dictionary generation unit 109 increases the count value of “AB” by “1” to “2”. The dictionary generation unit 109 then repeats the same process (count process) for the rest of the data.
The table in FIG. 3 shows the number of appearances (count value) of each pair of adjacent characters in the data D001, as obtained by the dictionary generation unit 109 through the above processing. According to FIG. 3, the pair of adjacent characters with the highest number of appearances in the data D001 is “AB”, which appears four times.
Next, based on the result of the count process, the dictionary generation unit 109 sets another character (for example, “G”) as the replacement character for the pair “AB”, which has the highest number of appearances. The replacement character must be a character that is not included in the input data D001, because if a character included in the input data D001 were used as the replacement character, the compressed data could not be restored to its original state. The replacement character may be a number or a symbol.
The dictionary generation unit 109 then creates (generates), as dictionary data, data that associates the pair “AB” with the replacement character “G”.
Further, the dictionary generation unit 109 replaces “AB” with the replacement character “G” in the input data D001, thereby creating data D002.
Next, the dictionary generation unit 109 counts the number of appearances of each pair of adjacent characters in the data D002, in the same way as for the input data D001. The table in FIG. 4 shows the resulting number of appearances (count value) of each pair in the data D002. According to FIG. 4, the pair with the highest number of appearances in the data D002 is “DE”, which appears three times. The dictionary generation unit 109 therefore sets a replacement character (“H”) for the pair “DE”, as described above, and creates dictionary data that associates “DE” with “H”. It then creates data D003 by replacing “DE” in the data D002 with the replacement character “H”.
Further, the dictionary generation unit 109 counts the number of appearances of each pair of adjacent characters (character string) in the data D003 in the same way. The table in FIG. 5 shows the number of appearances (count value) of each pair in the data D003. According to FIG. 5, the pair with the highest number of appearances in the data D003 is “GC”, which appears twice. The dictionary generation unit 109 therefore sets a replacement character (“I”) for the pair “GC”, as described above, and creates dictionary data that associates “GC” with “I”. It then creates data D004 by replacing “GC” in the data D003 with the replacement character “I”.
Further, the dictionary generation unit 109 performs the same process on the data D004, counting the number of appearances of each pair of adjacent characters. Through this process, the dictionary generation unit 109 detects that every pair of adjacent characters appears only once in the data D004. The dictionary generation unit 109 therefore ends the process of counting pairs and setting replacement characters, that is, the process of generating dictionary data. The dictionary generation unit 109 stores the newly generated dictionary data (see FIG. 6) in the second storage unit 104. When dictionary data is already stored in the second storage unit 104, the dictionary generation unit 109 deletes the old dictionary data and then stores the new dictionary data in the second storage unit 104.
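The dictionary generation procedure above (count adjacent pairs, pick the most frequent pair, assign an unused replacement character, replace, and repeat until every pair appears only once) can be sketched as follows. The function name `generate_dictionary`, the candidate symbol list, and the sample string are assumptions introduced for illustration. Ties are broken here by first appearance, which the description does not specify.

```python
from collections import Counter


def generate_dictionary(data, candidates="GHIJKLMNOPQRSTUVWXYZ"):
    """Return (dictionary, final_data) built by repeated pair replacement."""
    # Only characters absent from the input may serve as replacement
    # characters; otherwise the data could not be restored.
    spare = iter(c for c in candidates if c not in data)
    dictionary = []
    while True:
        # Count process: slide over the data two characters at a time.
        counts = Counter(data[i:i + 2] for i in range(len(data) - 1))
        if not counts:
            break
        pair, n = max(counts.items(), key=lambda kv: kv[1])
        if n < 2:
            break  # every adjacent pair appears only once
        symbol = next(spare, None)
        if symbol is None:
            break  # no unused replacement character is left
        dictionary.append((pair, symbol))
        data = data.replace(pair, symbol)
    return dictionary, data


# Hypothetical input, not the D001 of FIG. 2.
entries, final = generate_dictionary("ABDABCDEABC")
print(entries)  # [('AB', 'G'), ('GC', 'H')]
print(final)    # GDHDEH
```

In this sample, “AB” appears three times and is replaced first; “GC” then appears twice in the intermediate data and is replaced next, after which no pair appears more than once.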
Furthermore, the dictionary generation unit 109 has a function of notifying the compression unit 102, after storing the new dictionary data in the second storage unit 104, that the new dictionary data has been stored.
The monitoring unit 105 has a function of calculating the usage frequency of the dictionary data based on the usage status data and the total capacity data received from the compression unit 102. For example, the monitoring unit 105 calculates the usage frequency of the dictionary data using the following formula (1).
Usage frequency = (Number of times dictionary data is used) ÷ (Total data capacity) (1)
The monitoring unit 105 further has a function of outputting data indicating the calculated use frequency (use frequency data) to the timing control unit 106. FIG. 8 shows an example of usage frequency data output from the monitoring unit 105 to the timing control unit 106. The usage frequency data shown in FIG. 8 is data indicating the relationship between dictionary data and usage frequency.
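As a worked instance of formula (1), the following sketch computes a usage frequency for each dictionary entry from the example values given for FIG. 7A (10, 20, and 5 uses) and FIG. 7B (153000 bytes). The function name `usage_frequency` and the dictionary-of-counts representation are assumptions. The actual values in FIG. 8 are not reproduced here.

```python
def usage_frequency(usage_counts, total_bytes):
    """Formula (1): number of uses divided by total data capacity, per entry."""
    if total_bytes <= 0:
        raise ValueError("total capacity must be positive")
    return {entry: count / total_bytes for entry, count in usage_counts.items()}


# Usage status data (FIG. 7A) and total capacity data (FIG. 7B).
usage_counts = {("AB", "G"): 10, ("DE", "H"): 20, ("GC", "I"): 5}
total_bytes = 153000

frequencies = usage_frequency(usage_counts, total_bytes)
total_frequency = sum(frequencies.values())  # value compared with the threshold
```

With these inputs, each frequency is a small fraction (for example, 10 ÷ 153000 for the “AB” entry), and the sum over all entries equals 35 ÷ 153000.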
In the initial operation after the data compression apparatus 100 starts operation, the only data output from the compression unit 102 to the monitoring unit 105 is the total capacity data, as described above. For this reason, the monitoring unit 105 outputs the total capacity data to the timing control unit 106 as it is, without calculating the usage frequency of the dictionary data.
The timing control unit 106 has a function of outputting an instruction to start operation to the expansion unit 107 when it detects, based on the usage frequency data received from the monitoring unit 105, that the usage frequency of the dictionary data has decreased. Specifically, when the sum (total value) of the usage frequencies in the received usage frequency data is smaller than a preset threshold, the timing control unit 106 determines that the usage frequency of the dictionary data has decreased and instructs the expansion unit 107 to start operation. The threshold may be set as appropriate by the user.
In the initial operation after the data compression apparatus 100 starts operation, the timing control unit 106 receives the total capacity data, not the usage frequency data, from the monitoring unit 105. In such a case, for example, the timing control unit 106 instructs the expansion unit 107 to start operation when it detects that the capacity value of the total capacity data received from the monitoring unit 105 is equal to or greater than a preset value. Alternatively, the timing control unit 106 may instruct the expansion unit 107 to start operation when the number of times it has received the total capacity data from the monitoring unit 105 reaches a preset number, or when a preset time has elapsed since it first received the total capacity data from the monitoring unit 105.
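The two decisions made by the timing control unit 106 can be sketched as simple predicates. The function names and the sample numbers are assumptions. The strict "smaller than" comparison follows the description of the timing control unit above, and only the capacity-based variant of the initial-operation trigger is shown.

```python
def usage_has_decreased(frequencies, threshold):
    """Normal operation: True when the summed usage frequency is below the threshold."""
    return sum(frequencies.values()) < threshold


def initial_trigger(total_bytes, min_bytes):
    """Initial operation: True once enough uncompressed data has accumulated."""
    return total_bytes >= min_bytes


# Hypothetical values for illustration only.
low = {"AB": 0.1, "DE": 0.2}
high = {"AB": 0.4, "DE": 0.3}
print(usage_has_decreased(low, 0.5))    # True
print(usage_has_decreased(high, 0.5))   # False
print(initial_trigger(153000, 100000))  # True
```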
The expansion unit 107 has a function of expanding the compressed data stored in the first storage unit 103 when it receives an instruction to start operation from the timing control unit 106. The expansion unit 107 expands the compressed data by a method corresponding to the compression method employed by the compression unit 102. Because the first embodiment uses the BPE method, the expansion unit 107 expands data based on the dictionary data stored in the second storage unit 104, as follows.
For example, suppose the expansion unit 107 expands the compressed data T012 illustrated in FIG. 9A with reference to the dictionary data illustrated in FIG. 9B. The compressed data T012 in FIG. 9A corresponds to the data D004 described above, and the dictionary data in FIG. 9B corresponds to the dictionary data generated as described above. FIGS. 10 to 12 are diagrams for explaining the flow of data expansion.
The expansion unit 107 first prepares a buffer 111 (see FIG. 10) of an appropriate size. The expansion unit 107 then places the first data “G” of the compressed data T012 at the head of the buffer 111, as shown in B001. According to the dictionary data, the head data “G” in the buffer 111 corresponds to “AB”. The expansion unit 107 therefore replaces “G” in the buffer 111 with “AB”, as shown in B002. Next, no two-character string corresponding to the head data “A” of the buffer 111 is registered in the dictionary data. In such a case, the expansion unit 107 takes the data “A” out of the buffer 111 as output data. As a result, the head data of the buffer 111 becomes “B”, as shown in B003, and the output data is “A”, as shown in T101.
No two-character string corresponding to the head data “B” of the buffer 111 is registered in the dictionary data either. The expansion unit 107 therefore takes the data “B” out of the buffer 111 as output data in the same manner. As a result, the output data becomes “AB”, as shown in T102.
Taking out “B” leaves the buffer 111 empty. The expansion unit 107 therefore places the second data “D” of the compressed data T012 in the buffer 111, as shown in B004. No two-character string corresponding to the data “D” is registered in the dictionary data, so the expansion unit 107 takes the data “D” out of the buffer 111 as output data in the same manner. As a result, the output data becomes “ABD”, as shown in T103.
Taking out “D” leaves the buffer 111 empty again. The expansion unit 107 therefore places the third data “I” of the compressed data T012 in the buffer 111, as shown in B005. According to the dictionary data, the data “I” corresponds to “GC”. The expansion unit 107 therefore replaces “I” in the buffer 111 with “GC”, as shown in B006.
Further, based on the dictionary data, the expansion unit 107 replaces the head data “G” in the buffer 111 with “AB”, as shown in B007 in FIG. 12. As a result, the head data of the buffer 111 is “A”. No two-character string corresponding to the data “A” is registered in the dictionary data, so the expansion unit 107 takes the data “A” out of the buffer 111 as output data in the same manner. As a result, the output data becomes “ABDA”, as shown in T104.
Taking out “A” makes the head data of the buffer 111 “B”, as shown in B008. No two-character string corresponding to the data “B” is registered in the dictionary data, so the expansion unit 107 takes the data “B” out of the buffer 111 as output data in the same manner. As a result, the output data becomes “ABDAB”, as shown in T105.
Taking out “B” makes the head data of the buffer 111 “C”, as shown in B009. The expansion unit 107 expands the data T012 by repeating the above processing (operation). That is, the data taken out of the buffer 111 (the output data) is the expanded data.
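The buffer-based expansion walked through above can be sketched as follows. The function name `bpe_expand` is an assumption, and a double-ended queue stands in for the buffer 111. The compressed input is a hypothetical string that begins with “G”, “D”, “I” like the walkthrough, but the full contents of T012 in FIG. 9A are not reproduced.

```python
from collections import deque


def bpe_expand(compressed, dictionary):
    """Expand data by repeatedly rewriting the head of a small buffer."""
    reverse = {symbol: pair for pair, symbol in dictionary}
    output = []
    for ch in compressed:
        buffer = deque([ch])  # load the next compressed character
        while buffer:
            head = buffer.popleft()
            if head in reverse:
                # Registered: put the two-character string back at the head.
                buffer.extendleft(reversed(reverse[head]))
            else:
                # Not registered: take the character out as output data.
                output.append(head)
    return "".join(output)


DICTIONARY = [("AB", "G"), ("DE", "H"), ("GC", "I")]
expanded = bpe_expand("GDIHI", DICTIONARY)
print(expanded)  # ABDABCDEABC
```

As in the walkthrough, “I” is first rewritten to “GC”, the resulting head “G” is rewritten to “AB”, and the characters “A”, “B”, “C” are then taken out in order.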
The expansion unit 107 has a function of storing the data expanded as described above (expanded data) in the third storage unit 108. The expansion unit 107 also has a function of deleting all the compressed data stored in the first storage unit 103 after expanding all of it. Further, the expansion unit 107 has a function of outputting an instruction to start operation to the dictionary generation unit 109 after deleting the compressed data in the first storage unit 103 and storing the expanded data in the third storage unit 108. On receiving this instruction, the dictionary generation unit 109 generates new dictionary data using the expanded data stored in the third storage unit 108.
Note that, as described above, the second storage unit 104 does not hold dictionary data in the initial operation after the data compression apparatus 100 starts operation. If no dictionary data is stored in the second storage unit 104, the compression unit 102 does not perform the operation (process) of compressing data. In this case, the data stored in the first storage unit 103 is uncompressed data. Furthermore, the expansion unit 107 cannot expand data unless dictionary data is stored in the second storage unit 104. The expansion unit 107 therefore has a function of moving the uncompressed data stored in the first storage unit 103 to the third storage unit 108 as it is.
In the first embodiment, the data stored in the third storage unit 108 when the dictionary generation unit 109 starts generating dictionary data is the data accumulated either since the data compression apparatus 100 started operation (in the initial operation) or since the previous dictionary data was generated. Because the dictionary generation unit 109 generates dictionary data from such data, it can generate the dictionary data that gives the highest data compression rate at the time of generation.
In practice, generating dictionary data takes time, so data may be input from the data generation unit 110 while the dictionary data is being generated. If the compression unit 102 compressed that input data based on the old dictionary data stored in the second storage unit 104, the following inconvenience would occur: data compressed with the old dictionary data during generation would not be consistent with data compressed with the new dictionary data after generation. Therefore, in the first embodiment, the compression unit 102 is configured not to compress data while dictionary data is being generated. Specifically, the compression unit 102 has a buffer (not shown) and accumulates the data input from the data generation unit 110 in that buffer while the dictionary generation unit 109 is operating. Then, as described above, after the dictionary generation unit 109 stores the newly generated dictionary data in the second storage unit 104, the compression unit 102 compresses the data accumulated in the buffer while referring to the new dictionary data.
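The buffering behavior described above can be sketched as follows. The class name `BufferingCompressor`, its method names, and the sample inputs are all assumptions introduced for illustration, and the single-pass replacement in `_compress` is a simplification of the data compression process.

```python
class BufferingCompressor:
    """Holds incoming data while new dictionary data is being generated."""

    def __init__(self, dictionary):
        self.dictionary = dictionary
        self.generating = False
        self.pending = []  # hypothetical stand-in for the buffer (not shown)

    def _compress(self, data):
        for pair, symbol in self.dictionary:
            data = data.replace(pair, symbol)
        return data

    def begin_generation(self):
        """Called when the dictionary generation unit starts operating."""
        self.generating = True

    def on_input(self, data):
        """Compress immediately, or accumulate while generation is running."""
        if self.generating:
            self.pending.append(data)
            return None
        return self._compress(data)

    def finish_generation(self, new_dictionary):
        """Switch to the new dictionary and compress everything accumulated."""
        self.dictionary = new_dictionary
        self.generating = False
        flushed = [self._compress(d) for d in self.pending]
        self.pending.clear()
        return flushed


unit = BufferingCompressor([("AB", "G")])
first = unit.on_input("ABAB")                  # compressed with the old dictionary
unit.begin_generation()
held = unit.on_input("CDCD")                   # accumulated, not compressed
flushed = unit.finish_generation([("CD", "H")])
```

In this sketch nothing is ever compressed with the old dictionary once generation has started, which is the property the paragraph above is designed to guarantee.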
Hereinafter, an operation example (data compression method) of the data compression apparatus 100 according to the first embodiment will be described with reference to the flowchart of FIG. 13. The flowchart of FIG. 13 shows the processing procedure of the computer program executed by the control device (CPU) 101 in the data compression apparatus 100. The computer program is stored in a storage medium (for example, a non-volatile storage medium) of a storage device such as a memory or a hard disk included in the data compression apparatus 100. Alternatively, for example, the computer program may be stored in a portable storage medium such as a compact disk (CD) or a memory card, and then transferred from the portable storage medium to the storage medium of the storage device of the data compression apparatus 100 and stored there.
First, when data is input from the data generation unit 110 (step S010), the compression unit 102 compresses the input data based on the dictionary data stored in the second storage unit 104 (step S020) and stores the compressed data in the first storage unit 103. Subsequently, the monitoring unit 105 calculates the usage frequency of the dictionary data (step S030) based on two items output from the compression unit 102: the usage status data (data indicating the dictionary data used for compression and the number of times each was used) and the total capacity data (data indicating the total capacity of the data input from the data generation unit 110 to the compression unit 102). The monitoring unit 105 then outputs the calculation result to the timing control unit 106.
Next, the timing control unit 106 determines whether the sum (total value) of the usage frequencies input from the monitoring unit 105 has fallen to or below the set threshold (step S040). If the total value is not equal to or less than the threshold (NO in step S040), the timing control unit 106 does nothing, and the data compression apparatus 100 repeats the operations from step S020. If the total value is equal to or less than the threshold (YES in step S040), the timing control unit 106 instructs the expansion unit 107 to start operation.
In response, the expansion unit 107 expands the compressed data stored in the first storage unit 103 while referring to the dictionary data stored in the second storage unit 104, and stores the resulting expanded data in the third storage unit 108 (step S050). Subsequently, the expansion unit 107 deletes the compressed data stored in the first storage unit 103 and instructs the dictionary generation unit 109 to start operation.
The dictionary generation unit 109 then generates dictionary data from the expanded data stored in the third storage unit 108 by the procedure for the predetermined dictionary-based compression method (the BPE method in the first embodiment), and updates the second storage unit 104 with the new dictionary data (step S060). That is, the dictionary generation unit 109 generates new dictionary data, deletes the old dictionary data stored so far in the second storage unit 104, and stores the newly generated dictionary data in the second storage unit 104. After this update, the dictionary generation unit 109 instructs the compression unit 102 to compress the expanded data.
Thereby, the compression unit 102 compresses the decompressed data stored in the third storage unit 108 while referring to the new dictionary data stored in the second storage unit 104 (step S070). Then, the compression unit 102 stores compressed data, which is data after compression, in the first storage unit 103. Thereafter, the data compression apparatus 100 repeats the operations after step S020.
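The flow of steps S010 to S070 can be sketched end to end as follows. This is a simplified illustration under several assumptions: input data uses only lowercase letters so that uppercase letters are free to serve as replacement characters; expansion is done by reverse-order replacement rather than the buffer procedure described earlier; the usage frequency is computed per input chunk; and an empty dictionary (zero usage) triggers the first generation immediately instead of the capacity-based trigger of the initial operation. All names are hypothetical.

```python
from collections import Counter

SPARE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumption: never appears in the input


def generate(data):
    dictionary = []
    for symbol in SPARE:
        counts = Counter(data[i:i + 2] for i in range(len(data) - 1))
        if not counts:
            break
        pair, n = max(counts.items(), key=lambda kv: kv[1])
        if n < 2:
            break
        dictionary.append((pair, symbol))
        data = data.replace(pair, symbol)
    return dictionary


def compress(data, dictionary):
    uses = 0
    for pair, symbol in dictionary:
        uses += data.count(pair)  # number of replacements made for this entry
        data = data.replace(pair, symbol)
    return data, uses


def expand(data, dictionary):
    for pair, symbol in reversed(dictionary):
        data = data.replace(symbol, pair)
    return data


class Apparatus:
    def __init__(self, threshold):
        self.threshold = threshold
        self.dictionary = []  # second storage unit
        self.store = []       # first storage unit (compressed chunks)
        self.updates = 0

    def feed(self, chunk):                                 # S010
        if not chunk:
            return
        packed, uses = compress(chunk, self.dictionary)    # S020
        self.store.append(packed)
        frequency = uses / len(chunk)                      # S030
        if frequency < self.threshold:                     # S040
            raw = [expand(c, self.dictionary) for c in self.store]       # S050
            self.dictionary = generate("".join(raw))                     # S060
            self.store = [compress(r, self.dictionary)[0] for r in raw]  # S070
            self.updates += 1


apparatus = Apparatus(threshold=0.2)
for chunk in ["abababab", "abababab", "cdcdcdcd"]:
    apparatus.feed(chunk)
restored = [expand(c, apparatus.dictionary) for c in apparatus.store]
```

In this run the dictionary is regenerated twice: once on the first chunk, when no dictionary exists yet, and once on the third chunk, whose content the existing dictionary does not match at all.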
Note that when the data compression apparatus 100 is in the initial operation (that is, the dictionary data is not stored in the second storage unit 104), the data compression apparatus 100 operates as follows.
For example, when data is input from the data generation unit 110, the compression unit 102 stores the data in the first storage unit 103 as it is and outputs data indicating the total capacity of the data (total capacity data) to the monitoring unit 105. The monitoring unit 105 outputs the total capacity data as it is to the timing control unit 106. The timing control unit 106 instructs the expansion unit 107 to start operation at a timing based on, for example, the total capacity of the data input from the data generation unit 110, the number of times the total capacity data has been input, or the time elapsed since the total capacity data was first input. In response, the expansion unit 107 stores the data stored in the first storage unit 103 in the third storage unit 108 as it is. The dictionary generation unit 109 then generates dictionary data using the data stored in the third storage unit 108 and stores the newly generated dictionary data in the second storage unit 104. Thereafter, the dictionary generation unit 109 instructs the compression unit 102 to start operation. The compression unit 102 then compresses the data stored in the third storage unit 108 and stores the compressed data in the first storage unit 103. Once the dictionary data has been generated and stored in the second storage unit 104 in this way, the data compression apparatus 100 repeats the operation shown in the flowchart described above.
As described above, according to the first embodiment, the data compression apparatus 100 includes the monitoring unit 105 and the timing control unit 106. The monitoring unit 105 monitors how much the compression unit 102 uses the dictionary data stored in the second storage unit 104 when compressing the data input from the data generation unit 110. When the timing control unit 106 determines from the monitoring result of the monitoring unit 105 that the usage frequency of the dictionary data has decreased (in other words, that the effectiveness of the dictionary data stored in the second storage unit 104 has decreased), the dictionary generation unit 109 is instructed to generate new dictionary data. The new dictionary data generated by the dictionary generation unit 109 in response to this instruction is based on all the expanded data stored in the third storage unit 108. The new dictionary data is therefore the dictionary data with the highest usage frequency (that is, with a high compression effect) at the time it is generated.
As described above, when the usage frequency (effectiveness) of the dictionary data decreases, the data compression apparatus 100 according to the first embodiment can generate more effective dictionary data and replace the dictionary data used for compression with the new dictionary data. Thus, when the tendency of the data input from the data generation unit 110 changes, the data compression apparatus 100 generates new dictionary data based on the data after the change and can compress data based on that new dictionary data. The data compression apparatus 100 can therefore continue to compress data efficiently even when the tendency of the data input from the data generation unit 110 changes. In other words, the data compression apparatus 100 can maintain an efficient compression rate.
(Second Embodiment)
A second embodiment according to the present invention will be described below with reference to the drawings.
FIG. 14 is a block diagram illustrating the configuration of the data compression apparatus according to the second embodiment. In the description of the second embodiment, the same components as those in the first embodiment are denoted by the same reference numerals, and redundant description thereof is omitted.
In addition to the configuration of the data compression apparatus 100 of the first embodiment, the data compression apparatus (data compression system) 200 of the second embodiment further includes a resource monitoring unit 201 and an adjustment unit (parameter adjustment unit) 202. The data compression apparatus 200 also includes a timing control unit (dictionary update timing control unit) 203 in place of the timing control unit 106 of the first embodiment.
The data compression apparatus 100 according to the first embodiment described above can maintain an efficient compression rate by updating the dictionary data when the usage frequency of the dictionary data decreases. Thereby, the data compression apparatus 100 can suppress the usage amount of a storage medium such as a memory or a disk (for example, a storage medium constituting the third storage unit 108 or the like). On the other hand, since the operation of updating (generating) dictionary data increases the load on the data compression apparatus 100, it is not desirable to update the dictionary data frequently. For this reason, when there is a margin in the capacity of the memory or the disk, it may be better to give priority to lowering the load on the data compression apparatus 100 than to increase the compression rate.
Considering this, the data compression apparatus 200 of the second embodiment includes the resource monitoring unit 201 and the adjustment unit 202 as described above.
The resource monitoring unit 201 has a function of monitoring (detecting) a free capacity of a resource (for example, a storage medium constituting the third storage unit 108) and outputting a monitoring result to the adjustment unit 202. The timing at which the resource monitoring unit 201 outputs the monitoring result to the adjustment unit 202 is, for example, every predetermined time interval.
The adjustment unit 202 has a function of adjusting the threshold used by the timing control unit 203 based on the monitoring result received from the resource monitoring unit 201. If the monitoring result indicates that the resource has ample free capacity, the resource will not come under pressure immediately even if the effectiveness (usage frequency) of the dictionary data is low. The adjustment unit 202 therefore adjusts (sets) the threshold to a small value so that the dictionary data is updated less frequently. On the other hand, if the monitoring result indicates that the resource has little free capacity, the resource will quickly come under pressure once the effectiveness of the dictionary data becomes low. The adjustment unit 202 therefore adjusts (sets) the threshold to a large value so that the dictionary data is updated more frequently.
More specifically, for example, the adjustment unit 202 adjusts (sets) the threshold value to “0.5” when the free capacity (remaining capacity) of the resource is 80% or more. The adjustment unit 202 adjusts the threshold value to “0.7” when the free space of the resource is less than 80% and 50% or more. Furthermore, the adjustment unit 202 sets the threshold value to “1.0” when the free capacity of the resource is less than 50%. As described above, the adjustment unit 202 may set the threshold value stepwise in accordance with the free capacity of the resource. Alternatively, the adjustment unit 202 may continuously set the threshold according to the free capacity of the resource. Note that any method may be employed as the method by which the adjustment unit 202 sets the threshold as long as the threshold is set according to the free capacity of the resource.
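The stepwise setting described above can be sketched as follows. The function names are assumptions. The three threshold values and the 80% and 50% boundaries are the ones given in the text. The helper `free_ratio` uses the Python standard library call `shutil.disk_usage` as one hypothetical way to obtain the free capacity of a storage medium; the description does not specify how the resource monitoring unit 201 measures it.

```python
import shutil


def threshold_for(free_fraction):
    """Map the free capacity of the resource (0.0 to 1.0) to a threshold."""
    if free_fraction >= 0.8:
        return 0.5  # ample free capacity: update the dictionary rarely
    if free_fraction >= 0.5:
        return 0.7
    return 1.0      # little free capacity: update the dictionary often


def free_ratio(path):
    """Hypothetical monitor: fraction of the storage at `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


print(threshold_for(0.9))   # 0.5
print(threshold_for(0.6))   # 0.7
print(threshold_for(0.3))   # 1.0
```

A continuous mapping, which the text also permits, would replace the three branches with a single decreasing function of the free capacity.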
The timing control unit 203 has a function of determining the timing for updating the dictionary data, using the threshold value set as described above, similarly to the timing control unit 106 in the first embodiment. The timing control unit 203 has the same function as the timing control unit 106 except that the threshold set by the adjustment unit 202 is used.
As described above, the data compression apparatus 200 according to the second embodiment can change the update timing of the dictionary data according to the free capacity of the resource. Thereby, when there is a sufficient resource, the data compression apparatus 200 can reduce the number of times of updating the dictionary data (dictionary update number), and thus the load on the apparatus can be reduced. On the other hand, when there is no room for resources, the data compression apparatus 200 can increase the number of times the dictionary is updated, so that the amount of resources used (consumption) can be suppressed.
Next, operation examples of the resource monitoring unit 201 and the adjustment unit 202 will be described with reference to FIGS. 15 and 16. FIG. 15 is a flowchart illustrating exemplary operations of the resource monitoring unit 201 and the adjustment unit 202. FIG. 16 is a flowchart illustrating a more specific operation example of the adjustment unit 202. The operations shown in the flowcharts of FIGS. 15 and 16 are executed asynchronously with the operation shown in the flowchart of FIG.
FIGS. 15 and 16 show the processing procedure of the computer program executed by the control device (CPU) 101 in the data compression apparatus 200, similarly to the flowchart of FIG. The computer program is stored in a storage medium (for example, a non-volatile storage medium) of a storage device such as a memory or a hard disk included in the data compression apparatus 200 as described above. Alternatively, for example, the computer program may be stored in a portable storage medium such as a compact disk (CD) or a memory card, and then transferred from the portable storage medium to, and stored in, the storage medium of the data compression apparatus 200.
For example, the resource monitoring unit 201 monitors the free capacity of resources such as a computer memory and a disk included in the data compression apparatus 200 at predetermined time intervals, and outputs the monitoring result to the adjustment unit 202 (step S110 in FIG. 15). Subsequently, based on the monitoring result, the adjustment unit 202 sets the threshold used by the timing control unit 203 to a small value when the free capacity of the resource is large. Conversely, the adjustment unit 202 sets the threshold to a large value when the free capacity of the resource is small (step S120).
Here, an operation example for setting the threshold value will be described with reference to FIG. The operation in step S110 shown in FIG. 16 is the same as the operation in step S110 shown in FIG.
For example, the adjustment unit 202 determines whether the free capacity of the resource is 80% or more based on the monitoring result received from the resource monitoring unit 201 (step S121). If the free capacity of the resource is 80% or more, the adjustment unit 202 determines that the resource has a margin of free capacity, and sets the threshold used by the timing control unit 203 to “0.5” (step S123).
If the adjustment unit 202 determines in step S121 that the resource free capacity is not 80% or more, the adjustment unit 202 determines whether the resource free capacity is 50% or more (step S122). If the free capacity of the resource is 50% or more, the adjustment unit 202 sets the threshold value to “0.7” (step S124).
If the adjustment unit 202 determines in step S122 that the free capacity of the resource is not 50% or more, the adjustment unit 202 determines that the resource has no margin of free capacity, and sets the threshold to “1.0” (step S125). As described above, in the second embodiment, the adjustment unit 202 adjusts (sets) the threshold according to the free capacity of the resource.
Note that the flowchart (example of operation) shown in FIG. 16 only shows an example of a method for adjusting the threshold value, and the method for adjusting the threshold value is not limited to the method described above.
For example, the threshold may be adjusted in more than the three stages described above (80% or more; less than 80% and 50% or more; less than 50%). Further, the threshold values are not limited to “0.5”, “0.7”, and “1.0”.
(Third embodiment)
A third embodiment according to the present invention will be described below with reference to the drawings.
FIG. 17 is a block diagram showing the configuration of the data compression apparatus in the third embodiment. In the description of the third embodiment, the same components as those in the first embodiment are denoted by the same reference numerals, and redundant description thereof is omitted.
In addition to the configuration of the data compression apparatus 100 of the first embodiment, the data compression apparatus (data compression system) 300 of the third embodiment further includes a cost estimation unit (dictionary construction cost estimation means) 301 and a compression rate estimation unit (compression rate estimation means) 302. In addition, the data compression apparatus 300 includes a timing control unit (dictionary update timing control means) 303, described later, instead of the timing control unit 106 in the data compression apparatus 100 of the first embodiment.
As described in the second embodiment, the data compression apparatus 100 according to the first embodiment can maintain an efficient compression ratio by updating the dictionary data when the use frequency of the dictionary data is reduced. Thereby, the data compression apparatus 100 can suppress the usage amount of a storage medium such as a memory or a disk. On the other hand, the operation of updating the dictionary data increases the load on the data compression apparatus 100, so it is not desirable to update the dictionary data frequently. For this reason, when there is a margin in the capacity of the memory or the disk, it may be better to give priority to lowering the load on the data compression apparatus 100 than to increase the compression rate.
In view of this, the data compression apparatus 300 according to the third embodiment includes the cost estimation unit 301 and the compression rate estimation unit 302, and changes the timing for updating the dictionary data by a method different from that of the second embodiment.
The cost estimation unit (cost estimation means) 301 has a function of estimating the cost required to newly generate dictionary data. That is, the cost estimation unit 301 starts operation when it receives an instruction to start operation from the timing control unit (timing control means) 303. The instruction is received when the timing control unit 303 determines, by the same operation as the timing control unit 106 in the first embodiment, that the use frequency of the dictionary data has decreased. The cost estimation unit 301 estimates the cost required to newly generate dictionary data based on the data amount of the compressed data stored in the first storage unit 103 and the data amount of the dictionary data stored in the second storage unit 104.
As described in the first embodiment, in order to generate a dictionary, the expansion unit 107 first expands all the compressed data stored in the first storage unit 103 with reference to the existing dictionary data stored in the second storage unit 104, and stores the expanded data in the third storage unit 108. Subsequently, the dictionary generation unit 109 counts the number of appearances of each pair of two adjacent characters in all the expanded data stored in the third storage unit 108, and repeats the operation (process) of replacing the pair of adjacent characters with the largest count with another character. This operation is repeated until every pair of two adjacent characters in the data appears only once. For this reason, the cost required for generating a dictionary is proportional to the amount of compressed data stored in the first storage unit 103 and the amount of dictionary data stored in the second storage unit 104.
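The pair-replacement operation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the dictionary generation unit 109: the names `build_pair_dictionary` and `expand`, and the use of tuples such as `("R", 0)` as replacement symbols, are hypothetical.

```python
from collections import Counter


def build_pair_dictionary(symbols):
    """Repeatedly replace the most frequent adjacent pair with a new symbol.

    symbols: a sequence of hashable items, for example the characters of a
    string. Returns (compressed_sequence, dictionary), where dictionary maps
    each new symbol to the pair of symbols it replaced.
    """
    seq = list(symbols)
    dictionary = {}
    next_id = 0
    while True:
        # Count every pair of adjacent symbols (overlapping pairs included).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count <= 1:
            break  # every adjacent pair now appears only once
        new_symbol = ("R", next_id)  # hypothetical replacement symbol
        next_id += 1
        dictionary[new_symbol] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        replaced = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                replaced.append(new_symbol)
                i += 2
            else:
                replaced.append(seq[i])
                i += 1
        seq = replaced
    return seq, dictionary


def expand(seq, dictionary):
    """Restore the original symbols by expanding replacement symbols."""
    out = []
    stack = list(reversed(seq))
    while stack:
        symbol = stack.pop()
        if symbol in dictionary:
            first, second = dictionary[symbol]
            stack.append(second)
            stack.append(first)
        else:
            out.append(symbol)
    return out
```

Because each pass rescans the whole sequence, the work grows with the amount of data to be processed, which is consistent with the cost being treated as proportional to the stored data amounts.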
For this reason, the cost estimation unit 301 calculates the cost required for generating dictionary data using the following equation (2), for example.
C = W1 × D1 + W2 × D2 (2)
In Equation (2), C represents the cost required for generating dictionary data. D1 indicates the amount of compressed data stored in the first storage unit 103. D2 indicates the amount of dictionary data stored in the second storage unit 104. Further, W1 and W2 indicate weight constants. These W1 and W2 are set to appropriate values.
When the cost estimation unit 301 finishes calculating the cost C, the cost estimation unit 301 outputs the cost C as a calculation result to the timing control unit 303.
Equation (2) is one example of a formula for calculating the cost. Any method may be adopted as long as the cost estimation unit 301 calculates the cost using the data amount of the compressed data stored in the first storage unit 103 and the data amount of the dictionary data stored in the second storage unit 104.
The compression rate estimation unit 302 has a function of estimating the compression rate that would be obtained after updating to new dictionary data. That is, the compression rate estimation unit 302 starts operation when it receives an instruction to start operation from the timing control unit 303. The instruction is received when the timing control unit 303 determines that the use frequency of the dictionary data has decreased as described above. The compression rate estimation unit 302 expands a part of the compressed data stored in the first storage unit 103 using the dictionary data stored in the second storage unit 104.
The amount of data expanded by the compression rate estimation unit 302 may be a predetermined fixed amount, or may be an amount proportional to the total amount of data stored in the first storage unit 103, for example an amount corresponding to 10% of that total.
Then, the compression rate estimation unit 302 generates new dictionary data using the expanded data, and compresses the expanded data used when generating the dictionary data using the new dictionary data. Then, the compression rate estimation unit 302 calculates the compression rate of the compressed data. Here, the compression ratio is an index indicating how much the data amount after compression is smaller than the data amount before compression. For example, when the data amount before compression is 100 MB and the data amount after compression is 50 MB, the compression rate is 50%. After calculating the compression rate, the compression rate estimation unit 302 outputs data indicating the compression rate to the timing control unit 303.
In the third embodiment, as described above, the compression rate estimation unit 302 expands only a part of the data, not all the data stored in the first storage unit 103. Furthermore, the compression rate estimation unit 302 generates dictionary data based on the expanded data, and further calculates the compression rate (dictionary data is evaluated). Thereby, the compression rate estimation unit 302 can estimate (calculate) the compression effect (compression rate) without imposing a heavy load on the data compression apparatus 300.
Unlike in the first embodiment, when the timing control unit 303 determines, in the same manner as described above, that the use frequency of the dictionary data has decreased, the timing control unit 303 outputs the instruction to start operation not to the expansion unit 107 but to the cost estimation unit 301 and the compression rate estimation unit 302.
In addition, the timing control unit 303 determines whether to update (generate) dictionary data based on the cost calculated by the cost estimation unit 301 and the compression rate calculated by the compression rate estimation unit 302.
The determination can be made using, for example, data obtained by scoring the cost and data obtained by scoring the compression rate. Here, the following expression (3) is an expression for calculating data obtained by scoring the cost. Expression (4) is an expression for calculating data obtained by scoring the compression ratio.
S1 = W3 × C (3)
S2 = W4 × R (4)
In Equation (3), S1 represents the scored cost, W3 represents a weight constant, and C represents the cost calculated by Equation (2). In Equation (4), S2 represents the scored compression rate, W4 represents a weight constant, and R represents the compression rate calculated by the compression rate estimation unit 302. Note that Equations (3) and (4) are only one example of a method for calculating the scored data, and the method is not limited to this.
The timing control unit 303 determines that new dictionary data is to be generated (the dictionary data is to be updated) when the scored compression rate S2 is larger than the scored cost S1.
In other words, when the scored cost S1 is larger than the scored compression rate S2, the timing control unit 303 determines that it is not the time to newly generate dictionary data (update the dictionary data). As a result, the data compression apparatus 300 does not perform the heavy-load operation (processing) of updating the dictionary data when only a small compression effect is expected. For this reason, the data compression apparatus 300 can reduce the load on the apparatus and eliminate waste.
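The determination using Equations (2) to (4) can be sketched as follows. This is only an illustration: the function names and the default weight values are hypothetical, since the text states only that W1 to W4 are set to appropriate values, and the units of D1 and D2 are likewise left to the implementer.

```python
def generation_cost(d1, d2, w1=1.0, w2=1.0):
    """Equation (2): C = W1 * D1 + W2 * D2.

    d1: amount of compressed data in the first storage unit
    d2: amount of dictionary data in the second storage unit
    w1, w2: weight constants (placeholder defaults, not from the source)
    """
    return w1 * d1 + w2 * d2


def should_update_dictionary(d1, d2, r, w1=1.0, w2=1.0, w3=1.0, w4=1.0):
    """Return True when the scored compression rate exceeds the scored cost.

    r: compression rate reported by the compression rate estimation unit
    w3, w4: weight constants of Equations (3) and (4) (placeholders)
    """
    c = generation_cost(d1, d2, w1, w2)
    s1 = w3 * c  # Equation (3): scored cost
    s2 = w4 * r  # Equation (4): scored compression rate
    return s2 > s1
```

Because C and R are on different scales, the weights W3 and W4 effectively decide how much compression benefit justifies a given amount of regeneration work, so they would need to be tuned for the actual apparatus.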
Hereinafter, an operation example of the data compression apparatus 300 according to the third embodiment will be described with reference to the flowchart of FIG. In FIG. 18, the same reference numerals as those in FIG. 13 are given to portions showing the same operations as those shown in FIG. 13, and the duplicate description thereof is omitted.
The flowchart of FIG. 18 shows the processing procedure of the computer program executed by the control device (CPU) 101 in the data compression apparatus 300, similarly to the flowchart of FIG. As described above, the computer program is stored in a storage medium (for example, a non-volatile storage medium) of a storage device such as a memory or a hard disk included in the data compression apparatus 300. Alternatively, for example, the computer program may be stored in a portable storage medium such as a compact disk (CD) or a memory card, and then transferred from the portable storage medium to, and stored in, the storage medium of the data compression apparatus 300.
In step S043 in FIG. 18, the timing control unit 303 determines whether the total value of the usage frequencies received from the monitoring unit 105 has fallen to or below the set threshold. If the total value of the usage frequencies is not equal to or less than the threshold (NO in step S043), the timing control unit 303 does nothing, and the data compression apparatus 300 repeats the operations from step S020. On the other hand, when the total value of the usage frequencies is equal to or less than the threshold (YES in step S043), the timing control unit 303 issues an instruction to start operation to the cost estimation unit 301 and the compression rate estimation unit 302.
When receiving the instruction from the timing control unit 303, the cost estimation unit 301 calculates the cost, and the compression rate estimation unit 302 calculates the compression rate (step S210). That is, in step S210, the cost estimation unit 301 estimates (calculates) the cost C, for example using Equation (2), based on the data amount of the compressed data stored in the first storage unit 103 and the data amount of the dictionary data stored in the second storage unit 104. Then, the cost estimation unit 301 outputs the cost C to the timing control unit 303. Meanwhile, the compression rate estimation unit 302 expands a part of the compressed data stored in the first storage unit 103 using the dictionary data stored in the second storage unit 104. The compression rate estimation unit 302 then generates new dictionary data using the expanded data, and compresses the same expanded data using the new dictionary data. Further, the compression rate estimation unit 302 calculates the compression rate from the compressed data and the data before compression (compression rate = (data amount after compression) / (data amount before compression)). Then, the compression rate estimation unit 302 outputs the calculated compression rate to the timing control unit 303.
Subsequently, the timing control unit 303 scores the cost C received from the cost estimation unit 301 using Expression (3). That is, the timing control unit 303 generates scored data S1. In addition, the timing control unit 303 scores the compression rate R received from the compression rate estimation unit 302 using Expression (4). That is, the timing control unit 303 generates scored data S2. Then, the timing control unit 303 compares the data S1 and the data S2, and determines whether or not the compression rate is greater than the cost (step S220). The timing control unit 303 does nothing when the compression rate is less than the cost (NO in step S220). Then, the data compression apparatus 300 repeats the operations after step S020. On the other hand, when the compression rate is greater than the cost (YES in step S220), the timing control unit 303 issues an instruction to start the operation to the expansion unit 107. The data compression apparatus 300 repeats the operations in and after step S020 after performing the operations in steps S050, S060, and S070 described above.
Since the data compression apparatus 300 according to the third embodiment can change the update timing of the dictionary data as described above, it does not perform a heavy-load operation (the operation of updating the dictionary data) when the compression effect is small. Thereby, in addition to the effects of the first embodiment, the data compression apparatus 300 provides the effect of reducing the load on the apparatus and eliminating waste.
(Fourth embodiment)
A fourth embodiment according to the present invention will be described below with reference to the drawings.
FIG. 19 is a block diagram showing the configuration of the data compression apparatus according to the fourth embodiment. In the description of the fourth embodiment, the same components as those in the first embodiment are denoted by the same reference numerals, and redundant description thereof is omitted.
The data compression apparatus (data compression system) 400 according to the fourth embodiment further includes a deletion unit (dictionary data deletion unit) 403 in addition to the configuration of the first embodiment. Further, the data compression apparatus 400 includes a partial expansion unit (compressed data partial expansion unit) 402 instead of the expansion unit 107 illustrated in the first embodiment, and the timing control unit illustrated in the first embodiment. Instead of 106, a timing control unit (dictionary update timing control means) 401 is provided.
The data compression apparatus 100 according to the first embodiment described above can always hold dictionary data with a high compression rate by updating the dictionary data when the frequency of use of the dictionary data has decreased. However, when updating the dictionary data, the data compression apparatus 100 recreates all the dictionary data every time and updates the dictionary data stored in the second storage unit 104. This causes the following problem. Even if the usage frequency of the dictionary data as a whole is decreasing, the usage frequency of some of the dictionary data may not be so low, or may even be increasing. In such a case, it is wasteful to recreate all the dictionary data. In addition, the operation of updating dictionary data is expensive. In view of this, it is preferable that the data compression apparatus recreate only a part of the dictionary data, rather than all of it, each time the dictionary data is updated. When only a part of the dictionary data is recreated in this way, the data compression apparatus can reduce its load and increase the processing speed for updating the dictionary data.
Therefore, when updating the dictionary data, the data compression apparatus 400 of the fourth embodiment deletes only the dictionary data whose usage frequency has dropped below a predetermined set value, and generates new dictionary data as described below.
For example, as described above, when it is determined that the dictionary data is to be updated, the timing control unit 401 instructs the partial expansion unit 402 to start the operation. At this time, the timing control unit 401 extracts dictionary data whose usage frequency has dropped below a set value based on the usage frequency data as shown in FIG. Then, the timing control unit 401 outputs the extracted dictionary data to the partial expansion unit 402 and the deletion unit 403. Note that the setting value used when extracting the dictionary data as described above may be a fixed value or may be arbitrarily changed during operation.
The partial expansion unit (partial expansion means) 402 has a function of expanding a part of the compressed data stored in the first storage unit 103. That is, the partial expansion unit 402 expands the compressed data stored in the first storage unit 103 using the dictionary data received from the timing control unit 401 and stores the expanded data in the third storage unit 108. Note that the dictionary data output from the timing control unit 401 to the partial expansion unit 402 is basically a part of the dictionary data stored in the second storage unit 104. For this reason, the partial expansion unit 402 expands only a part of the compressed data stored in the first storage unit 103.
Further, the dictionary data output from the timing control unit 401 to the partial expansion unit 402 is dictionary data whose use frequency has decreased as described above. For this reason, among the compressed data stored in the first storage unit 103, the partial expansion unit 402 expands only the compressed data whose compression rate is not so high, and does not expand the compressed data whose compression rate is high.
When the partial decompression unit 402 finishes the operation (process) for decompressing the compressed data as described above, the partial decompression unit 402 deletes all the compressed data stored in the first storage unit 103. In addition, when the partial expansion unit 402 finishes storing the expanded data in the third storage unit 108, the partial expansion unit 402 outputs an instruction to start the operation to the dictionary generation unit 109.
The deletion unit (deletion means) 403 has a function of deleting, from the second storage unit 104, the dictionary data corresponding to the dictionary data received from the timing control unit 401.
When the dictionary generation unit 109 receives the instruction from the partial expansion unit 402, the dictionary generation unit 109 generates dictionary data based on the expansion data stored in the third storage unit 108 as described above. Then, the dictionary generation unit 109 stores (appends) the newly generated dictionary data in the second storage unit 104. Functions other than the above function (operation) in the dictionary generation unit 109 are the same as those in the first embodiment.
As described above, the data compression apparatus 400 according to the fourth embodiment can delete only dictionary data whose use frequency is reduced and can update only a part of the dictionary data. For this reason, the data compression apparatus 400 can reduce the load on the apparatus and increase the processing speed for updating the dictionary data as compared with the case of recreating all dictionary data.
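The partial update described above can be sketched as follows. This is a simplified illustration, assuming that the dictionary data and the usage frequency data are both held as mappings keyed by dictionary entry; the function names and the data layout are hypothetical and are not taken from the embodiment.

```python
def select_low_usage_entries(usage_frequency, set_value):
    """Return the entries whose usage frequency has dropped below set_value.

    usage_frequency: mapping from dictionary entry key to usage frequency
    set_value: the set value used for extraction (fixed or changed at runtime)
    """
    return {key for key, freq in usage_frequency.items() if freq < set_value}


def partial_update(dictionary, usage_frequency, set_value, new_entries):
    """Delete only low-usage entries, then append newly generated entries.

    dictionary: mapping from entry key to the data it stands for
    new_entries: entries generated from the partially expanded data
    Returns (updated_dictionary, deleted_keys); the input is not modified.
    """
    deleted = select_low_usage_entries(usage_frequency, set_value)
    updated = {k: v for k, v in dictionary.items() if k not in deleted}
    updated.update(new_entries)  # corresponds to storing (appending) new data
    return updated, deleted
```

For example, with entries `A`, `B`, and `C` whose usage frequencies are 0.4, 0.01, and 0.3 and a set value of 0.05, only `B` is deleted and regenerated, while `A` and `C` are kept as they are.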
Hereinafter, an operation example (data compression method) of the data compression system 400 of the fourth embodiment will be described with reference to the flowchart of FIG. In FIG. 20, the same reference numerals as those in FIG. 13 are given to the portions showing the same operations as those shown in FIG.
The flowchart of FIG. 20 shows the processing procedure of the computer program executed by the control device (CPU) 101 in the data compression apparatus 400, similarly to the flowchart of FIG. The computer program is stored in a storage medium (for example, a non-volatile storage medium) of a storage device such as a memory or a hard disk included in the data compression apparatus 400 as described above. Alternatively, for example, the computer program may be stored in a portable storage medium such as a compact disk (CD) or a memory card, and then transferred from the portable storage medium to, and stored in, the storage medium of the data compression apparatus 400.
In step S045 of FIG. 20, the timing control unit 401 determines whether the total value of the usage frequencies received from the monitoring unit 105 has fallen to or below the set threshold. If the total value of the usage frequencies is not equal to or less than the threshold (NO in step S045), the timing control unit 401 does nothing, and the data compression apparatus 400 repeats the operations from step S020. On the other hand, when the total value of the usage frequencies is equal to or less than the threshold (YES in step S045), the timing control unit 401 extracts the dictionary data whose usage frequency has dropped below the set value, based on the usage frequency data received from the monitoring unit 105. Then, the timing control unit 401 outputs the extracted dictionary data to the partial expansion unit 402 and issues an instruction to start operation. The timing control unit 401 also outputs the extracted dictionary data to the deletion unit 403 and issues an instruction to start operation.
Subsequently, upon receiving the instruction from the timing control unit 401, the partial expansion unit 402 expands a part of the compressed data (step S310). That is, in step S310, the partial expansion unit 402 expands, using the dictionary data received from the timing control unit 401, only the compressed data that is stored in the first storage unit 103 and was compressed with the dictionary data whose use frequency has dropped below the set value. Then, the partial expansion unit 402 stores the expanded data in the third storage unit 108.
In step S310, when the partial expansion unit 402 finishes expanding the compressed data stored in the first storage unit 103, the partial expansion unit 402 deletes all the compressed data stored in the first storage unit 103. In addition, when the partial expansion unit 402 finishes storing the expanded data in the third storage unit 108, the partial expansion unit 402 instructs the dictionary generation unit 109 to start operation.
On the other hand, the deletion unit 403 deletes data corresponding to the dictionary data received from the timing control unit 401 from the second storage unit 104 (step S320). Subsequently, the dictionary generation unit 109 generates dictionary data and updates the dictionary data in the second storage unit 104 (step S065). In step S065, the dictionary generation unit 109 generates dictionary data based on the expanded data stored in the third storage unit 108. When the dictionary generation unit 109 finishes generating the dictionary data, the dictionary generation unit 109 stores (adds) the generated new dictionary data in the second storage unit 104.
Subsequently, the compression unit 102 compresses the decompressed data stored in the third storage unit 108 (step S075). In step S075, the compression unit 102 compresses the decompressed data while referring to dictionary data including newly generated dictionary data. The compression unit 102 stores the compressed data (compressed data) in the first storage unit 103.
As described above, the data compression apparatus 400 according to the fourth embodiment can update the dictionary data by regenerating only the dictionary data whose use frequency has decreased, instead of recreating all dictionary data. For this reason, compared with recreating all dictionary data from scratch, the data compression apparatus 400 can reduce the load of generating dictionary data and shorten the time required to update the dictionary data (that is, the processing speed can be increased).
(Fifth embodiment)
The fifth embodiment according to the present invention will be described below with reference to the drawings.
FIG. 21 is a block diagram showing the configuration of the data compression apparatus according to the fifth embodiment of the present invention. Note that in the description of the fifth embodiment, the same components as those in the first embodiment are denoted by the same reference numerals, and redundant description thereof is omitted.
The control device 101 constituting the data compression apparatus (data compression system) 500 of the fifth embodiment includes the compression unit 102, a switching unit (compression algorithm switching means) 501, a compression cost estimation unit (compression cost estimation means) 502, a compression rate estimation unit (compression rate estimation means) 503, and an update unit (switching information update means) 504.
In each of the above-described embodiments, the compression unit 102 always compresses data using a dictionary-based compression algorithm (dictionary compression method). In contrast, in the fifth embodiment, the compression unit 102 also has a function of compressing data using a compression algorithm other than the dictionary-based compression algorithm. In addition, when data is compressed by a compression algorithm other than the dictionary-based compression algorithm, the compression unit 102 has a function of obtaining the compression rate of the data after compression with respect to the data before compression.
When data is compressed by a compression algorithm other than the dictionary-based compression algorithm, the compression unit 102 outputs, to the switching unit 501, the total capacity data (data indicating the total capacity of the data received from the data generation unit 110), data indicating the elapsed time since data was received from the data generation unit 110 (elapsed time data), and data indicating the compression rate. The compression rate output to the switching unit 501 is the compression rate obtained when the compression unit 102 compressed the data received from the data generation unit 110 after the previous output of the compression rate to the switching unit 501.
When data is compressed by the dictionary-based compression algorithm, the compression unit 102 outputs, to the switching unit 501, the above-described usage status data (data indicating how much the dictionary data is used), the total capacity data (data indicating the total capacity of the data received from the data generation unit 110), and data indicating the elapsed time since data was received from the data generation unit 110 (elapsed time data).
Based on various data received from the compression unit 102, the switching unit (switching unit) 501 calculates the data flow rate per unit time using, for example, the following equation (5).
Data flow rate = {(total capacity of current input data) − (total capacity of previous input data)} ÷ {(current elapsed time) − (previous elapsed time)} (5)
The switching unit 501 stores the total capacity data and the elapsed time data received from the compression unit 102 in order to perform the above calculation. When the switching unit 501 newly receives the total capacity data and the elapsed time data from the compression unit 102, the switching unit 501 deletes the old total capacity data and the elapsed time data. Further, when the switching unit 501 receives the total capacity data and the elapsed time data for the first time from the compression unit 102, the switching unit 501 calculates the data flow rate using the following equation (6).
Data flow rate = (Total capacity of current input data) / (Current elapsed time) (6)
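As a non-limiting illustration, the calculation of equations (5) and (6) can be sketched as follows. The class name `FlowRateTracker` and its method are hypothetical and do not appear in the embodiment. The sketch assumes that the elapsed time is positive and strictly increases between samples.

```python
from typing import Optional, Tuple


class FlowRateTracker:
    """Keeps only the previous (total capacity, elapsed time) sample."""

    def __init__(self) -> None:
        self._previous: Optional[Tuple[float, float]] = None

    def update(self, total_capacity: float, elapsed_time: float) -> float:
        """Return the data flow rate for the newly received sample."""
        if self._previous is None:
            # First sample: equation (6).
            rate = total_capacity / elapsed_time
        else:
            prev_capacity, prev_time = self._previous
            # Later samples: equation (5).
            rate = (total_capacity - prev_capacity) / (elapsed_time - prev_time)
        # The old sample is discarded and replaced by the new one.
        self._previous = (total_capacity, elapsed_time)
        return rate


tracker = FlowRateTracker()
first = tracker.update(total_capacity=1000.0, elapsed_time=10.0)   # equation (6)
second = tracker.update(total_capacity=1600.0, elapsed_time=12.0)  # equation (5)
```

Storing only one previous sample mirrors the description that the old total capacity data and elapsed time data are deleted when new data is received.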
The switching unit 501 then instructs the compression cost estimation unit 502 and the compression rate estimation unit 503 to start operation when the calculated data flow rate becomes lower than the predetermined threshold T1 or higher than the predetermined threshold T2, which is larger than the threshold T1.
The compression cost estimation unit (compression cost estimation means) 502 has a function of estimating the cost required for compressing data. The compression cost estimation unit 502 starts operation when it receives an instruction from the switching unit 501. The compression cost estimation unit 502 compresses data using each of a plurality of preset compression algorithms, and calculates the calculation amount (cost) required for the process of compressing the data. Specifically, the compression cost estimation unit 502 expands a part of the compressed data stored in the first storage unit 103 using the dictionary data stored in the second storage unit 104, thereby generating the data before compression. Note that the amount of data expanded by the compression cost estimation unit 502 may be an amount corresponding to a preset ratio of the entire compressed data stored in the first storage unit 103, or may be a predetermined fixed amount.
For the uncompressed data generated as described above, the compression cost estimation unit 502 calculates the calculation amount AC for each of the plurality of preset compression algorithms using the following equation (7).
AC = (data amount to be compressed) ÷ (time taken for compression) (7)
In Expression (7), “data amount to be compressed” indicates the amount of the pre-compression data obtained by the expansion performed by the compression cost estimation unit 502 as described above. “Time taken for compression” indicates the time required to compress that data.
Then, the compression cost estimating unit 502 scores the calculation amount AC using, for example, the equation (8) or the equation (9), and outputs the scored data to the switching unit 501.
SA1 = W11 × AC1 (8)
SA2 = W12 × AC2 (9)
In Expression (8), SA1 indicates data (compression cost score) obtained by scoring the cost required when data is compressed using the compression algorithm 1. W11 represents a weight constant. AC1 indicates the calculation amount AC corresponding to the compression algorithm 1. In Equation (9), SA2 indicates data (compression cost score) obtained by scoring the cost required when data is compressed using the compression algorithm 2. W12 represents a weight constant. AC2 indicates the calculation amount AC corresponding to the compression algorithm 2.
Note that the method of obtaining scored data using the formula (8) or the formula (9) is merely an example of a method for scoring data. The method for scoring data by the compression cost estimation unit 502 is not particularly limited as long as it is a method for calculating using the calculation amount AC.
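As a non-limiting illustration, equations (7) to (9) can be sketched as follows. The embodiment does not name specific compression algorithms, so `zlib` and `bz2` from the Python standard library are used here purely as stand-ins. The sample data, the weight value 1.0, and the function names are likewise assumptions made for this sketch.

```python
import bz2
import time
import zlib
from typing import Callable, Dict

# Stand-in algorithms; the embodiment does not specify which ones are used.
ALGORITHMS: Dict[str, Callable[[bytes], bytes]] = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
}


def calculation_amount(data: bytes, compress: Callable[[bytes], bytes]) -> float:
    """Equation (7): data amount to be compressed / time taken for compression."""
    start = time.perf_counter()
    compress(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a coarse timer.
    return len(data) / max(elapsed, 1e-9)


def cost_score(weight: float, ac: float) -> float:
    """Equations (8) and (9): SA = W x AC."""
    return weight * ac


# Hypothetical pre-compression data standing in for the expanded sample.
sample = b"ABCABABC DACEFDAS AXMCALDA " * 2000
cost_scores = {
    name: cost_score(1.0, calculation_amount(sample, fn))
    for name, fn in ALGORITHMS.items()
}
```

Because AC in equation (7) is an amount of data processed per unit time, a larger compression cost score in this sketch corresponds to a lighter-load algorithm. The measured values depend on the machine and vary from run to run.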
The compression rate estimation unit (compression rate estimation means) 503 has a function of estimating the compression rate when new dictionary data is generated. The compression rate estimation unit 503 starts operation when it receives an instruction from the switching unit 501, and calculates a compression rate. Specifically, for example, like the compression cost estimation unit 502, the compression rate estimation unit 503 generates pre-compression data from a part of the compressed data stored in the first storage unit 103. For that pre-compression data, the compression rate estimation unit 503 then calculates, for each of the plurality of preset compression algorithms, the compression rate, which is an index indicating how much smaller the data amount after compression is than the data amount before compression. Then, the compression rate estimation unit 503 scores the calculated compression rate using, for example, Expression (10) or Expression (11), and outputs the scored data to the switching unit 501.
SB1 = W13 × R1 (10)
SB2 = W14 × R2 (11)
In Expression (10), SB1 represents data (compression rate score) obtained by scoring the compression rate of data compressed using the compression algorithm 1. W13 represents a weight constant. R1 indicates the compression rate corresponding to the compression algorithm 1. In Expression (11), SB2 represents data (compression rate score) obtained by scoring the compression rate of data compressed using the compression algorithm 2. W14 represents a weight constant. R2 indicates the compression rate corresponding to the compression algorithm 2.
The technique for obtaining scored data by using formula (10) or formula (11) is merely an example of a technique for scoring data. The method of scoring data by the compression rate estimation unit 503 is not particularly limited as long as it is a method of calculating using the compression rate.
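As a non-limiting illustration, the compression rate score of Expressions (10) and (11) can be sketched as follows. The embodiment only states that the compression rate indicates how much smaller the data becomes, so the definition R = 1 − (size after) / (size before) is an assumption made here. `zlib` and `lzma` are stand-in algorithms, and the sample data and weight are hypothetical.

```python
import lzma
import zlib
from typing import Callable, Dict

# Stand-in algorithms; the embodiment does not specify which ones are used.
ALGORITHMS: Dict[str, Callable[[bytes], bytes]] = {
    "zlib": zlib.compress,
    "lzma": lzma.compress,
}


def compression_rate(data: bytes, compress: Callable[[bytes], bytes]) -> float:
    """Assumed definition: fraction by which compression shrinks the data."""
    compressed = compress(data)
    return 1.0 - len(compressed) / len(data)


def rate_score(weight: float, rate: float) -> float:
    """Expressions (10) and (11): SB = W x R."""
    return weight * rate


# Hypothetical pre-compression data standing in for the expanded sample.
sample = b"ABCABABC DACEFDAS AXMCALDA " * 2000
rate_scores = {
    name: rate_score(1.0, compression_rate(sample, fn))
    for name, fn in ALGORITHMS.items()
}
```

With this definition, a larger score means a more effective reduction in size, which is consistent with the text preferring a high compression rate score when compression efficiency matters.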
The switching unit 501 has a function of determining whether or not to switch the compression algorithm based on the data received from the compression cost estimation unit 502 and the compression rate estimation unit 503.
Specifically, when the compression rate is reduced due to a decrease in the use frequency of dictionary data, it is necessary to switch to an algorithm with a higher compression rate. In this case, the switching unit 501 selects a compression algorithm having a compression rate score higher than the current compression rate score and a compression cost score higher than the current compression cost score. When there are a plurality of switching candidate compression algorithms, the switching unit 501 calculates a total score using the following equation (12), for example.
SC1 = W15 × SA1 + W16 × SB1 (12)
In Expression (12), SC1 represents an overall score corresponding to the compression algorithm 1. W15 and W16 indicate weight constants. SA1 indicates a compression cost score corresponding to the compression algorithm 1. SB1 indicates a compression rate score corresponding to the compression algorithm 1. The values of the weight constants W15 and W16 are set in advance. The values of the weight constants W15 and W16 are appropriately adjusted depending on whether the compression cost (implementation cost) is important or the compression rate is important. That is, when the compression cost is more important than the compression rate, the weight constant W15 is set larger. On the other hand, when the compression rate is more important than the compression cost, the weight constant W16 is set larger.
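As a non-limiting illustration, the candidate selection and the total score of Expression (12) can be sketched as follows. The algorithm names, score values, and weights are hypothetical, and the function `pick_algorithm` is an assumption for this sketch rather than part of the embodiment.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Scores:
    cost: float  # compression cost score SA
    rate: float  # compression rate score SB


def total_score(s: Scores, w15: float, w16: float) -> float:
    """Expression (12): SC = W15 x SA + W16 x SB."""
    return w15 * s.cost + w16 * s.rate


def pick_algorithm(
    current: str, scores: Dict[str, Scores], w15: float, w16: float
) -> Optional[str]:
    """Return the best candidate that beats the current algorithm on both scores."""
    base = scores[current]
    candidates = [
        name
        for name, s in scores.items()
        if name != current and s.rate > base.rate and s.cost > base.cost
    ]
    if not candidates:
        return None  # no switching candidate; keep the current algorithm
    return max(candidates, key=lambda name: total_score(scores[name], w15, w16))


scores = {
    "A": Scores(cost=1.0, rate=1.0),  # currently used algorithm
    "B": Scores(cost=3.0, rate=2.0),
    "C": Scores(cost=2.0, rate=4.0),
    "D": Scores(cost=0.5, rate=9.0),  # higher rate score but lower cost score
}
rate_first = pick_algorithm("A", scores, w15=1.0, w16=2.0)
cost_first = pick_algorithm("A", scores, w15=3.0, w16=1.0)
```

In this example, algorithm D is excluded because its compression cost score is lower than that of the current algorithm. Between B and C, a larger W16 favors C and a larger W15 favors B.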
On the other hand, when the data flow rate becomes lower than or equal to the threshold value T1, the amount of incoming data is small. It is therefore preferable to employ a compression algorithm that has a high compression rate, even if it is a high-load compression algorithm that requires time for compression.
In such a case, the switching unit 501 selects a compression algorithm having a high compression rate score even if the compression cost score is low. Specifically, the switching unit 501 calculates a total score for each compression algorithm using a mathematical expression similar to Expression (12). In that expression, the weight constant W16 is set large so that the compression rate score counts for more than the compression cost score.
The switching unit 501 determines which compression algorithm to switch to based on the result of comparing these total scores.
In addition, when the data flow rate exceeds the threshold value T2, the amount of incoming data is large. It is therefore preferable to employ a low-load compression algorithm, even if its compression rate is low.
In such a case, the switching unit 501 selects a compression algorithm having a high compression cost score even if the compression rate score is low. Specifically, the switching unit 501 calculates a total score for each compression algorithm using a mathematical expression similar to Expression (12). In that expression, the weight constant W15 is set large so that the compression cost score counts for more than the compression rate score.
The switching unit 501 sets the compression algorithm selected as described above as a new compression algorithm used by the compression unit 102. That is, the switching unit 501 switches the compression algorithm used when compressing data. In addition, the switching unit 501 issues an instruction to start an operation to the update unit 504 and outputs information indicating the compression algorithm after switching.
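As a non-limiting illustration, the way the data flow rate steers the weighting can be sketched as follows. The weight values (1.0 and 10.0), the score values, and the function names are hypothetical. The comparisons "less than or equal to T1" and "greater than T2" follow the wording of the two cases described above.

```python
from typing import Dict, Optional, Tuple


def weights_for_flow(
    flow_rate: float, t1: float, t2: float
) -> Optional[Tuple[float, float]]:
    """Return (W15, W16), or None when the flow rate lies between the thresholds."""
    if flow_rate <= t1:
        return (1.0, 10.0)  # small flow: emphasize the compression rate score
    if flow_rate > t2:
        return (10.0, 1.0)  # large flow: emphasize the compression cost score
    return None


def select_by_flow(
    flow_rate: float,
    t1: float,
    t2: float,
    scores: Dict[str, Tuple[float, float]],
) -> Optional[str]:
    """Pick the algorithm with the highest total score for the current flow rate."""
    weights = weights_for_flow(flow_rate, t1, t2)
    if weights is None:
        return None  # no switching is triggered by the data flow rate
    w15, w16 = weights
    return max(scores, key=lambda n: w15 * scores[n][0] + w16 * scores[n][1])


# Hypothetical (SA, SB) pairs: a low-load algorithm and a high-compression one.
scores = {"light": (5.0, 1.0), "heavy": (1.0, 4.0)}
low_flow_choice = select_by_flow(10.0, t1=50.0, t2=500.0, scores=scores)
high_flow_choice = select_by_flow(900.0, t1=50.0, t2=500.0, scores=scores)
mid_flow_choice = select_by_flow(100.0, t1=50.0, t2=500.0, scores=scores)
```

Here the high-compression algorithm is chosen when the flow rate is small and the low-load algorithm when it is large. A flow rate between the two thresholds leaves the current algorithm unchanged.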
The update unit (update means) 504 starts operation when it receives an instruction from the switching unit 501. The update unit 504 updates information indicating which range of the data stored in the first storage unit 103 has been compressed with which compression algorithm. Further, the update unit 504 updates information indicating which range of the dictionary data stored in the second storage unit 104 is used by which compression algorithm.
Specifically, the compression algorithm information that the update unit 504 receives from the switching unit 501 includes at least the name of the compression algorithm. The update unit 504 then stores (adds) the compression algorithm information in each of the first storage unit 103 and the second storage unit 104 so that it can be known which data is related to which compression algorithm.
For example, the data stored in the first storage unit 103 is classified for each compression algorithm as follows.
[First storage unit 103]
Compression algorithm A
ABCABABC
DACEFDAS
AXMCALDA
Compression algorithm B
ANSKANX
CIKADLAL
Compression algorithm C
DKLABNKC
ALKDKAJD
SLDKJALL
The same applies to the second storage unit 104.
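As a non-limiting illustration, the classification shown above can be sketched as a list of runs, each tagged with the name of a compression algorithm. The class `AlgorithmLog` and its methods are hypothetical, and the embodiment does not prescribe this data layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AlgorithmLog:
    """Stored blocks grouped into runs that share one compression algorithm."""

    runs: List[Tuple[str, List[str]]] = field(default_factory=list)

    def switch_to(self, algorithm: str) -> None:
        """Record a switch: later blocks belong to the new algorithm."""
        self.runs.append((algorithm, []))

    def append(self, block: str) -> None:
        """Store a block under the algorithm that is currently in use."""
        self.runs[-1][1].append(block)

    def algorithm_for(self, index: int) -> str:
        """Return the algorithm name for the block at a global index."""
        for name, blocks in self.runs:
            if index < len(blocks):
                return name
            index -= len(blocks)
        raise IndexError("block index out of range")


log = AlgorithmLog()
log.switch_to("Compression algorithm A")
for block in ("ABCABABC", "DACEFDAS", "AXMCALDA"):
    log.append(block)
log.switch_to("Compression algorithm B")
for block in ("ANSKANX", "CIKADLAL"):
    log.append(block)
log.switch_to("Compression algorithm C")
for block in ("DKLABNKC", "ALKDKAJD", "SLDKJALL"):
    log.append(block)
```

With such a record, the algorithm needed to expand any stored block can be looked up from the position of the block alone.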
Note that the compression algorithm first employed by the compression unit 102 is set in advance. In the fifth embodiment, in addition to the function of compressing data, the compression unit 102 has a function of generating dictionary data and registering the generated dictionary data in the second storage unit 104.
Hereinafter, an operation example (data compression method) of the data compression apparatus 500 of the fifth embodiment will be described with reference to the flowchart of FIG. 22. In FIG. 22, the same reference numerals as those in FIG. 18 are given to portions showing the same operations as those shown in FIG. 18, and the duplicate description thereof is omitted.
The flowchart of FIG. 22 shows the processing procedure of the computer program executed by the control device (CPU) 101 in the data compression apparatus 500, similarly to the flowchart of FIG. The computer program is stored in a storage medium (for example, a non-volatile storage medium) of a storage device such as a memory or a hard disk included in the data compression apparatus 500 as described above. Alternatively, for example, the computer program may be stored in a portable storage medium such as a compact disk (CD) or a memory card, and then transferred from the portable storage medium to the storage medium of the data compression apparatus 500 and stored there.
When the switching unit 501 determines in step S040 shown in FIG. 22 that the usage frequency of the dictionary data has not decreased, the switching unit 501 calculates the data flow rate (step S410). That is, the switching unit 501 calculates the data flow rate based on the total capacity data and the elapsed time data received from the compression unit 102, using equation (5), or equation (6) when those data are received for the first time. Then, the switching unit 501 determines whether the calculated data flow rate is equal to or lower than the threshold value T1 or higher than the threshold value T2. When the switching unit 501 determines that the data flow rate is equal to or lower than the threshold value T1 or higher than the threshold value T2, the switching unit 501 issues an instruction to start operation to the compression cost estimation unit 502 and the compression rate estimation unit 503.
Thereby, the compression cost estimation unit 502 and the compression rate estimation unit 503 start operation (step S420). That is, the compression cost estimation unit 502 calculates a compression cost score for each of a plurality of preset compression algorithms as described above. In addition, the compression rate estimation unit 503 calculates a compression rate score for each of a plurality of preset compression algorithms as described above.
The switching unit 501 determines whether to switch the compression algorithm, that is, whether there is a compression algorithm more appropriate than the one currently used, based on the calculated compression cost score and compression rate score (step S430).
If it is determined that the compression algorithm is to be switched, the switching unit 501 selects a compression algorithm as described above. Then, the switching unit 501 switches the compression algorithm used when compressing data to the selected compression algorithm (step S440). In addition, the switching unit 501 stores (adds) compression algorithm information in each of the first storage unit 103 and the second storage unit 104 so that it can be known which data is related to which compression algorithm.
As described above, according to the fifth embodiment, the switching unit 501 selects a compression algorithm with an emphasis on the compression rate when the data flow rate is equal to or less than the threshold T1. In addition, when the data flow rate is equal to or greater than the threshold value T2, the switching unit 501 selects a compression algorithm with an emphasis on compression cost. As described above, the switching unit 501 selects a compression algorithm based on the compression cost and the compression rate. Thereby, the switching unit 501 can select a highly effective compression algorithm from a plurality of compression algorithms including a compression algorithm other than the lexicographic compression algorithm. That is, the switching unit 501 can switch to a highly effective compression algorithm at an appropriate timing. For this reason, the data compression apparatus 500 of the fifth embodiment can maintain efficient data compression.
The present invention is not limited to the above-described embodiments and can take various forms. For example, the data compression apparatus may have a configuration as shown in FIG. That is, the data compression apparatus 600 includes a compression unit (compression means) 601, a monitoring unit (monitoring means) 602, a timing control unit (timing control means) 603, and a dictionary generation unit (dictionary generation means) 604. The compression unit 601 has a function of compressing data to be compressed based on dictionary data given in advance. The monitoring unit 602 has a function of calculating the use frequency of the dictionary data when the compression unit 601 compresses the data. The timing control unit 603 has a function of issuing an instruction to update the dictionary data when the calculated use frequency falls below a preset threshold value. In response to the instruction, the dictionary generation unit 604 has a function of newly generating the dictionary data using the data compressed by the compression unit 601.
In this data compression apparatus 600 as well, an efficient data compression rate can be maintained as in the above embodiments.
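As a non-limiting illustration, the cooperation of the four units of the data compression apparatus 600 can be sketched with a toy word-level dictionary. The class `DataCompressor600`, the word-based tokenization, the definition of use frequency as the share of tokens replaced by a registration number, and the rebuilding of the dictionary from the most frequent words are all assumptions made for this sketch.

```python
from collections import Counter
from typing import Dict, List, Union

Token = Union[int, str]  # int: registration number, str: literal word


class DataCompressor600:
    def __init__(self, words: List[str], threshold: float, size: int = 4) -> None:
        self.dictionary: Dict[str, int] = {w: i for i, w in enumerate(words)}
        self.threshold = threshold
        self.size = size
        self.updates = 0

    def compress(self, text: str) -> List[Token]:
        """Compression unit 601: replace known words with registration numbers."""
        return [self.dictionary.get(w, w) for w in text.split()]

    @staticmethod
    def use_frequency(tokens: List[Token]) -> float:
        """Monitoring unit 602: share of tokens that came from the dictionary."""
        hits = sum(isinstance(t, int) for t in tokens)
        return hits / len(tokens) if tokens else 0.0

    def expand(self, tokens: List[Token]) -> List[str]:
        """Restore the words before compression using the current dictionary."""
        reverse = {i: w for w, i in self.dictionary.items()}
        return [reverse[t] if isinstance(t, int) else t for t in tokens]

    def regenerate(self, tokens: List[Token]) -> None:
        """Dictionary generation unit 604: rebuild from the compressed data."""
        common = Counter(self.expand(tokens)).most_common(self.size)
        self.dictionary = {w: i for i, (w, _) in enumerate(common)}
        self.updates += 1

    def process(self, text: str) -> List[Token]:
        tokens = self.compress(text)
        # Timing control unit 603: update only when the use frequency drops.
        if self.use_frequency(tokens) < self.threshold:
            self.regenerate(tokens)
        return tokens


unit = DataCompressor600(["cpu", "load"], threshold=0.5)
first = unit.process("cpu load cpu load")      # every word is in the dictionary
second = unit.process("disk io disk io disk")  # no word matches: dictionary rebuilt
third = unit.compress("disk io")
```

The sketch omits the bookkeeping needed to expand data that was compressed with an earlier dictionary. It only shows that a drop in use frequency triggers regeneration of the dictionary from the compressed data.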
While the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
This application claims priority based on Japanese Patent Application No. 2010-036806 filed on Feb. 23, 2010, the entire disclosure of which is incorporated herein.
Further, a part or all of the above-described embodiment can be described as in the following supplementary notes, but is not limited thereto.
(Appendix 1)
A compression step of compressing sequentially input data to be compressed while referring to dictionary data stored in the dictionary data storage means, and outputting compressed data;
A dictionary usage frequency monitoring step of calculating a usage frequency of the dictionary data referred to when compressing the data to be compressed in the compression step;
A dictionary update timing control step for instructing updating of the dictionary when the calculation result of the use frequency by the dictionary use frequency monitoring step falls below a set threshold;
A data expansion step of expanding the compressed data obtained in the compression step into data before compression;
A dictionary construction step of constructing new dictionary data from the uncompressed data obtained in the data expansion step in accordance with the dictionary update instruction in the dictionary update timing control step, and replacing the dictionary data stored in the dictionary data storage means with the constructed new dictionary data;
Data compression method.
(Appendix 2)
The data compression method according to appendix 1, wherein the dictionary usage frequency monitoring step receives as input the dictionary data used when the data to be compressed is compressed in the compression step, the number of times each dictionary data is used, and the total amount of the data to be compressed, and divides the number of uses by the total amount of the data to be compressed for each dictionary data to obtain the calculation result of the use frequency.
(Appendix 3)
The data compression method according to appendix 1 or 2, wherein the dictionary update timing control step instructs updating of the dictionary when the total value of the calculation results of the usage frequencies of all the dictionary data obtained by the dictionary usage frequency monitoring step is smaller than the threshold value.
(Appendix 4)
A compression step of compressing sequentially input data to be compressed while referring to dictionary data stored in the dictionary data storage means, and outputting compressed data;
A dictionary usage frequency monitoring step of calculating a usage frequency of the dictionary data referred to when compressing the data to be compressed in the compression step;
A dictionary update timing control step for instructing to update the dictionary when the calculation result of the use frequency by the dictionary use frequency monitoring step falls below a threshold;
A data expansion step of expanding the compressed data obtained in the compression step into data before compression;
A resource monitoring step that outputs the free capacity of the resource to be used;
A parameter adjustment step for variably setting the threshold value used in the dictionary update timing control step according to the free capacity of the resource output in the resource monitoring step;
A dictionary construction step of constructing new dictionary data from the uncompressed data obtained in the data expansion step in accordance with the dictionary update instruction in the dictionary update timing control step, and replacing the dictionary data stored in the dictionary data storage means with the constructed new dictionary data;
Data compression method.
(Appendix 5)
The data compression method according to appendix 4, wherein, in the parameter adjustment step, the threshold used in the dictionary update timing control step is variably set to a smaller value as the free capacity of the resource output in the resource monitoring step is larger, and to a larger value as the free capacity of the resource is smaller.
(Appendix 6)
A compression step of compressing sequentially input data to be compressed while referring to dictionary data stored in the dictionary data storage means, and outputting compressed data;
A dictionary usage frequency monitoring step of calculating a usage frequency of the dictionary data referred to when compressing the data to be compressed in the compression step;
A dictionary construction cost estimation step of estimating, when the calculation result of the use frequency by the dictionary use frequency monitoring step falls below a threshold value, a dictionary construction cost according to the data amount of the compressed data and the data amount of the dictionary data stored in the dictionary data storage means;
A compression rate estimation step of estimating a compression rate when the dictionary data is updated, when the calculation result of the usage frequency by the dictionary usage frequency monitoring step falls below the threshold;
A dictionary update timing control step of comparing the dictionary construction cost with the compression rate and determining whether or not to construct a new dictionary according to the comparison result;
A data expansion step of expanding the compressed data output from the compression step into data before compression;
A dictionary construction step of, when the dictionary update timing control step decides to construct the new dictionary, constructing new dictionary data from the uncompressed data obtained in the data expansion step, and replacing the dictionary data stored in the dictionary data storage means with the constructed new dictionary data;
Data compression method.
(Appendix 7)
The data compression method according to appendix 6, wherein the compression rate estimation step estimates, as the compression rate, the ratio between uncompressed data obtained by decompressing a part of the compressed data using the dictionary data stored in the dictionary data storage means, and compressed data obtained by compressing that uncompressed data using a new dictionary constructed from that uncompressed data.
(Appendix 8)
The data compression method according to appendix 6 or 7, wherein the dictionary update timing control step compares a first value obtained by scoring the dictionary construction cost with a second value obtained by scoring the compression rate, and decides to construct the new dictionary only when the second value is larger than the first value.
(Appendix 9)
The data expansion step includes:
A compressed data storage step of storing the compressed data output in the compression step;
A compressed data decompression step of expanding, in accordance with the dictionary update instruction in the dictionary update timing control step, the compressed data stored in the compressed data storage step with reference to the dictionary data stored in the dictionary data storage means, to obtain decompressed data, which is the data before compression; and
A decompressed data storage step of storing the decompressed data obtained by the compressed data decompression step;
The data compression method according to any one of appendices 1 to 8, wherein the dictionary construction step starts when the compressed data decompression step has finished expanding all the compressed data stored in the compressed data storage step, and constructs the new dictionary data based on the decompressed data, which is the data before compression, stored in the decompressed data storage step.
(Appendix 10)
A compression step of compressing sequentially input data to be compressed while referring to dictionary data stored in the dictionary data storage means, and outputting compressed data;
A dictionary usage frequency monitoring step of calculating, for each dictionary data, the usage frequency of the dictionary data referred to when compressing the data to be compressed in the compression step;
A dictionary update timing control step for extracting and outputting dictionary data corresponding to a use frequency that is lower than a set threshold among the use frequencies for each dictionary data calculated by the dictionary use frequency monitoring step;
A partial data expansion step of expanding a portion of the compressed data obtained by the compression into data before compression using the dictionary data obtained by the dictionary update timing control step;
A dictionary data deletion step of deleting the dictionary data obtained by the dictionary update timing control step from a plurality of dictionary data stored in the dictionary data storage means;
A dictionary construction step of constructing new dictionary data from the partial uncompressed data output from the partial data expansion step, and adding the new dictionary data to the dictionary data storage means;
Data compression method.
(Appendix 11)
The partial data expansion step includes:
A compressed data storage step for storing the compressed data obtained in the compression step;
A compressed data partial decompression step of expanding a part of the compressed data stored in the compressed data storage step, using the dictionary data obtained in the dictionary update timing control step, to obtain partial uncompressed data; and
A decompressed data storage step of storing the partial uncompressed data obtained by the compressed data partial decompression step;
The data compression method according to appendix 10, wherein the dictionary construction step starts operation when the compressed data partial decompression step has finished expanding the part of the compressed data stored in the compressed data storage step, and constructs the new dictionary data based on the partial uncompressed data stored in the decompressed data storage step.
(Appendix 12)
A compression step of compressing sequentially inputted data to be compressed in accordance with a designated compression algorithm and outputting compressed data;
A compression cost estimation step of compressing the data before compression in the compression step by each of a plurality of preset compression algorithms, and estimating, for each of the compression algorithms, a compression cost, which is the ratio of the amount of data before compression to the time taken for compression;
A compression rate estimation step of calculating a compression rate when the data before compression by the compression step is compressed by each of the plurality of compression algorithms set in advance;
A compression algorithm switching step of selecting, when the use frequency of the dictionary data referred to when compressing the data to be compressed in the compression step is lower than a threshold value, based on the compression cost obtained in the compression cost estimation step and the compression rate obtained in the compression rate estimation step, a compression algorithm having a higher compression rate and a higher compression cost than the currently used compression algorithm from among the plurality of compression algorithms, as the compression algorithm used in the compression step;
Data compression method.
(Appendix 13)
The data compression method according to appendix 12, wherein the compression algorithm switching step calculates a data flow rate of the data to be compressed in the compression step and, when the data flow rate is equal to or lower than a first threshold value or equal to or higher than a second threshold value greater than the first threshold value, selects a compression algorithm based on the compression cost obtained in the compression cost estimation step and the compression rate obtained in the compression rate estimation step, such that: when the data flow rate is equal to or lower than the first threshold value, a compression algorithm having at least a higher compression rate than the currently used compression algorithm is selected from among the plurality of compression algorithms as the compression algorithm used in the compression step; and when the data flow rate is equal to or higher than the second threshold value, a compression algorithm having at least a higher compression cost than the currently used compression algorithm is selected from among the plurality of compression algorithms as the compression algorithm used in the compression step.
(Appendix 14)
Dictionary data storage means for storing dictionary data for compression;
Compression means for compressing sequentially inputted data to be compressed with reference to the dictionary data stored in the dictionary data storage means, and outputting compressed data;
Data expansion means for expanding the compressed data output from the compression means into data before compression;
Dictionary use frequency monitoring means for calculating the use frequency of the dictionary data referred to when the compression means compresses the data to be compressed;
Dictionary update timing control means for instructing updating of the dictionary when the calculation result of the use frequency by the dictionary use frequency monitoring means falls below a set threshold;
Dictionary construction means for constructing new dictionary data from the uncompressed data output from the data expansion means in accordance with the dictionary update instruction output from the dictionary update timing control means, and replacing the dictionary data stored in the dictionary data storage means with the constructed new dictionary data;
A data compression system.
(Appendix 15)
The data compression system according to appendix 14, wherein the dictionary usage frequency monitoring means receives as input, from the compression means, the dictionary data used when the compression means compresses the data to be compressed, the number of times each dictionary data is used, and the total amount of the data to be compressed, and divides the number of uses by the total amount of the data to be compressed for each dictionary data to obtain the calculation result of the use frequency.
(Appendix 16)
The data compression system according to appendix 14 or 15, wherein the dictionary update timing control means instructs updating of the dictionary when the total value of the calculation results of the usage frequencies of all the dictionary data input from the dictionary usage frequency monitoring means is smaller than the threshold value.
(Appendix 17)
Dictionary data storage means for storing dictionary data for compression;
Compression means for compressing sequentially inputted data to be compressed with reference to the dictionary data stored in the dictionary data storage means, and outputting compressed data;
Data expansion means for expanding the compressed data output from the compression means into data before compression;
Dictionary use frequency monitoring means for calculating the use frequency of the dictionary data referred to when the compression means compresses the data to be compressed;
Dictionary update timing control means for instructing dictionary update when the use frequency calculation result by the dictionary use frequency monitoring means falls below a threshold;
Resource monitoring means for outputting the free capacity of the resource to be used;
Parameter adjustment means for variably setting the threshold value used in the dictionary update timing control means according to the free space of the resource from the resource monitoring means;
Dictionary construction means for constructing, in accordance with the dictionary update instruction output from the dictionary update timing control means, new dictionary data from the pre-compression data output from the data expansion means, and replacing the dictionary data stored in the dictionary data storage means with the constructed new dictionary data;
A data compression system.
(Appendix 18)
The data compression system according to appendix 17, wherein the parameter adjustment means sets the threshold used by the dictionary update timing control means to a smaller value as the free capacity of the resource reported by the resource monitoring means becomes larger, and to a larger value as the free capacity becomes smaller.
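One way to realize the monotonic relationship in appendix 18 is a linear mapping from the free-capacity ratio to the threshold. The text only requires that the threshold shrink as free capacity grows; the linear form, the function name, and the bounds `low` and `high` are assumptions made for illustration.

```python
def adjust_threshold(free_capacity, total_capacity, low=0.01, high=0.10):
    """Map free capacity to a threshold: more free space gives a smaller threshold.

    low and high are hypothetical bounds on the threshold.
    """
    if total_capacity <= 0:
        raise ValueError("total_capacity must be positive")
    # Clamp the ratio so out-of-range readings cannot push the threshold past its bounds.
    free_ratio = min(max(free_capacity / total_capacity, 0.0), 1.0)
    return high - (high - low) * free_ratio


roomy = adjust_threshold(free_capacity=80, total_capacity=100)
tight = adjust_threshold(free_capacity=20, total_capacity=100)
```

With these settings `roomy` is smaller than `tight`, so a system that is short on storage triggers dictionary updates sooner.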
(Appendix 19)
Dictionary data storage means for storing dictionary data for compression;
Compression means for compressing sequentially inputted data to be compressed with reference to the dictionary data stored in the dictionary data storage means, and outputting compressed data;
Data expansion means for expanding the compressed data output from the compression means into data before compression;
Dictionary use frequency monitoring means for calculating the use frequency of the dictionary data referred to when the compression means compresses the data to be compressed;
Dictionary construction cost estimating means for estimating a dictionary construction cost according to the data amount of the compressed data and the data amount of the dictionary data stored in the dictionary data storage means;
Compression rate estimating means for estimating a compression rate when the dictionary data is updated;
Dictionary update timing control means for, when the calculation result of the use frequency by the dictionary use frequency monitoring means falls below a threshold, instructing the dictionary construction cost estimating means and the compression rate estimating means to operate, comparing the dictionary construction cost obtained from the dictionary construction cost estimating means with the compression rate obtained from the compression rate estimating means, and determining whether or not to construct a new dictionary according to the comparison result;
Dictionary construction means for, when the dictionary update timing control means decides to construct the new dictionary, constructing new dictionary data from the pre-compression data output from the data expansion means, and replacing the dictionary data stored in the dictionary data storage means with the constructed new dictionary data;
A data compression system.
(Appendix 20)
The data compression system according to appendix 19, wherein the compression rate estimation means estimates, as the compression rate, the ratio between pre-compression data and the compressed data obtained by compressing it with a new dictionary, the pre-compression data being obtained by decompressing a part of the compressed data using the dictionary data stored in the dictionary data storage means, and the new dictionary being constructed from that pre-compression data.
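The estimation in appendix 20 can be illustrated with Python's standard `zlib` module, whose preset-dictionary option (`zdict`) stands in here for the patent's own dictionary format. This is a swapped-in technique, not the method described in the specification. The helper `build_dictionary`, the line-based dictionary construction, and the convention that the ratio is compressed size divided by original size are assumptions.

```python
import zlib
from collections import Counter


def build_dictionary(sample: bytes, max_entries: int = 8) -> bytes:
    """Hypothetical dictionary builder: concatenate the most frequent lines of the sample."""
    lines = Counter(sample.splitlines(keepends=True))
    return b"".join(line for line, _ in lines.most_common(max_entries))


def estimate_compression_ratio(sample: bytes, dictionary: bytes) -> float:
    """Return compressed size / original size for the sample (smaller is better)."""
    if not sample:
        raise ValueError("sample must not be empty")
    # An empty dictionary means plain DEFLATE without a preset dictionary.
    compressor = zlib.compressobj(zdict=dictionary) if dictionary else zlib.compressobj()
    compressed = compressor.compress(sample) + compressor.flush()
    return len(compressed) / len(sample)


# The sample plays the role of the partially decompressed data.
sample = b"ERROR disk full on node-7\nINFO heartbeat ok\nERROR disk full on node-7\n"
ratio_new = estimate_compression_ratio(sample, build_dictionary(sample))
ratio_none = estimate_compression_ratio(sample, b"")
```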
(Appendix 21)
The data compression system according to appendix 19 or 20, wherein the dictionary update timing control means compares a first value obtained by scoring the dictionary construction cost from the dictionary construction cost estimating means with a second value obtained by scoring the compression rate from the compression rate estimating means, and decides to construct the new dictionary only when the second value is larger than the first value.
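The comparison in appendix 21 needs the cost and the compression rate on a common scale. The text does not state how either is scored, so both scoring functions below are hypothetical; only the final rule (construct only when the second value exceeds the first) comes from the appendix.

```python
def score_construction_cost(compressed_bytes, dictionary_bytes, bytes_per_point=1_000_000):
    """Hypothetical first value: one point per megabyte that must be expanded and rescanned."""
    return (compressed_bytes + dictionary_bytes) / bytes_per_point


def score_compression_gain(current_ratio, estimated_ratio, points_per_percent=1.0):
    """Hypothetical second value: points per percentage point of size saved.

    Both ratios are assumed to be compressed size / original size.
    """
    return max(current_ratio - estimated_ratio, 0.0) * 100 * points_per_percent


def should_build_new_dictionary(cost_score, gain_score):
    """Construct a new dictionary only when the gain score exceeds the cost score."""
    return gain_score > cost_score


cost = score_construction_cost(compressed_bytes=4_000_000, dictionary_bytes=1_000_000)
worthwhile = should_build_new_dictionary(cost, score_compression_gain(0.60, 0.45))
marginal = should_build_new_dictionary(cost, score_compression_gain(0.60, 0.58))
```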
(Appendix 22)
The data compression system according to any one of appendices 14 to 21, wherein the data expansion means includes:
Compressed data storage means for storing the compressed data output from the compression means;
Compressed data decompression means for decompressing, in accordance with the dictionary update instruction output from the dictionary update timing control means, the compressed data stored in the compressed data storage means with reference to the dictionary data stored in the dictionary data storage means, to obtain decompressed data that is the data before compression; and
Decompressed data storage means for storing the decompressed data obtained by the compressed data decompression means,
and wherein the dictionary construction means starts operating when the compressed data decompression means has finished decompressing all the compressed data stored in the compressed data storage means, and constructs the new dictionary data based on the decompressed data stored in the decompressed data storage means.
(Appendix 23)
Dictionary data storage means for storing a plurality of dictionary data for compression;
Compression means for compressing sequentially inputted data to be compressed with reference to the dictionary data stored in the dictionary data storage means, and outputting compressed data;
Dictionary use frequency monitoring means for calculating the use frequency of the dictionary data referred to when the compression means compresses the data to be compressed for each dictionary data;
A dictionary update timing control means for extracting and outputting dictionary data corresponding to a use frequency that is lower than a set threshold value among the use frequencies for each dictionary data by the dictionary use frequency monitoring means;
Data partial expansion means for expanding a part of the compressed data output from the compression means into data before compression using the dictionary data output from the dictionary update timing control means;
Dictionary data deletion means for deleting the dictionary data output from the dictionary update timing control means from a plurality of dictionary data stored in the dictionary data storage means;
Dictionary construction means for constructing new dictionary data from a part of the uncompressed data output from the data partial expansion means, and adding the new dictionary data to the dictionary data storage means;
A data compression system.
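The partial update in appendix 23 (delete the rarely used dictionary data, then add newly constructed data) can be sketched as below. For brevity each "dictionary data" is modeled as a single string entry and the partial expansion step is omitted; the function name and these simplifications are assumptions.

```python
def refresh_dictionary(entries, frequencies, threshold, new_entries):
    """Drop entries whose use frequency is below the threshold, then append new ones.

    Returns (updated_entries, removed_entries). Entries missing from
    `frequencies` are treated as never used.
    """
    removed = [e for e in entries if frequencies.get(e, 0.0) < threshold]
    kept = [e for e in entries if e not in removed]
    # dict.fromkeys removes duplicates while preserving order.
    updated = list(dict.fromkeys(kept + list(new_entries)))
    return updated, removed


updated, removed = refresh_dictionary(
    entries=["GET", "POST", "TRACE"],
    frequencies={"GET": 0.05, "POST": 0.02, "TRACE": 0.001},
    threshold=0.01,
    new_entries=["PATCH", "GET"],
)
```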
(Appendix 24)
The data compression system according to appendix 23, wherein the data partial expansion means includes:
Compressed data storage means for storing the compressed data output from the compression means;
Compressed data partial decompression means for expanding, using the dictionary data output from the dictionary update timing control means, a part of the compressed data stored in the compressed data storage means to obtain a part of the data before compression; and
Decompressed data storage means for storing the part of the data before compression obtained by the compressed data partial decompression means,
and wherein the dictionary construction means starts operating when the compressed data partial decompression means has finished expanding the part of the compressed data stored in the compressed data storage means, and constructs the new dictionary data based on the part of the data before compression stored in the decompressed data storage means.
(Appendix 25)
Dictionary data storage means for storing dictionary data used when the compression algorithm is a dictionary-based method;
Compression means for sequentially compressing data to be compressed according to a designated compression algorithm and outputting compressed data;
Compression cost estimating means for compressing the pre-compression data of the compression means with each of a plurality of preset compression algorithms, and estimating, for each of the compression algorithms, a compression cost that is the ratio between the amount of data before compression and the time taken for compression;
Compression ratio estimating means for calculating a compression ratio when the data before compression by the compression means is compressed by each of the plurality of compression algorithms set in advance;
Compression algorithm switching means for, when the use frequency of the dictionary data referred to when the compression means compresses the data to be compressed falls below a threshold, instructing the compression cost estimating means and the compression rate estimating means to operate, and selecting, based on the compression cost from the compression cost estimating means and the compression rate from the compression rate estimating means, a compression algorithm whose compression rate and compression cost are higher than those of the currently used compression algorithm, from among the plurality of compression algorithms, as the compression algorithm used by the compression means;
A data compression system.
(Appendix 26)
The data compression system according to appendix 25, wherein the compression algorithm switching means calculates a data flow rate of the data to be compressed that is input to the compression means; when the data flow rate is equal to or lower than a first threshold, or equal to or higher than a second threshold greater than the first threshold, instructs the compression cost estimating means and the compression rate estimating means to operate; and, based on the compression cost and the compression rate thus obtained, selects as the compression algorithm to be used by the compression means, from among the plurality of compression algorithms, a compression algorithm having at least a higher compression rate than the currently used compression algorithm when the data flow rate is equal to or lower than the first threshold, and a compression algorithm having at least a higher compression cost than the currently used compression algorithm when the data flow rate is equal to or higher than the second threshold.
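The flow-rate rule in appendix 26 can be sketched as a selection function. The compression cost is read here as throughput (amount of data before compression divided by compression time), following its definition in appendix 25, and a larger compression rate is assumed to mean stronger compression. The `Estimate` type, the algorithm names, and the tie-breaking by maximum are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    name: str                # hypothetical algorithm label
    compression_rate: float  # assumed: larger means stronger compression
    compression_cost: float  # pre-compression bytes per second of compression time


def select_algorithm(current, candidates, flow_rate, first_threshold, second_threshold):
    """Pick a stronger algorithm at low flow rates and a faster one at high flow rates."""
    if first_threshold >= second_threshold:
        raise ValueError("second_threshold must exceed first_threshold")
    if flow_rate <= first_threshold:
        stronger = [c for c in candidates if c.compression_rate > current.compression_rate]
        return max(stronger, key=lambda c: c.compression_rate, default=current)
    if flow_rate >= second_threshold:
        faster = [c for c in candidates if c.compression_cost > current.compression_cost]
        return max(faster, key=lambda c: c.compression_cost, default=current)
    return current  # between the thresholds nothing is switched


current = Estimate("dictionary", compression_rate=2.0, compression_cost=50e6)
candidates = [
    Estimate("fast", compression_rate=1.5, compression_cost=200e6),
    Estimate("strong", compression_rate=3.0, compression_cost=10e6),
]
quiet = select_algorithm(current, candidates, 1e6, 5e6, 100e6)
busy = select_algorithm(current, candidates, 150e6, 5e6, 100e6)
steady = select_algorithm(current, candidates, 20e6, 5e6, 100e6)
```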
(Appendix 27)
For a computer that performs data compression:
A compression step of compressing sequentially input data to be compressed while referring to dictionary data stored in the dictionary data storage means, and outputting compressed data;
A dictionary usage frequency monitoring step of calculating a usage frequency of the dictionary data referred to when compressing the data to be compressed in the compression step;
A dictionary update timing control step for instructing updating of the dictionary when the calculation result of the use frequency by the dictionary use frequency monitoring step falls below a set threshold;
A data expansion step of expanding the compressed data obtained in the compression step into data before compression;
A dictionary construction step of constructing, in accordance with the dictionary update instruction in the dictionary update timing control step, new dictionary data from the pre-compression data obtained in the data expansion step, and replacing the dictionary data stored in the dictionary data storage means with the new dictionary data;
A computer program storage medium storing a data compression program that causes the computer to execute the above steps.
(Appendix 28)
The computer program storage medium storing the data compression program according to appendix 27, wherein the dictionary use frequency monitoring step receives as input the dictionary data used when the data to be compressed is compressed in the compression step, the number of times each dictionary data is used, and the total amount of the data to be compressed, and divides, for each dictionary data, the number of uses by the total amount of the data to be compressed to obtain the calculation result of the use frequency.
(Appendix 29)
The computer program storage medium storing the data compression program according to appendix 27 or 28, wherein the dictionary update timing control step instructs the dictionary update when the total of the use frequency calculation results for all the dictionary data obtained in the dictionary use frequency monitoring step is smaller than the threshold.
(Appendix 30)
For a computer that performs data compression:
A compression step of compressing sequentially input data to be compressed while referring to dictionary data stored in the dictionary data storage means, and outputting compressed data;
A dictionary usage frequency monitoring step of calculating a usage frequency of the dictionary data referred to when compressing the data to be compressed in the compression step;
A dictionary update timing control step for instructing to update the dictionary when the calculation result of the use frequency by the dictionary use frequency monitoring step falls below a threshold;
A data expansion step of expanding the compressed data obtained in the compression step into data before compression;
A resource monitoring step that outputs the free capacity of the resource to be used;
A parameter adjustment step for variably setting the threshold value used in the dictionary update timing control step according to the free capacity of the resource output in the resource monitoring step;
A dictionary construction step of constructing, in accordance with the dictionary update instruction in the dictionary update timing control step, new dictionary data from the pre-compression data obtained in the data expansion step, and replacing the dictionary data stored in the dictionary data storage means with the constructed new dictionary data;
A computer program storage medium storing a data compression program that causes the computer to execute the above steps.
(Appendix 31)
The computer program storage medium storing the data compression program according to appendix 30, wherein the parameter adjustment step sets the threshold used in the dictionary update timing control step to a smaller value as the free capacity of the resource output in the resource monitoring step becomes larger, and to a larger value as the free capacity becomes smaller.
(Appendix 32)
For a computer that performs data compression:
A compression step of compressing sequentially input data to be compressed while referring to dictionary data stored in the dictionary data storage means, and outputting compressed data;
A dictionary usage frequency monitoring step of calculating a usage frequency of the dictionary data referred to when compressing the data to be compressed in the compression step;
A dictionary construction cost estimation step of estimating, when the calculation result of the use frequency in the dictionary use frequency monitoring step falls below a threshold, a dictionary construction cost according to the data amount of the compressed data and the data amount of the dictionary data stored in the dictionary data storage means;
A compression rate estimation step of estimating, when the calculation result of the use frequency in the dictionary use frequency monitoring step falls below the threshold, the compression rate that would result if the dictionary data were updated;
A dictionary update timing control step of comparing the dictionary construction cost with the compression rate and determining whether or not to construct a new dictionary according to the comparison result;
A data expansion step of expanding the compressed data output from the compression step into data before compression;
A dictionary construction step of constructing, when the dictionary update timing control step decides to construct the new dictionary, new dictionary data from the pre-compression data obtained in the data expansion step, and replacing the dictionary data stored in the dictionary data storage means with the constructed new dictionary data;
A computer program storage medium storing a data compression program that causes the computer to execute the above steps.
(Appendix 33)
The computer program storage medium storing the data compression program according to appendix 32, wherein the compression rate estimation step estimates, as the compression rate, the ratio between pre-compression data and the compressed data obtained by compressing it with a new dictionary, the pre-compression data being obtained by decompressing a part of the compressed data using the dictionary data stored in the dictionary data storage means, and the new dictionary being constructed from that pre-compression data.
(Appendix 34)
The computer program storage medium storing the data compression program according to appendix 32 or 33, wherein the dictionary update timing control step compares a first value obtained by scoring the dictionary construction cost with a second value obtained by scoring the compression rate, and decides to construct the new dictionary only when the second value is larger than the first value.
(Appendix 35)
The computer program storage medium storing the data compression program according to any one of appendices 27 to 34, wherein the data expansion step includes:
A compressed data storage step of storing the compressed data output in the compression step;
A compressed data decompression step of decompressing, in accordance with the dictionary update instruction in the dictionary update timing control step, the compressed data stored in the compressed data storage step with reference to the dictionary data stored in the dictionary data storage means, to obtain decompressed data that is the data before compression; and
A decompressed data storage step of storing the decompressed data obtained in the compressed data decompression step,
and wherein the dictionary construction step starts when the compressed data decompression step has finished decompressing all the compressed data stored in the compressed data storage step, and constructs the new dictionary data based on the decompressed data stored in the decompressed data storage step.
(Appendix 36)
For a computer that performs data compression:
A compression step of compressing sequentially input data to be compressed while referring to dictionary data stored in the dictionary data storage means, and outputting compressed data;
A dictionary usage frequency monitoring step of calculating, for each dictionary data, the usage frequency of the dictionary data referred to when compressing the data to be compressed in the compression step;
A dictionary update timing control step for extracting and outputting dictionary data corresponding to a use frequency that is lower than a set threshold among the use frequencies for each dictionary data calculated by the dictionary use frequency monitoring step;
A partial data expansion step of expanding a portion of the compressed data obtained by the compression into data before compression using the dictionary data obtained by the dictionary update timing control step;
A dictionary data deletion step of deleting the dictionary data obtained by the dictionary update timing control step from a plurality of dictionary data stored in the dictionary data storage means;
A dictionary construction step of constructing new dictionary data from a part of the uncompressed data output from the data partial expansion step, and adding the new dictionary data to the dictionary data storage means;
A computer program storage medium storing a data compression program that causes the computer to execute the above steps.
(Appendix 37)
The computer program storage medium storing the data compression program according to appendix 36, wherein the partial data expansion step includes:
A compressed data storage step of storing the compressed data obtained in the compression step;
A compressed data partial decompression step of expanding, using the dictionary data obtained in the dictionary update timing control step, a part of the compressed data stored in the compressed data storage step to obtain a part of the data before compression; and
A decompressed data storage step of storing the part of the data before compression obtained in the compressed data partial decompression step,
and wherein the dictionary construction step starts when the compressed data partial decompression step has finished expanding the part of the compressed data stored in the compressed data storage step, and constructs the new dictionary data based on the part of the data before compression stored in the decompressed data storage step.
(Appendix 38)
For a computer that performs data compression:
A compression step of compressing sequentially inputted data to be compressed in accordance with a designated compression algorithm and outputting compressed data;
A compression cost estimation step of compressing the pre-compression data of the compression step with each of a plurality of preset compression algorithms, and estimating, for each of the compression algorithms, a compression cost that is the ratio between the amount of data before compression and the time taken for compression;
A compression rate estimation step of calculating a compression rate when the data before compression by the compression step is compressed by each of the plurality of compression algorithms set in advance;
A compression algorithm switching step of selecting, when the use frequency of the dictionary data referred to when compressing the data to be compressed in the compression step falls below a threshold, based on the compression cost obtained in the compression cost estimation step and the compression rate obtained in the compression rate estimation step, a compression algorithm whose compression rate and compression cost are higher than those of the currently used compression algorithm, from among the plurality of compression algorithms, as the compression algorithm used in the compression step;
A computer program storage medium storing a data compression program that causes the computer to execute the above steps.
(Appendix 39)
The computer program storage medium storing the data compression program according to appendix 38, wherein the compression algorithm switching step calculates a data flow rate of the data to be compressed in the compression step and, when the data flow rate is equal to or lower than a first threshold or equal to or higher than a second threshold greater than the first threshold, selects as the compression algorithm to be used in the compression step, from among the plurality of compression algorithms and based on the compression cost obtained in the compression cost estimation step and the compression rate obtained in the compression rate estimation step, a compression algorithm having at least a higher compression rate than the currently used compression algorithm when the data flow rate is equal to or lower than the first threshold, and a compression algorithm having at least a higher compression cost than the currently used compression algorithm when the data flow rate is equal to or higher than the second threshold.

The present invention is effective for an apparatus (system) that compresses sequentially generated data, such as log data output by a server or data output by a sensor, using a dictionary-based compression method.

100, 200, 300, 400, 500 Data compression device
102 Compression unit
105 Monitoring unit
106, 203, 303, 401 Timing control unit
107 Expansion unit
109 Dictionary generation unit
201 Resource monitoring unit
202 Adjustment unit
301 Cost estimation unit
302, 503 Compression rate estimation unit
402 Partial expansion unit
403 Deletion unit
501 Switching unit
502 Compression cost estimation unit
504 Update unit

Claims

A data compression apparatus comprising:
compression means for compressing data to be compressed based on dictionary data given in advance;
monitoring means for calculating the use frequency of the dictionary data when the compression means compresses the data;
timing control means for issuing an instruction to update the dictionary data when the calculated use frequency falls below a preset threshold; and
dictionary generation means for newly generating, upon receiving the instruction, the dictionary data based on the data compressed by the compression means.
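The interplay of the four means in this claim (compress, monitor, decide, regenerate) can be sketched with a toy token-substitution compressor. Everything here is an illustrative assumption rather than the claimed implementation: the class name, the token-level dictionary, the per-batch frequency, and the choice to rebuild from the most frequent tokens.

```python
from collections import Counter


class AdaptiveDictionaryCompressor:
    """Toy model: tokens found in the dictionary are replaced by their index."""

    def __init__(self, dictionary, threshold, max_entries=4):
        self.dictionary = list(dictionary)
        self.threshold = threshold      # preset use-frequency threshold
        self.max_entries = max_entries  # hypothetical dictionary size limit
        self.history = []               # compressed records kept for regeneration

    def expand(self, record):
        """Restore a compressed record using the current dictionary."""
        return [self.dictionary[x] if isinstance(x, int) else x for x in record]

    def process(self, tokens):
        """Compress one batch, monitor dictionary use, and regenerate if needed."""
        if not tokens:
            return []
        index = {tok: i for i, tok in enumerate(self.dictionary)}
        record = [index.get(tok, tok) for tok in tokens]     # compression
        hits = sum(1 for tok in tokens if tok in index)
        self.history.append(record)
        if hits / len(tokens) < self.threshold:              # monitoring and timing control
            self.rebuild()
        return record

    def rebuild(self):
        """Generate a new dictionary from the expanded history (dictionary generation)."""
        counts = Counter(tok for rec in self.history for tok in self.expand(rec))
        self.dictionary = [tok for tok, _ in counts.most_common(self.max_entries)]
        # Old records were encoded with the old dictionary; the sketch simply drops them.
        self.history.clear()


compressor = AdaptiveDictionaryCompressor(["GET", "POST"], threshold=0.5)
first = compressor.process(["GET", "/a", "POST", "GET"])   # 3 of 4 tokens hit: no rebuild
second = compressor.process(["PUT", "/b", "PUT", "/b"])    # no hits: dictionary is rebuilt
third = compressor.process(["PUT", "/b"])                  # now served by the new dictionary
```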

The data compression apparatus according to claim 1, further comprising expansion means for expanding the data compressed by the compression means using the dictionary data,
wherein the dictionary generation means newly generates the dictionary data using the data expanded by the expansion means.

The data compression apparatus according to claim 1 or 2, wherein the monitoring means calculates the use frequency by dividing the number of times the dictionary data is used by the total amount of the data before compression.

The data compression apparatus according to claim 1, further comprising:
resource monitoring means for detecting the free capacity of a resource that stores the data; and
adjustment means for variably setting the threshold used by the timing control means in accordance with the free capacity of the resource.

The data compression apparatus according to any one of claims 1 to 4, further comprising estimation means for estimating, for the case where the dictionary data is newly generated, a cost required for newly generating the dictionary data and a compression rate of the data compressed by the compression means based on the new dictionary data,
wherein the timing control means determines whether to issue the instruction to update the dictionary data by using, in addition to the threshold, the cost and the compression rate estimated by the estimation means.

The data compression apparatus according to any one of claims 1 to 4, further comprising:
estimation means for estimating, for each of a plurality of preset compression algorithms, a compression cost that is the ratio between the amount of data before compression and the time required for compression, and a compression rate of the compressed data; and
switching means for selecting, from among the plurality of compression algorithms, the compression algorithm used for compression based on the compression cost and the compression rate.

The data compression apparatus according to claim 1, further comprising deletion means for deleting, from storage means that stores a plurality of the dictionary data, the dictionary data whose use frequency is lower than a predetermined value,
wherein the dictionary generation means newly generates the dictionary data using a part of the data compressed by the compression means.

A data compression method comprising:
compressing data to be compressed based on dictionary data given in advance;
calculating the use frequency of the dictionary data when compressing the data;
issuing an instruction to update the dictionary data when the calculated use frequency falls below a preset threshold; and
newly generating, upon receiving the instruction, the dictionary data based on the compressed data.

A program storage medium storing a computer program that causes a data compression apparatus to execute:
a process of compressing data to be compressed based on dictionary data given in advance;
a process of calculating the use frequency of the dictionary data when compressing the data;
a process of issuing an instruction to update the dictionary data when the calculated use frequency falls below a preset threshold; and
a process of newly generating, in response to the instruction, the dictionary data based on the compressed data.