JP5413153B2

JP5413153B2 - Data compression apparatus, data expansion apparatus, data compression program, and data expansion program

Info

Publication number: JP5413153B2
Application number: JP2009268693A
Authority: JP
Inventors: 達哉浅井; 真一郎多湖; 宏弥稲越; 青史岡本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-11-26
Filing date: 2009-11-26
Publication date: 2014-02-12
Anticipated expiration: 2029-11-26
Also published as: JP2011114546A

Description

The present invention relates to a data compression device, a data decompression device, a data compression program, and a data decompression program.

Dictionary-based coding is a known method for compressing text data (plain text data, CSV (Comma Separated Values) data, XML (eXtensible Markup Language) data, and the like). In dictionary-based coding, a dictionary is created by assigning a predetermined number (code) to each character or character string that appears in the data string to be compressed, and the actual input characters are encoded based on this dictionary. Dictionary-based coding includes static dictionary coding (BPE (Byte Pair Encoding), STVF (Suffix-Tree based VF coding), and the like), which requires the dictionary to be stored, and dynamic dictionary coding (LZ77, LZ78, and the like), which does not. For data that contains many duplicated value strings, such as data output from a database, it is known that applying static dictionary coding and then further applying dynamic dictionary coding yields a better compression ratio than applying dynamic dictionary coding alone. Static dictionary coding is also known to be well suited to pattern searching without decompressing the compressed data.

Block stream processing is also known, in which blocks obtained by dividing the input data into appropriately sized pieces are processed sequentially (stream processing). Many static dictionary coding methods encode in two passes, dictionary creation and encoding, and therefore cannot process data sequentially. By applying block stream processing, however, sequential processing becomes possible using only a limited amount of working memory. Block stream processing also has advantages over stream processing in finer units (character stream processing or word stream processing), such as allowing filtering with a lighter-weight index.

Accordingly, compression and decompression algorithms that improve the compression ratio in block stream processing have been proposed (for example, Patent Document 1). A technique for compressing data using a dictionary in word stream processing has also been proposed (for example, Patent Document 2).

Patent Document 1: JP-A-9-214353
Patent Document 2: JP-A-11-168390

However, when dictionary-based coding of input data is performed using block stream processing, the techniques of Patent Documents 1 and 2 have the following problems. The technique of Patent Document 1 does not consider reducing the dictionary size of each block, so a sufficient compression ratio may not be obtained. The technique of Patent Document 2 performs word stream processing and therefore loses the advantages specific to block stream processing. Although the technique of Patent Document 2 can be extended naturally to block stream processing, the dictionary of each block then grows, so again a sufficient compression ratio may not be obtained.

The present disclosure has been made in view of the above circumstances, and its object is to provide a data compression device, a data decompression device, a data compression program, and a data decompression program that improve the compression ratio of text data in block stream processing.

To solve the above problem, the data compression device disclosed in this specification includes: an input unit that accepts input of text data; a division unit that divides the text data into a plurality of blocks based on a predetermined rule; a difference dictionary generation unit that, based on a reference dictionary, which is dictionary data in which character strings and codes are stored in association with each other, generates a difference dictionary, which is dictionary data that associates each character string appearing in the processing target block but not registered in the reference dictionary with a code that the reference dictionary associates with a character string not appearing in the processing target block; a processing target dictionary generation unit that generates a processing target dictionary, which is dictionary data, based on the generated difference dictionary and the reference dictionary; a compression unit that compresses the processing target block by referring to the generated processing target dictionary and replacing each character string appearing in the processing target block with the corresponding code; and an output unit that outputs the data of the processing target block compressed by the compression unit together with the generated difference dictionary.

Also to solve the above problem, the data decompression device disclosed in this specification includes: a division unit that divides the data to be decompressed into a plurality of blocks based on a predetermined rule; an acquisition unit that acquires, from the processing target block among the plurality of blocks, a difference dictionary, which is dictionary data in which character strings and codes are stored in association with each other; a processing target dictionary generation unit that generates a processing target dictionary, which is dictionary data, based on a reference dictionary, which is dictionary data, and the difference dictionary acquired by the acquisition unit; and a decoding unit that decodes the processing target block by replacing each code appearing in the processing target block with the corresponding character string based on the generated processing target dictionary. The difference dictionary is dictionary data that associates each character string appearing in the processing target block before compression but not registered in the reference dictionary with a code that the reference dictionary associates with a character string not appearing in the processing target block before compression. The processing target block has been compressed by referring to the processing target dictionary and replacing each character string appearing in the processing target block before compression with the corresponding code.

Also to solve the above problem, the data compression program disclosed in this specification causes a computer to execute: an input step of accepting input of text data; a division step of dividing the text data into a plurality of blocks based on a predetermined rule; a difference dictionary generation step of generating, based on a reference dictionary, which is dictionary data in which character strings and codes are stored in association with each other, a difference dictionary, which is dictionary data that associates each character string appearing in the processing target block but not registered in the reference dictionary with a code that the reference dictionary associates with a character string not appearing in the processing target block; a processing target dictionary generation step of generating a processing target dictionary, which is dictionary data, based on the generated difference dictionary and the reference dictionary; a compression step of compressing the processing target block by referring to the generated processing target dictionary and replacing each character string appearing in the processing target block with the corresponding code; and an output step of outputting the compressed data of the processing target block together with the generated difference dictionary.

To solve the above problem, the data decompression program disclosed in this specification causes a computer to execute: a division step of dividing the compressed data to be decompressed into a plurality of blocks based on a predetermined rule; an acquisition step of acquiring, from the processing target block among the plurality of blocks, a difference dictionary, which is dictionary data in which character strings and codes are stored in association with each other; a processing target dictionary generation step of generating a processing target dictionary, which is dictionary data, based on a reference dictionary, which is dictionary data, and the difference dictionary acquired in the acquisition step; and a decoding step of decoding the processing target block by replacing each code appearing in the processing target block with the corresponding character string based on the generated processing target dictionary. The difference dictionary is dictionary data that associates each character string appearing in the processing target block before compression but not registered in the reference dictionary with a code that the reference dictionary associates with a character string not appearing in the processing target block before compression. The processing target block has been compressed by referring to the processing target dictionary and replacing each character string appearing in the processing target block before compression with the corresponding code.
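The decoding side described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `decompress`, the representation of each block as a `(difference dictionary, code list)` pair, and the rule that the dictionary of one block becomes the reference dictionary of the next are all assumptions made for illustration.

```python
def decompress(blocks):
    """Decode a sequence of (difference_dictionary, codes) pairs.

    Each difference dictionary maps code -> character string.
    """
    reference = {}  # assumption: the first block has an empty reference dictionary
    records = []
    for difference, codes in blocks:
        # Rebuild the processing target dictionary: entries of the difference
        # dictionary override or extend the reference dictionary.
        target = {**reference, **difference}
        # Replace every code with its character string.
        records.extend(target[code] for code in codes)
        # Assumption: this block's dictionary is the next block's reference.
        reference = target
    return records


# Hypothetical compressed input; "NewYork" reuses code 3 of "LasVegas".
blocks = [
    ({1: "Chicago", 2: "Boston", 3: "LasVegas"}, [1, 2, 3]),
    ({3: "NewYork"}, [2, 3, 1]),
]
print(decompress(blocks))
```

Only the second block's one-entry difference dictionary has to be stored for that block, which is where the size saving comes from.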

According to the data compression device and the data compression program disclosed in this specification, the compression ratio of text data can be improved in block stream processing.

According to the data decompression device and the data decompression program disclosed in this specification, compressed data produced with an improved text-data compression ratio can be decompressed in block stream processing.

A diagram showing an example of a system configuration including the data compression device according to the embodiment.
A diagram showing an example of the hardware configuration of the data compression device.
A functional block diagram showing an example of the means that implement the functions of the data compression device.
A diagram outlining dictionary-based coding.
A flowchart showing an example of the data compression process executed by the data compression device.
A flowchart showing an example of the dictionary update process.
A diagram showing an example of data to be compressed.
A diagram illustrating an example of the process of creating compressed data.
A flowchart showing an example of the difference dictionary creation process.
A diagram illustrating an example of the process of creating compressed data.
A diagram illustrating an example of the process of creating compressed data.
A diagram illustrating an example of the process of creating compressed data.
A diagram illustrating an example of the process of creating compressed data.
A diagram illustrating an example of the process of creating compressed data.
A diagram illustrating an example of the process of creating compressed data.
A diagram illustrating an example of the process of creating compressed data.
A diagram showing an example of compressed data.
A diagram illustrating an example of the compressed data created by Comparative Example 1, Comparative Example 2, and the embodiment.
A diagram illustrating an example of the compressed data created by Comparative Example 1, Comparative Example 2, and the embodiment.
A schematic diagram showing an example of a size comparison of the compressed data produced by Comparative Example 1, Comparative Example 2, and the embodiment.
A diagram showing an example of a system configuration including the data decompression device according to the embodiment.
A functional block diagram showing an example of the means that implement the functions of the data decompression device.
A flowchart showing an example of the data decompression process.
A diagram illustrating an example of the process of decompressing compressed data.
A flowchart showing an example of the dictionary restoration process.
A diagram illustrating an example of the process of decompressing compressed data.
A diagram illustrating an example of the process of decompressing compressed data.
A diagram showing an example of a system configuration including a data compression/decompression device.

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings.

[Data compression device]

The data compression device of the present disclosure will now be described. FIG. 1 shows an example of a system configuration including the data compression device. As shown in FIG. 1, the data compression device 100 is connected to a storage device 10, a sensor device 20, and a data processing device 30 via a network 40.

The storage device 10 is composed of a hard disk drive or the like, and stores the data that is the target of compression (hereinafter referred to as the data to be compressed) and the data after compression.

The sensor device 20 is, for example, an entry/exit management device installed at a company's employee entrance. The sensor device 20 transmits the data acquired by its sensor to the data compression device 100. For example, the sensor device 20 reads the information on an ID card carried by an employee and transmits the employee number, name, and work location to the data compression device 100 as data to be compressed. The sensor device 20 may also be a sensor network consisting of sensors and an information processing device that processes the data acquired by the sensors.

The data processing device 30 is composed of a personal computer or the like, and performs predetermined data processing, such as computation, on input data. The data processing device 30 transmits the processed data to the data compression device 100 as data to be compressed. The data processing device 30 also receives, from the data compression device 100, the data that the data compression device 100 has compressed.

The network 40 is configured as, for example, a LAN (Local Area Network) or a WAN (Wide Area Network). The network 40 carries the data transmitted by the sensor device 20, the data processing device 30, and the data compression device 100 to its destination.

The data compression device 100 receives data to be compressed from the storage device 10, the sensor device 20, and the data processing device 30. The data to be compressed is structured text data in a format such as CSV or XML. The data compression device 100 divides the data to be compressed into a plurality of blocks and executes compression processing on each block to generate compressed data. The data compression device 100 stores the compressed data in the storage device 10 or transmits it to the data processing device 30.

Next, the hardware configuration of the data compression device 100 will be described with reference to FIG. 2. As shown in FIG. 2, the data compression device 100 includes an input/output unit 101, a ROM (Read Only Memory) 102, a central processing unit (CPU) 103, and a RAM (Random Access Memory) 104.

The input/output unit 101 receives data to be compressed from the storage device 10, the sensor device 20, and the data processing device 30. The input/output unit 101 also outputs compressed data to the storage device 10 and the data processing device 30. The ROM 102 stores, among other things, the program for compressing the data to be compressed. The CPU 103 reads and executes the program stored in the ROM 102. The RAM 104 holds temporary data used while the program runs. The functions of the data compression device 100 shown in FIG. 3 are realized by the CPU 103 executing the program stored in the ROM 102.

Next, an example of the functions of the data compression device 100 will be described with reference to FIG. 3. FIG. 3 is a functional block diagram showing an example of the means that implement the functions of the data compression device 100.

The data compression device 100 includes a data acquisition unit 110, a dictionary creation unit 111, a difference dictionary creation unit 112, an encoding unit 113, and an output unit 114.

The data acquisition unit 110 functions both as an input unit that accepts the data to be compressed (text data) transmitted by the storage device 10, the sensor device 20, the data processing device 30, and the like, and as a division unit that divides the data to be compressed into a plurality of blocks based on a predetermined rule. The data acquisition unit 110 outputs the divided data to the dictionary creation unit 111.

The dictionary creation unit 111 (processing target dictionary generation unit) creates, from the data input from the data acquisition unit 110, the dictionary needed to encode the data of the block that is the target of compression (hereinafter, the processing target block) among the plurality of blocks obtained by dividing the data to be compressed based on the predetermined rule. The dictionary has a code field and a value field. The dictionary creation unit 111 registers each character string that appears in the data of the processing target block in the value field, and registers the predetermined code associated with that character string in the code field. For example, if the input data is the data shown in FIG. 4 (1) and the item to be encoded is "district", the dictionary creation unit 111 creates the dictionary shown in FIG. 4 (2), which associates each character string appearing in the district column with a predetermined code.

The encoding unit 113 (compression unit) encodes the data of each block using the dictionary created by the dictionary creation unit 111, and outputs the result to the output unit 114. For example, the encoding unit 113 encodes the data to be compressed by using the dictionary shown in FIG. 4 (2) to replace the character strings in the "district" column of FIG. 4 (1) with codes. The encoded data is shown in FIG. 4 (3).
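The dictionary creation and encoding just described can be sketched in a few lines of Python. This is an illustrative sketch only: the column values other than "LasVegas" are hypothetical, and numbering the codes consecutively from 1 in order of first appearance is an assumption, since the text says only that a "predetermined code" is assigned.

```python
def build_dictionary(values):
    """Map each distinct string to a code, in order of first appearance."""
    dictionary = {}
    for value in values:
        if value not in dictionary:
            # Assumption: codes are consecutive integers starting at 1.
            dictionary[value] = len(dictionary) + 1
    return dictionary


def encode(values, dictionary):
    """Replace every string with its code."""
    return [dictionary[value] for value in values]


# Hypothetical "district" column of one block.
district = ["Chicago", "Boston", "Chicago", "LasVegas", "Boston"]
dictionary = build_dictionary(district)
codes = encode(district, dictionary)
print(dictionary)  # {'Chicago': 1, 'Boston': 2, 'LasVegas': 3}
print(codes)       # [1, 2, 1, 3, 2]
```

The more often a value repeats within a block, the more the short codes save relative to the cost of storing the dictionary itself.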

The difference dictionary creation unit 112 (difference dictionary generation unit) creates a difference dictionary and outputs it to the output unit 114. The difference dictionary creation unit 112 creates the difference dictionary based on the dictionary used to encode a reference block, which is a block different from the processing target block, and the character strings to be encoded in the processing target block. Like the dictionary used for encoding, the difference dictionary has a code field and a value field. Among the character strings appearing in the processing target block, the difference dictionary creation unit 112 registers those that are not registered in the dictionary used to encode the reference block in the value field of the difference dictionary. To each character string registered in the difference dictionary, it assigns a code that the reference block's dictionary associates with a character string that is registered in that dictionary but does not appear in the processing target block. For example, suppose the reference block's dictionary is as shown in FIG. 4 (4) and the character strings to be encoded in the processing target block are as shown in FIG. 4 (5). In this case, the difference dictionary creation unit 112 registers in the difference dictionary the character string "NewYork", which appears in the processing target block but is not registered in the reference block's dictionary. It then assigns to "NewYork" the code "3", which the reference block's dictionary assigns to the character string "LasVegas", a string that is registered in that dictionary but does not appear in the processing target block. The resulting difference dictionary is shown in FIG. 4 (6).
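The "NewYork"/"LasVegas" example can be reproduced with the following Python sketch. It is a hedged illustration, not the procedure of the patent's flowchart: the function name `make_difference_dictionary`, the choice to reuse the lowest freed code first, the fallback to fresh codes when no freed code remains, and the entries "Chicago" and "Boston" are all assumptions.

```python
def make_difference_dictionary(reference, block_values):
    """Build a difference dictionary (code -> string).

    `reference` is the reference block's dictionary (code -> string).
    """
    present = set(block_values)
    registered = set(reference.values())
    # Codes whose strings do not appear in the target block can be reused.
    free_codes = sorted(c for c, s in reference.items() if s not in present)
    # Assumption: when no code can be reused, continue past the largest code.
    next_code = max(reference, default=0) + 1
    difference = {}
    for string in dict.fromkeys(block_values):  # distinct, in first-seen order
        if string in registered:
            continue  # keeps its code from the reference dictionary
        if free_codes:
            code = free_codes.pop(0)
        else:
            code = next_code
            next_code += 1
        difference[code] = string
    return difference


reference = {1: "Chicago", 2: "Boston", 3: "LasVegas"}
block = ["Boston", "NewYork", "Chicago", "NewYork"]
print(make_difference_dictionary(reference, block))  # {3: 'NewYork'}
```

Because "Chicago" and "Boston" keep their existing codes, the difference dictionary carries a single entry instead of three.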

The output unit 114 outputs the difference dictionary and the encoded data, block by block, to the storage device 10 or the data processing device 30.

Next, an example of the processing executed by the data compression device 100 will be described with reference to FIGS. 5 and 6. FIG. 5 is a flowchart showing an example of the compression process executed by the data compression device 100. In this embodiment, of two consecutive blocks, the preceding block is called the reference block and the following block is called the processing target block.

The dictionary creation unit 111 initializes the reference block's dictionary Dic0 as an empty table (step S10). This is because no reference block exists when the first block of the data to be compressed is the processing target block.

Next, the dictionary creation unit 111 initializes the dictionary Dic1 used to encode the processing target block (the processing target dictionary) and the variable M that counts the number of records read (step S12). After initialization, Dic1 is an empty table and M is 0.

The data acquisition unit 110 determines whether the data to be compressed contains a record still to be processed (step S14). If such a record exists (step S14/YES), the data acquisition unit 110 acquires the record and adds 1 to M (step S16).

Next, the dictionary creation unit 111 executes the dictionary update process in order to create the dictionary of the processing target block (step S18). The dictionary update process of step S18 is described below with reference to FIG. 6.

FIG. 6 is a flowchart showing an example of the dictionary update process that creates the dictionary of the processing target block.

The dictionary creation unit 111 determines whether the character string contained in the acquired record already exists as an entry in the dictionary Dic1 (step S50). If it does not (step S50/NO), the dictionary creation unit 111 registers a new entry in Dic1 that associates the character string with a code (step S52) and ends the update process. If the character string already exists as an entry in Dic1, the dictionary creation unit 111 ends the update process without making any change.

Returning to FIG. 5, the description of the compression process continues. The data acquisition unit 110 determines whether the value of M is smaller than a predetermined value MB (step S20). Here, MB specifies the number of records contained in one block. The value of MB may be held in the data to be compressed or may be determined in advance by the user. It may be the same for all blocks or may differ from block to block.

If the value of M is smaller than MB, not all records of the processing target block have been read yet. In that case (step S20/YES), the data acquisition unit 110 returns to step S14 and continues processing.

If the value of M equals MB, all records of the processing target block have been read. When the data acquisition unit 110 determines that M equals MB (step S20/NO), the difference dictionary creation unit 112 executes the difference dictionary creation process (step S22). The details of the difference dictionary creation process are described later.
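The record-reading loop of steps S12 to S20, including the dictionary update of steps S50 and S52, can be sketched as a Python generator. This is a sketch under stated assumptions: each record is treated as a single character string, the provisional code numbering starts at 1, MB is the same for every block, and a trailing block with fewer than MB records is still emitted, none of which is specified in the text above. The name `read_blocks` is hypothetical.

```python
def read_blocks(records, mb):
    """Yield (block, dic1) pairs, where each block holds up to `mb` records."""
    block, dic1 = [], {}                  # S12: Dic1 is empty, M = len(block) = 0
    for record in records:                # S14 / S16: fetch the next record
        block.append(record)
        if record not in dic1:            # S50 / NO: string not yet registered
            dic1[record] = len(dic1) + 1  # S52: provisional code (assumption)
        if len(block) == mb:              # S20 / NO: M has reached MB
            yield block, dic1
            block, dic1 = [], {}          # back to S12 for the next block
    if block:
        # Assumption: a final partial block is processed like a full one.
        yield block, dic1


for block, dic1 in read_blocks(["a", "b", "a", "c", "c"], 2):
    print(block, dic1)
```

Because only one block and its dictionary are held at a time, memory use is bounded by the block size rather than by the size of the whole input.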

The output unit 114 outputs the difference dictionary Δ created by the difference dictionary creation unit 112 in the difference dictionary creation process (step S24). Next, the dictionary creation unit 111 merges the reference block's dictionary Dic0 with the difference dictionary Δ to build the dictionary used for encoding, and makes the result the new dictionary Dic1 (step S26). Specifically, when a code appears in both Dic0 and Δ, the character string associated with that code in Dic0 is replaced with the character string from Δ. When a code registered in Δ is not registered in Dic0, the corresponding entry of Δ is added to Dic0. As a result, in the dictionary Dic1 used by the encoding unit 113, each character string that appears in the processing target block and is defined in the dictionary used for the reference block keeps the same code as in the reference block's dictionary. Each character string that is not registered in the reference block's dictionary is assigned a code that the reference block's dictionary had associated with a character string that does not appear in the processing target block.
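The merge of step S26 amounts to overwriting or adding entries by code. The sketch below assumes a code-to-string `dict` and uses `None` to stand for a NULL value; the name `merge` and these representations are illustrative, and the sample values are taken from the worked example for blocks B2 and B3 later in this description.

```python
from typing import Dict, Optional

Dictionary = Dict[int, Optional[str]]  # code -> character string (None = NULL)


def merge(dic0: Dictionary, delta: Dictionary) -> Dictionary:
    """Step S26: build the encoding dictionary Dic1 from Dic0 and the difference dictionary."""
    dic1 = dict(dic0)   # start from a copy of the reference block's dictionary
    dic1.update(delta)  # overlapping codes are overwritten, unknown codes are added
    return dic1


# Block B2: code 3 is not in Dic0, so the entry is simply added.
b2 = merge({1: "San Francisco", 2: "Washington D.C."}, {3: "New York"})
# Block B3: codes 2 and 3 overlap, so their strings are overwritten.
b3 = merge(b2, {2: "Chicago", 3: None})
print(b2)
print(b3)
```

Copying `dic0` first keeps the reference dictionary unchanged, which matters because the same object is still needed until step S32 replaces it.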

The encoding unit 113 encodes the character strings appearing in the processing target block using the dictionary Dic1 updated in step S26 (step S28). The output unit 114 outputs the data encoded by the encoding unit 113 (step S30). Steps S24 and S30 together produce the compressed data of the processing target block.

The dictionary creation unit 111 initializes the dictionary Dic0 in preparation for compressing the next block (step S32) and returns to step S12. In this flowchart, of two consecutive blocks, the earlier one is the reference block and the later one is the processing target block. The block just processed therefore becomes the reference block for the next block, so Dic0 is initialized with Dic1.

When no record remains to be processed (step S14/NO), the data acquisition unit 110 determines whether the value of M is 0 (step S34).

M being 0 means that no further block exists. Accordingly, when M is 0 (step S34/YES), the data acquisition unit 110 ends the data compression process.

M being nonzero means that all data in the last processing target block has been read. Accordingly, when the data acquisition unit 110 determines that M is not 0 (step S34/NO), the difference dictionary creation unit 112 executes step S22, the output unit 114 executes steps S24 and S30, the encoding unit 113 executes step S28, and the dictionary creation unit 111 executes step S26. Steps S22 to S30 are the same as described above, so their description is omitted. Through the above processing, the data to be compressed is compressed block by block, with each block consisting of a difference dictionary and encoded data.
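The record-counting logic of FIG. 5 (steps S14, S16, S20, and S34) can be sketched as a generator that cuts the record stream into blocks. The name `read_blocks` is hypothetical, a fixed MB for all blocks is assumed, and the length of the current block plays the role of the counter M.

```python
from typing import Iterable, Iterator, List


def read_blocks(records: Iterable[str], mb: int) -> Iterator[List[str]]:
    """Yield blocks of `mb` records; a shorter final block is yielded as well."""
    block: List[str] = []       # len(block) corresponds to M
    for record in records:      # steps S14 / S16: a record exists and is acquired
        block.append(record)
        if len(block) == mb:    # step S20 / NO: the block is complete
            yield block
            block = []          # M starts again from 0 for the next block
    if block:                   # step S34 / NO: a partial last block remains
        yield block


blocks = list(read_blocks(["a", "b", "c", "d", "e", "f", "g"], mb=3))
print(blocks)
```

When the input length is a multiple of MB, the final `if` finds an empty block, which corresponds to the M = 0 case of step S34 where processing simply ends.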

Next, data compression by the above process is described using concrete data with reference to FIGS. 7 to 17, together with the details of the difference dictionary creation process. FIG. 7 shows an example of the data to be compressed used in this description. In this embodiment, compressed data is created by encoding the character strings entered in the district field. The data is compressed by block stream processing with three records per block (that is, MB = 3).

The data compression apparatus 100 performs the initialization of steps S10 and S12 in FIG. 5. Since a record to be processed exists (step S14/YES), the data acquisition unit 110 acquires the first record, shown in FIG. 8(1) (step S16).

Next, the dictionary creation unit 111 executes the update process for the dictionary Dic1 (step S18). At this point Dic1 is an empty table, so it contains no entry whose value is San Francisco (step S50/NO). The dictionary creation unit 111 therefore registers a new entry in Dic1 that assigns code "1" to San Francisco (step S52) and ends the process. The updated dictionary Dic1 is shown in FIG. 8(2).

Since M (= 1) < MB (= 3) (step S20/YES) and a next record exists (step S14/YES), the data acquisition unit 110 acquires the second record, shown in FIG. 8(3) (step S16).

In the dictionary update process of FIG. 6, Washington D.C. does not exist in the dictionary Dic1 (step S50/NO), so the dictionary creation unit 111 registers a new entry in Dic1 that assigns code "2" to Washington D.C. (step S52) and ends the process. The updated dictionary Dic1 is shown in FIG. 8(4).

Since M (= 2) < MB (= 3) (step S20/YES) and a next record exists (step S14/YES), the data acquisition unit 110 acquires the third record, shown in FIG. 8(5).

In the dictionary update process of FIG. 6, Washington D.C. already exists in the dictionary Dic1 (step S50/YES), so the dictionary creation unit 111 ends the process without registering a new entry in Dic1.

Since M (= 3) equals MB (= 3) (step S20/NO), the difference dictionary creation unit 112 executes the difference dictionary creation process (step S22).

The difference dictionary creation process is now described using the concrete example. FIG. 9 is a flowchart showing an example of the difference dictionary creation process.

The difference dictionary creation unit 112 initializes the difference dictionary Δ with the reference block's dictionary Dic0 (step S60). For the first block, Dic0 is the empty set, so Δ is also the empty set.

Next, the difference dictionary creation unit 112 obtains the difference set Diff0 between the set of value items of the reference block's dictionary Dic0 and the set of value items of the updated dictionary Dic1 (step S62), that is, the values present in Dic0 but absent from Dic1. In FIG. 9, the set of value items of Dic0 is written Dic0.val and the set of value items of Dic1 is written Dic1.val.

Next, the difference dictionary creation unit 112 obtains the difference set Diff1 between the set of value items of the dictionary Dic1 and the set of value items of the dictionary Dic0 (step S64), that is, the values present in Dic1 but absent from Dic0. It then obtains the intersection NoDiff of the set of value items of Dic1 and the set of value items of Dic0 (step S66).

In the concrete example of FIG. 8, the set of value items Dic0.val is the empty set, and Dic1.val = {San Francisco, Washington D.C.}. No value item exists in Dic0 but not in Dic1, so step S62 yields an empty difference set Diff0. The value items that do not exist in Dic0 but exist in Dic1 are San Francisco and Washington D.C., so step S64 yields Diff1 = {San Francisco, Washington D.C.}. Finally, Dic0 and Dic1 have no value item in common, so step S66 yields an empty intersection NoDiff.
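Steps S62 to S66 are plain set operations on the value items of the two dictionaries. A minimal sketch using Python's built-in sets follows; the function name `value_sets` is illustrative, and the second call uses the block B3 values that appear later in this description.

```python
from typing import Iterable, Set, Tuple


def value_sets(dic0_val: Iterable[str],
               dic1_val: Iterable[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (Diff0, Diff1, NoDiff) for the value sets Dic0.val and Dic1.val."""
    old, new = set(dic0_val), set(dic1_val)
    diff0 = old - new    # step S62: in Dic0 but not in Dic1
    diff1 = new - old    # step S64: in Dic1 but not in Dic0
    nodiff = old & new   # step S66: common to both
    return diff0, diff1, nodiff


# Block B1: the reference dictionary Dic0 is still empty.
b1_sets = value_sets([], ["San Francisco", "Washington D.C."])
# Block B3: Dic0 comes from block B2.
b3_sets = value_sets(["San Francisco", "Washington D.C.", "New York"],
                     ["Chicago", "San Francisco"])
print(b1_sets)
print(b3_sets)
```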

Next, the difference dictionary creation unit 112 determines whether Diff1 is the empty set (step S68). If Diff1 is not empty (step S68/NO), the unit determines whether Diff0 is the empty set (step S70).

If Diff0 is the empty set (step S70/YES), the difference dictionary creation unit 112 registers all elements of Diff1 in the difference dictionary Δ as new entries (step S76). It then removes from Δ every entry whose value matches an element of NoDiff (step S78) and outputs Δ to the dictionary creation unit 111 and the output unit 114 (step S80).

In the concrete example, Diff1 is not empty (step S68/NO) and Diff0 is empty (step S70/YES), so the difference dictionary creation unit 112 executes step S76. The difference dictionary Δ after step S76 is shown in FIG. 10(1). Because NoDiff is the empty set in this example, there is no entry to remove from Δ in step S78. The unit then executes step S80 and outputs the Δ shown in FIG. 10(1) to the dictionary creation unit 111 and the output unit 114. The other steps of the difference dictionary creation process of FIG. 9 are described later.

When the difference dictionary creation process ends, the output unit 114 outputs the difference dictionary Δ as part of the compressed data (step S24). Next, the dictionary creation unit 111 updates the dictionary Dic1 based on Dic0 and Δ (step S26). In this case Dic0 is the empty set, so the entries of Δ are added to Dic0 and the result becomes the dictionary Dic1 used for encoding (FIG. 10(2)).

Using the dictionary created by the dictionary creation unit 111, shown in FIG. 10(2), the encoding unit 113 encodes the first to third records as shown in FIG. 10(3).

Block B1, encoded as described above, becomes the reference block for the next block to be compressed. The dictionary creation unit 111 therefore initializes the dictionary Dic0 with the dictionary Dic1 of block B1 (step S32). The initialized Dic0 is shown in FIG. 11(1).

Next, the data acquisition unit 110 acquires the first record of block B2, shown in FIG. 11(2) (step S16). At the start of a new block, the dictionary Dic1 has been initialized to an empty table in step S12, so it contains no entry with the value San Francisco (step S50/NO). The dictionary update process of FIG. 6 therefore registers a new entry in Dic1 that assigns code "1" to San Francisco, as shown in FIG. 11(3) (step S52).

In the same way, the data acquisition unit 110 acquires the second record of block B2, shown in FIG. 11(4) (step S16). Dic1 contains no entry with the value Washington D.C. (step S50/NO), so a new entry assigning code "2" to Washington D.C. is registered in Dic1, as shown in FIG. 11(5) (step S52).

Next, the data acquisition unit 110 acquires the third record of block B2, shown in FIG. 11(6) (step S16). Dic1 contains no entry with the value New York (step S50/NO), so a new entry assigning code "3" to New York is registered in Dic1, as shown in FIG. 11(7) (step S52).

M is now 3 and the predetermined number of records has been read (step S20/NO), so the difference dictionary creation unit 112 performs the difference dictionary creation process (step S22).

In step S22, the difference dictionary creation unit 112 executes steps S60 to S66 of FIG. 9 described above. As a result, the initialized difference dictionary Δ is as shown in FIG. 12(1), Diff0 is the empty set, Diff1 = {New York}, and NoDiff = {San Francisco, Washington D.C.}.

Since Diff1 is not empty (step S68/NO) and Diff0 is empty (step S70/YES), the elements of Diff1 are registered in the difference dictionary Δ as new entries (step S76). The difference dictionary Δ after step S76 is shown in FIG. 12(2).

Next, the difference dictionary creation unit 112 executes step S78. In the concrete example, the current Δ contains entries whose values are the elements of NoDiff (San Francisco and Washington D.C.). The unit therefore deletes those entries from Δ (step S78) and outputs Δ to the output unit 114 and the dictionary creation unit 111 (step S80). The output Δ is shown in FIG. 12(3).

Next, the dictionary creation unit 111 updates the dictionary Dic1 based on the dictionary Dic0 shown in FIG. 13(1) and the difference dictionary Δ shown in FIG. 13(2) (step S26). Here, code "3" in Δ is not registered in Dic0, so Dic1 is obtained by adding the entry of Δ to Dic0. The resulting Dic1 is shown in FIG. 13(3).

The encoding unit 113 encodes the data of block B2 using the dictionary of FIG. 13(3) (step S28). The encoded data is shown in FIG. 13(4).

Block B2 becomes the reference block for the next block to be compressed. The dictionary creation unit 111 therefore initializes the dictionary Dic0 with the dictionary Dic1 of block B2 (step S32). The initialized Dic0 is shown in FIG. 14(1).

Next, the data acquisition unit 110 acquires the first record of block B3, shown in FIG. 14(2) (step S16). At the start of a new block, the dictionary Dic1 has been initialized to an empty table, so it contains no entry with the value Chicago (step S50/NO). The dictionary update process therefore registers a new entry in Dic1 that assigns code "1" to Chicago, as shown in FIG. 14(3) (step S52).

The data acquisition unit 110 acquires the second record of block B3, shown in FIG. 14(4) (step S16). Dic1 contains no entry with the value San Francisco (step S50/NO), so a new entry assigning code "2" to San Francisco is registered in Dic1, as shown in FIG. 14(5) (step S52).

Next, the data acquisition unit 110 acquires the third record of block B3, shown in FIG. 14(6) (step S16). Dic1 already contains an entry with the value San Francisco (step S50/YES), so Dic1 is not changed and remains as shown in FIG. 14(5).

In step S22, the difference dictionary creation unit 112 executes steps S60 to S66 of FIG. 9 described above. As a result, in the concrete example, the initialized difference dictionary Δ is as shown in FIG. 15(1), Diff0 = {Washington D.C., New York}, Diff1 = {Chicago}, and NoDiff = {San Francisco}. In this example, Diff1 is not the empty set (step S68/NO), and Diff0 is not the empty set either (step S70/NO).

In the flowchart of FIG. 9, when Diff0 is not the empty set (step S70/NO), the difference dictionary creation unit 112 takes an element d0 of the set Diff0 and an element d1 of the set Diff1 and replaces d0 with d1 in the difference dictionary Δ. It then deletes d0 from Diff0 and d1 from Diff1 (step S72), returns to step S68, and continues processing.

In the concrete example, Washington D.C., an element of Diff0, is replaced in the difference dictionary Δ with Chicago, an element of Diff1. Washington D.C. is deleted from Diff0 and Chicago is deleted from Diff1. As a result, Δ after step S72 is as shown in FIG. 15(2), Diff0 = {New York}, and Diff1 is the empty set. Diff1 being empty while Diff0 is not means that the number of character strings that exist in the reference block's dictionary Dic0 but do not appear in the processing target block exceeds the number of character strings that do not exist in Dic0 but appear in the processing target block.

Accordingly, in the flowchart of FIG. 9, when Diff1 is the empty set (step S68/YES), the difference dictionary creation unit 112 replaces every value in the difference dictionary Δ that matches an element of Diff0 with NULL (step S74). It then removes from Δ every entry whose value matches an element of NoDiff (step S78) and outputs Δ to the output unit 114 and the dictionary creation unit 111 (step S80).

In the concrete example, New York, an element of Diff0, is replaced with NULL in the difference dictionary Δ (step S74). The difference dictionary Δ after step S74 is shown in FIG. 15(3). In step S78, the difference dictionary creation unit 112 then removes from the current Δ the entry whose value is the element of NoDiff (San Francisco). The resulting Δ output in step S80 is shown in FIG. 15(4).
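The whole flow of FIG. 9 (steps S60 to S80) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `make_delta`, the code-to-string `dict` with `None` for NULL, the pairing order of d0 and d1 (code order and first-appearance order), and numbering new codes in step S76 from the largest existing code plus one are all assumptions. Codes that are already NULL in Dic0 are simply carried through, a case this passage does not describe.

```python
from typing import Dict, List, Optional

Dictionary = Dict[int, Optional[str]]  # code -> character string (None = NULL)


def make_delta(dic0: Dictionary, dic1_val: List[str]) -> Dictionary:
    """Create the difference dictionary from Dic0 and the strings of the current block."""
    delta = dict(dic0)                                      # step S60
    old = [v for v in dic0.values() if v is not None]
    diff0 = [v for v in old if v not in dic1_val]           # step S62
    diff1 = [v for v in dic1_val if v not in old]           # step S64
    nodiff = {v for v in dic1_val if v in old}              # step S66
    while diff1 and diff0:                                  # S68 / NO and S70 / NO
        d0, d1 = diff0.pop(0), diff1.pop(0)
        code = next(c for c, v in delta.items() if v == d0)
        delta[code] = d1                                    # step S72: reuse d0's code
    if diff1:                                               # S70 / YES
        next_code = max(delta, default=0) + 1               # assumed numbering rule
        for v in diff1:                                     # step S76
            delta[next_code] = v
            next_code += 1
    else:                                                   # S68 / YES
        for code, v in list(delta.items()):                 # step S74
            if v in diff0:
                delta[code] = None
    return {c: v for c, v in delta.items() if v not in nodiff}  # steps S78 / S80


d1 = make_delta({}, ["San Francisco", "Washington D.C."])
dic_b1 = dict(d1)
d2 = make_delta(dic_b1, ["San Francisco", "Washington D.C.", "New York"])
dic_b2 = {**dic_b1, **d2}
d3 = make_delta(dic_b2, ["Chicago", "San Francisco"])
print(d1, d2, d3, sep="\n")
```

Run on the three blocks of the worked example, the sketch yields a difference dictionary with two new entries for block B1, a single new entry under code 3 for block B2, and for block B3 a reused code 2 plus a NULL under code 3, in line with the text above.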

Next, the dictionary creation unit 111 updates the dictionary Dic1 used to encode block B3, based on the dictionary Dic0 shown in FIG. 16(1) and the difference dictionary Δ shown in FIG. 16(2) (step S26). Codes "2" and "3" appear in both Dic0 and Δ, so the character strings assigned to codes "2" and "3" in Dic0 are overwritten with the character strings from Δ to produce Dic1. The resulting Dic1 is shown in FIG. 16(3).

The encoding unit 113 encodes the data of block B3 using the dictionary Dic1 shown in FIG. 16(3) (step S28). The encoded data is shown in FIG. 16(4).
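The encoding of step S28 is a lookup of each record's string in the merged dictionary. The sketch below inverts an assumed code-to-string `dict` and skips NULL entries; the name `encode_block` is hypothetical, and the dictionary and records are the block B3 values derived from the text above rather than read from FIG. 16.

```python
from typing import Dict, List, Optional


def encode_block(dic1: Dict[int, Optional[str]], records: List[str]) -> List[int]:
    """Step S28: replace each record's string by its code in Dic1."""
    # Invert the dictionary; codes whose value is NULL (None) carry no string.
    to_code = {s: c for c, s in dic1.items() if s is not None}
    return [to_code[r] for r in records]


dic1_b3 = {1: "San Francisco", 2: "Chicago", 3: None}
encoded_b3 = encode_block(dic1_b3, ["Chicago", "San Francisco", "San Francisco"])
print(encoded_b3)
```

A record whose string is missing from the dictionary raises `KeyError`, which should not happen when Dic1 was built from the same block.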

In this way, the data to be compressed shown in FIG. 7 is output as compressed data that contains, for each block, a difference dictionary Δ and encoded data, as shown in FIG. 17.

As is clear from the above, the data compression apparatus according to this embodiment generates, for each block, a difference dictionary: dictionary data that associates each character string that appears in the processing target block but is not registered in the reference dictionary with a code that the reference dictionary associates with a character string not appearing in the processing target block. Thus, when text data is compressed by block stream processing, compressed data containing a difference dictionary and encoded data is created for each block, and the codes assigned to character strings in the difference dictionary are reused, which improves the compression ratio.

Here, the compression ratios achieved by Comparative Examples 1 and 2 and by this embodiment are compared with reference to FIGS. 18 and 19. FIG. 18(1) shows the data to be compressed used in this description. In this description, four records are processed as one block.

First, the creation of compressed data by Comparative Example 1 is described. FIG. 18(2) shows an example of the data obtained when the data to be compressed is compressed by Comparative Example 1. Comparative Example 1 creates compressed data by registering, for each block, a dictionary that associates the character strings appearing in that block with codes. The codes are renumbered from 1 each time the block changes.

Concretely, using the data of FIG. 18(1): in block B1 of the data to be compressed, four character strings appear, namely San Francisco, Washington D.C., Los Angeles, and New Orleans. Comparative Example 1 therefore registers a dictionary associating these four character strings with codes as the dictionary of block B1 and, as shown in FIG. 18(2), creates compressed data for block B1 containing the dictionary and the encoded data.

Next, for block B2, Comparative Example 1 registers as the dictionary of block B2 a dictionary that associates the four character strings appearing in the block (Washington D.C., New York, Chicago, and Los Angeles) with codes. For block B3, it registers as the dictionary of block B3 a dictionary that associates the four character strings Washington D.C., New York, Chicago, and Los Angeles with codes. The data to be compressed is then encoded using the dictionary registered for each block. The resulting compressed data of Comparative Example 1 is shown in FIG. 18(2).

Next, the creation of compressed data by Comparative Example 2 is described. FIG. 19(1) shows an example of the data obtained when the data to be compressed is compressed by Comparative Example 2. Comparative Example 2 creates, for each block, a difference dictionary similar to that of this embodiment, but assigns newly numbered codes to the character strings registered in the difference dictionary. That is, Comparative Example 2 does not reuse codes that were associated with character strings that appear in the reference block but not in the processing target block.

Comparative Example 2 is now described concretely. Block B1 is the first processing target block, so no reference block dictionary exists. For block B1, Comparative Example 2 therefore registers a difference dictionary with the same entries as in Comparative Example 1. Next, for block B2, Comparative Example 2 registers a difference dictionary for the character strings that are not registered in the dictionary of block B1 but appear in block B2, namely New York and Chicago. Here, Comparative Example 2 assigns newly numbered codes to the character strings registered in the difference dictionary. Codes up to "4" are used in block B1, so Comparative Example 2 assigns the new codes "5" and "6" to New York and Chicago, respectively. For block B3, a difference dictionary is registered for the character strings that do not exist in the dictionary used to encode block B2 but appear in block B3, namely Las Vegas and Mexico City. Codes up to "6" had already been numbered when the difference dictionary of block B2 was created, so the new codes "7" and "8" are assigned to Las Vegas and Mexico City, respectively. In this way, Comparative Example 2 creates compressed data containing a difference dictionary and encoded data for each block, as shown in FIG. 19(1).

FIG. 19(2) shows example data obtained when the data to be compressed in FIG. 18(1) is compressed according to the present embodiment. In the present embodiment, block B1 is the first block to be processed, so no dictionary of a reference block exists. Accordingly, for block B1, the embodiment also registers a difference dictionary with the same entries as in Comparative Examples 1 and 2. Next, for block B2, the embodiment registers in the difference dictionary the character strings that are not registered in the dictionary of block B1 but appear in block B2, namely NewYork and Chicago. The embodiment then reuses the codes that the dictionary of block B1 assigns to character strings that are registered in that dictionary but do not appear in block B2. In the example of FIG. 19(2), the codes “1” and “4”, which the dictionary of block B1 assigns to SanFrancisco and NewOrleans (neither of which appears in block B2), are reused and assigned to NewYork and Chicago. Similarly, for block B3, the codes “3” and “4”, which the dictionary used to encode block B2 assigns to LosAngels and Chicago (neither of which appears in block B3), are reused and assigned to LasVegas and MexicoCity. In this way, the present embodiment creates, for each block, compressed data including a difference dictionary and encoded data, as shown in FIG. 19(2).
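The code reuse described for FIG. 19(2) can be sketched as below. This is a hedged illustration: the function name `build_diff_with_reuse`, the dict representation, and in particular the rule of handing out freed codes in ascending order are assumptions made here; they happen to reproduce the codes quoted in the text, but the passage itself does not fix that order.

```python
def build_diff_with_reuse(ref_dict, values):
    """ref_dict: code -> string, the dictionary of the reference block.
    values: strings appearing in the block being processed.
    Returns (difference dictionary, dictionary for this block)."""
    present = set(values)
    # Codes whose string does not appear in this block may be reused.
    free_codes = sorted(c for c, s in ref_dict.items() if s not in present)
    known = set(ref_dict.values())
    new_dict = dict(ref_dict)
    diff = {}
    for s in dict.fromkeys(values):        # unique strings, first-appearance order
        if s in known:
            continue                       # already has a code in the reference
        # Assumption: reuse the smallest freed code, else number a new one.
        code = free_codes.pop(0) if free_codes else len(new_dict) + 1
        diff[code] = s
        new_dict[code] = s
    return diff, new_dict


blocks = [
    ["SanFrancisco", "WashingtonD.C.", "LosAngels", "NewOrleans"],
    ["WashingtonD.C.", "LosAngels", "NewYork", "Chicago"],
    ["WashingtonD.C.", "NewYork", "LasVegas", "MexicoCity"],
]
dic, diffs = {}, []
for values in blocks:
    diff, dic = build_diff_with_reuse(dic, values)
    diffs.append(diff)
# diffs[1] == {1: "NewYork", 4: "Chicago"}
# diffs[2] == {3: "LasVegas", 4: "MexicoCity"}
```

With this rule the dictionary never needs more than four codes for the three example blocks, whereas fresh numbering would need eight.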

次に、比較例１及び２と、本実施例との圧縮率を比較する。 Next, the compression ratios of Comparative Examples 1 and 2 and this example are compared.

In Comparative Example 1, as described above, a dictionary that associates the character strings appearing in a block with codes is created for each block. Consequently, as shown in FIG. 18(2), the dictionaries registered for the individual blocks contain duplicate values. For example, WashingtonD.C. and LosAngels appear in both the dictionary of block B1 and the dictionary of block B2, and WashingtonD.C. and NewYork appear in both the dictionary of block B2 and the dictionary of block B3. As a result, in Comparative Example 1 the total dictionary size grows and the compression ratio deteriorates.

比較例２では、ブロックＢ２の辞書には、ブロックＢ１で出現する文字列は登録されず、ブロックＢ２で初めて出現した文字列のみが差分辞書に登録される。従って、比較例１と比較して、辞書サイズを小さくすることができる。 In Comparative Example 2, the character string that appears in block B1 is not registered in the dictionary of block B2, and only the character string that first appears in block B2 is registered in the difference dictionary. Therefore, the dictionary size can be reduced as compared with Comparative Example 1.
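The difference in dictionary size can be checked on the three example blocks with a short count. This is an illustrative sketch only; `count_entries` is a hypothetical helper, and the block contents are the strings named in the text.

```python
def count_entries(blocks):
    """Return (entries with one full dictionary per block,
               entries with difference dictionaries only)."""
    per_block = sum(len(set(b)) for b in blocks)
    seen, diff_total = set(), 0
    for b in blocks:
        fresh = set(b) - seen          # strings not registered so far
        diff_total += len(fresh)
        seen |= fresh
    return per_block, diff_total


blocks = [
    ["SanFrancisco", "WashingtonD.C.", "LosAngels", "NewOrleans"],
    ["WashingtonD.C.", "LosAngels", "NewYork", "Chicago"],
    ["WashingtonD.C.", "NewYork", "LasVegas", "MexicoCity"],
]
per_block, diff_total = count_entries(blocks)   # (12, 8)
```

For this example, per-block dictionaries hold 12 entries in total, while the difference dictionaries hold 8.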

しかし、比較例２では、基準ブロックには出現しないが処理対象ブロックでは出現する文字列に、新たに採番した符号を割り当てる。従って、比較例１のようにブロック毎に辞書を出力する場合と比べて、符号数が増加してしまい、符号の符号長が長くなってしまう恐れがある。そこで、比較例２のように、符号を新たに採番する場合と、本実施例のように再利用する場合との間の圧縮率の違いについて説明する。 However, in Comparative Example 2, a newly assigned code is assigned to a character string that does not appear in the reference block but appears in the processing target block. Therefore, compared to the case of outputting a dictionary for each block as in Comparative Example 1, the number of codes may increase, and the code length of the codes may become long. Therefore, the difference in compression rate between the case where a code is newly assigned as in Comparative Example 2 and the case where the code is reused as in this embodiment will be described.

Suppose that the data to be compressed D is divided into N blocks (D = B1・B2・B3・・・BN). In Comparative Example 2, that is, when codes are not reused, the total number of codes required is at most the number of entries contained in the whole of D. In contrast, when codes are reused as in the present embodiment, the total number of codes required is at most the number of entries contained in a single block Bi (i = 1 to N).

Assuming that integers are used as codes, in Comparative Example 2 a code length of 1 byte suffices for up to 256 distinct codes, 2 bytes are needed for up to 65536 distinct codes, and the code length grows further beyond that. In the present embodiment, by contrast, the total number of codes required is at most the number of entries contained in block Bi, so if the number of records contained in one block is kept to 256 or fewer, the code length never exceeds 1 byte.
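The relation between the number of distinct codes and the code length can be written as a small helper. This is a sketch under the assumption of fixed-width codes occupying a whole number of bytes; `code_length_bytes` is a name introduced here for illustration.

```python
def code_length_bytes(num_codes: int) -> int:
    """Smallest whole number of bytes that can distinguish num_codes codes."""
    if num_codes < 1:
        raise ValueError("num_codes must be at least 1")
    bits = (num_codes - 1).bit_length()   # bits needed for values 0..num_codes-1
    return max(1, (bits + 7) // 8)        # round up to whole bytes


# 256 codes fit in 1 byte, 257 to 65536 codes need 2 bytes, more need 3 or above.
lengths = [code_length_bytes(n) for n in (256, 257, 65536, 65537)]   # [1, 2, 2, 3]
```

For instance, data with 1,000,000 distinct strings would need 3-byte codes without reuse, whereas blocks limited to 256 records each stay at 1 byte per code when codes are reused.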

図２０は、比較例１、比較例２、及び本実施例によってデータを圧縮した場合の、圧縮データのファイルサイズを模式的に示した図である。図２０中、ハッチングを施した部分は辞書データを表し、ハッチングを施していない部分は、符号化データを表す。図２０からもわかるように、比較例１と比較例２とでは、比較例２の方が、辞書サイズが小さくなるため、圧縮データサイズも小さくなる。 FIG. 20 is a diagram schematically illustrating the file size of the compressed data when the data is compressed according to Comparative Example 1, Comparative Example 2, and the present embodiment. In FIG. 20, the hatched portion represents dictionary data, and the unhatched portion represents encoded data. As can be seen from FIG. 20, in Comparative Example 1 and Comparative Example 2, since the dictionary size is smaller in Comparative Example 2, the compressed data size is also smaller.

Comparing Comparative Example 2 with the present embodiment, the present embodiment reduces the likelihood that the code length grows by reusing codes, and can therefore make the compressed data even smaller than Comparative Example 2. That is, in the present embodiment, using the difference dictionary reduces the number of entries registered in the dictionary, which reduces the file size of the dictionary data. In addition, reusing codes keeps the code length short and reduces the data size required for the codes, which further improves the compression ratio.

[Data decompression device]

次に、本件に係るデータ伸長装置について説明する。図２１は、本件に係るデータ伸長装置を含むデータ伸長システムの一構成例を示している。 Next, a data decompression apparatus according to the present case will be described. FIG. 21 shows a configuration example of a data decompression system including a data decompression apparatus according to the present case.

データ伸長装置２００は、ネットワーク４０を介して、記憶装置１０、及びデータ処理装置３０と接続している。 The data decompression device 200 is connected to the storage device 10 and the data processing device 30 via the network 40.

The storage device 10 stores the data that is the target of decompression (hereinafter referred to as data to be decompressed). The data decompression device 200 also stores the decompressed data.

The data processing device 30 transmits data to be decompressed to the data decompression device 200. It also receives the data decompressed by the data decompression device 200.

The hardware configuration of the data decompression device 200 is the same as that of the data compression device 100, so its description is omitted. In the data decompression device 200, however, the ROM 102 stores a program for decompressing the data to be decompressed. The functions of the data decompression device 200 shown in FIG. 22 are realized when the CPU 103 executes the program stored in the ROM 102.

次に、図２２を用いて、データ伸長装置２００が有する機能を実現する手段について説明する。図２２は、データ伸長装置２００の機能ブロック図の一例である。 Next, means for realizing the functions of the data decompression apparatus 200 will be described with reference to FIG. FIG. 22 is an example of a functional block diagram of the data decompression apparatus 200.

As shown in FIG. 22, the data decompression device 200 includes a block acquisition unit 201; a data acquisition unit 202 and a difference dictionary acquisition unit 203, which together function as an input unit that accepts the difference dictionary and the compressed data; a dictionary restoration unit 204; a decoding unit 205; and an output unit 206.

The block acquisition unit 201 functions as a division unit that divides the data to be decompressed, which is composed of a plurality of blocks and is acquired from the storage device 10 or received from the data processing device 30, into the individual blocks according to a predetermined rule. The block acquisition unit 201 acquires, from the data to be decompressed, the block to be subjected to the decompression process (the processing target block). Each block constituting the data to be decompressed has a difference dictionary and encoded data.

データ取得部２０２は、ブロック取得部２０１が取得した処理対象ブロックから、符号化データを取得する。 The data acquisition unit 202 acquires encoded data from the processing target block acquired by the block acquisition unit 201.

差分辞書取得部２０３は、処理対象ブロックから、差分辞書を取得する。 The difference dictionary acquisition unit 203 acquires a difference dictionary from the processing target block.

The dictionary restoration unit 204 (processing target dictionary generation unit) restores the dictionary that was used when the encoded data of the processing target block was created (the processing target dictionary), based on the difference dictionary acquired by the difference dictionary acquisition unit 203 and the dictionary used for decoding a reference block different from the processing target block. In the present embodiment, of two consecutive blocks, the preceding block is the reference block and the following block is the processing target block.

復号部２０５は、辞書復元部２０４が復元した辞書を用いて、データ取得部２０２が取得した符号化データを復号する。 The decoding unit 205 decodes the encoded data acquired by the data acquisition unit 202 using the dictionary restored by the dictionary restoration unit 204.

出力部２０６は、復号部２０５が復号したブロックのデータを、記憶装置１０に格納する、または、データ処理装置３０に送信する。 The output unit 206 stores the block data decoded by the decoding unit 205 in the storage device 10 or transmits the data to the data processing device 30.

次に、データ伸長装置２００が実行する伸長処理について説明する。図２３は、データ伸長装置２００が実行する伸長処理の一例を示すフローチャートである。 Next, decompression processing executed by the data decompression apparatus 200 will be described. FIG. 23 is a flowchart illustrating an example of decompression processing executed by the data decompression apparatus 200.

First, the dictionary restoration unit 204 initializes the dictionary Dic of the reference block to an empty table (step S100). This is because no reference block exists when the first block of the data to be decompressed is the processing target block.

The block acquisition unit 201 determines whether the data to be decompressed contains a block that is still to be processed (step S102). If such a block exists (step S102/YES), the block acquisition unit 201 acquires it (step S104). If no such block exists (step S102/NO), the block acquisition unit 201 ends this process.

ブロック取得部２０１がブロックを取得すると（ステップＳ１０４）、差分辞書取得部２０３が取得されたブロックに含まれる差分辞書Δを取得する（ステップＳ１０６）。 When the block acquisition unit 201 acquires a block (step S104), the difference dictionary acquisition unit 203 acquires a difference dictionary Δ included in the acquired block (step S106).

次に、辞書復元部２０４が、処理対象ブロックの復号に使用される辞書Ｄｉｃ１の復元処理を実行する（ステップＳ１０８）。辞書復元処理の詳細は後述する。 Next, the dictionary restoration unit 204 executes a restoration process of the dictionary Dic1 used for decoding the processing target block (step S108). Details of the dictionary restoration processing will be described later.

The decoding unit 205 decodes the encoded data using the dictionary Dic1 restored in step S108 (step S110).

出力部２０６は、復号されたデータを出力する（ステップＳ１１２）。次に、辞書復元部２０４が、処理対象ブロックの復号に使用した辞書Ｄｉｃ１を使用して、辞書Ｄｉｃを初期化する（ステップＳ１１４）。本実施例では、基準ブロックと処理対象ブロックとは連続する２つのブロックであるため、ステップＳ１１０で復号に使用した辞書Ｄｉｃ１が、次の処理対象ブロックに対する基準ブロックの辞書Ｄｉｃとなるからである。 The output unit 206 outputs the decoded data (step S112). Next, the dictionary restoration unit 204 initializes the dictionary Dic using the dictionary Dic1 used for decoding the processing target block (step S114). This is because in the present embodiment, the reference block and the processing target block are two consecutive blocks, and thus the dictionary Dic1 used for decoding in step S110 is the dictionary Dic of the reference block for the next processing target block.
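The overall flow of FIG. 23 can be sketched as a short loop. This is a simplified illustration rather than the device's actual implementation: the function name `decompress`, the representation of a block as a `(difference dictionary, code list)` pair, and the sample record order are assumptions. The dictionary restoration of step S108 is reduced here to a dict merge, which gives the same mapping as overwriting or appending entries code by code.

```python
def decompress(blocks):
    """blocks: iterable of (delta, codes) pairs in block order, where delta is
    a difference dictionary (code -> string) and codes is the encoded data.
    Yields the decoded strings of each block."""
    dic = {}                                  # S100: empty reference dictionary
    for delta, codes in blocks:               # S102/S104: next block, if any
        # S106: delta is the difference dictionary of this block
        dic1 = {**dic, **delta}               # S108: restore this block's dictionary
        decoded = [dic1[c] for c in codes]    # S110: decode the encoded data
        yield decoded                         # S112: output
        dic = dic1                            # S114: becomes the next reference


# Difference dictionaries as in the FIG. 19(2) example; code sequences are made up.
blocks = [
    ({1: "SanFrancisco", 2: "WashingtonD.C.", 3: "LosAngels", 4: "NewOrleans"}, [1, 2, 3, 4, 2]),
    ({1: "NewYork", 4: "Chicago"}, [2, 3, 1, 4]),
    ({3: "LasVegas", 4: "MexicoCity"}, [2, 1, 3, 4]),
]
decoded_blocks = list(decompress(blocks))
# decoded_blocks[2] == ["WashingtonD.C.", "NewYork", "LasVegas", "MexicoCity"]
```

Because each block's dictionary depends on the previous one, the blocks have to be consumed in order, which is why the sketch is written as a generator over the block sequence.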

次に、具体的なデータを用いて、上述した伸長処理によるデータ伸長について説明するとともに、辞書復元処理の詳細について説明する。ここでは、図１７に示した圧縮データに伸長処理を施し、復号データを得るものとする。 Next, data expansion by the above-described expansion processing will be described using specific data, and details of the dictionary restoration processing will be described. Here, it is assumed that the compressed data shown in FIG. 17 is decompressed to obtain decoded data.

まず、ブロック取得部２０１が、ブロックＢ１を取得する（ステップＳ１０４）。差分辞書取得部２０３は、図２４（１）に示す、ブロックＢ１に含まれる差分辞書Δを取得する（ステップＳ１０６）。次に、辞書復元部２０４が辞書復元処理を実行する（ステップＳ１０８）。 First, the block acquisition unit 201 acquires a block B1 (step S104). The difference dictionary acquisition unit 203 acquires the difference dictionary Δ included in the block B1 illustrated in FIG. 24A (step S106). Next, the dictionary restoration unit 204 executes a dictionary restoration process (step S108).

ここで、辞書復元処理について、具体例を参照しつつ説明する。図２５は辞書復元処理の一例を示すフローチャートである。辞書復元部２０４は、まず、処理対象ブロックのデータ復号に使用される辞書Ｄｉｃ１及び、集合ＤｉｆｆＩＤを初期化する（ステップＳ１２０）。辞書復元部２０４は、辞書Ｄｉｃ１を辞書Ｄｉｃで初期化し、ＤｉｆｆＩＤを差分辞書Δの符号項目の値集合（図２５では、Δ．ＩＤと記載）で初期化する。 Here, the dictionary restoration process will be described with reference to a specific example. FIG. 25 is a flowchart showing an example of the dictionary restoration process. The dictionary restoration unit 204 first initializes the dictionary Dic1 and the set DiffID used for data decoding of the processing target block (step S120). The dictionary restoration unit 204 initializes the dictionary Dic1 with the dictionary Dic, and initializes DiffID with the value set of the code items of the difference dictionary Δ (indicated as Δ.ID in FIG. 25).

具体例では、ブロックＢ１は、最初のブロックであるので基準ブロックの復号に用いた辞書Ｄｉｃは空テーブルとなっている。従って、ステップＳ１２０の初期化の結果、辞書Ｄｉｃ１は空テーブルとなり、また、ＤｉｆｆＩＤ＝｛１，２｝となる。 In the specific example, since the block B1 is the first block, the dictionary Dic used for decoding the reference block is an empty table. Therefore, as a result of the initialization in step S120, the dictionary Dic1 becomes an empty table, and DiffID = {1, 2}.

次に、辞書復元部２０４は、ＤｉｆｆＩＤに要素が存在するか否か判定する（ステップＳ１２２）。ＤｉｆｆＩＤに要素が存在する場合（ステップＳ１２２／ＹＥＳ）、辞書復元部２０４は、ＤｉｆｆＩＤにおける最小の要素ｋを取得し、ＤｉｆｆＩＤから要素ｋを消去する（ステップＳ１２４）。次に、辞書復元部２０４は、ステップＳ１２４で取得したｋが辞書Ｄｉｃ１のエントリ数以下か否か判定する（ステップＳ１２６）。図２５では、辞書Ｄｉｃ１のエントリ数を｜Ｄｉｃ１｜で表す。 Next, the dictionary restoration unit 204 determines whether or not an element exists in DiffID (step S122). When there is an element in DiffID (step S122 / YES), the dictionary restoration unit 204 acquires the minimum element k in DiffID and deletes the element k from DiffID (step S124). Next, the dictionary restoration unit 204 determines whether or not k acquired in step S124 is equal to or smaller than the number of entries in the dictionary Dic1 (step S126). In FIG. 25, the number of entries in the dictionary Dic1 is represented by | Dic1 |.

If the value of k is larger than the number of entries in the dictionary Dic1 (step S126/NO), the dictionary restoration unit 204 appends to the end of the dictionary Dic1 an entry such that Dic1[k] = Δ[k] (step S130). The dictionary restoration unit 204 then returns to step S122 and continues processing. Here, Dic1[k] denotes the character string associated with the code “k” in the dictionary Dic1, and Δ[k] denotes the character string associated with the code “k” in the difference dictionary Δ.

具体例では、ＤｉｆｆＩＤに要素が存在するので、辞書復元部２０４は、ＤｉｆｆＩＤにおける最小の要素ｋ＝１を取得し、ＤｉｆｆＩＤからｋ＝１を消去する（ステップＳ１２４）。その結果、ＤｉｆｆＩＤ＝｛２｝となる。 In the specific example, since there is an element in DiffID, the dictionary restoration unit 204 acquires the minimum element k = 1 in DiffID, and deletes k = 1 from DiffID (step S124). As a result, DiffID = {2}.

Since k (= 1) is larger than the number of entries in the dictionary Dic1 (= 0), the dictionary restoration unit 204 appends to the end of the dictionary Dic1 an entry in which the character string associated with the code “1” is the character string associated with the code “1” in the difference dictionary Δ (step S130). As a result, the dictionary Dic1 after step S130 is as shown in FIG. 24(2).

辞書復元部２０４は、ＤｉｆｆＩＤに、未だ要素が存在するので（ステップＳ１２２／ＹＥＳ）、ＤｉｆｆＩＤからｋ＝２を取得し、ＤｉｆｆＩＤからｋ＝２を除去する（ステップＳ１２４）。この結果、ＤｉｆｆＩＤは空集合となる。 Since there is still an element in DiffID (step S122 / YES), dictionary restoring unit 204 acquires k = 2 from DiffID and removes k = 2 from DiffID (step S124). As a result, DiffID becomes an empty set.

Since k = 2 is larger than the number of entries in the dictionary Dic1 (= 1), the dictionary restoration unit 204 appends to the end of the dictionary Dic1 an entry in which the character string associated with the code “2” is the character string associated with the code “2” in the difference dictionary Δ. The dictionary Dic1 after step S130 is as shown in FIG. 24(3). DiffID already became the empty set in the preceding step S124.

図２５のフローチャートにおいて、ＤｉｆｆＩＤに要素が存在しない場合（ステップＳ１２２／ＮＯ）、辞書復元部２０４は、辞書Ｄｉｃ１を出力し（ステップＳ１３２）、本処理を終了する。辞書復元処理の他のステップについては、後述する。 In the flowchart of FIG. 25, when there is no element in DiffID (step S122 / NO), the dictionary restoring unit 204 outputs the dictionary Dic1 (step S132), and the process is terminated. Other steps of the dictionary restoration process will be described later.

具体例において、辞書復元部２０４は、ＤｉｆｆＩＤが空集合であるので（ステップＳ１２２／ＮＯ）、辞書Ｄｉｃ１を出力する（ステップＳ１３２）。図２４（４）が復元された辞書Ｄｉｃ１である。 In the specific example, since the DiffID is an empty set (Step S122 / NO), the dictionary restoring unit 204 outputs the dictionary Dic1 (Step S132). FIG. 24 (4) shows the restored dictionary Dic1.

The decoding unit 205 decodes the encoded data contained in block B1, acquired by the data acquisition unit 202, using the dictionary Dic1 shown in FIG. 24(4) (step S110). The decoded data is as shown in FIG. 24(5).

ブロック取得部２０１は、次のブロック（ブロックＢ２）が存在するので（ステップＳ１０２／ＹＥＳ）、ブロックＢ２を取得する。差分辞書取得部２０３は、図２６（１）に示す、ブロックＢ２に含まれる差分辞書Δを取得する（ステップＳ１０６）。次に、辞書復元部２０４が辞書復元処理（ステップＳ１０８）を実行する。 Since the next block (block B2) exists (step S102 / YES), the block acquisition unit 201 acquires the block B2. The difference dictionary acquisition unit 203 acquires the difference dictionary Δ included in the block B2 shown in FIG. 26 (1) (step S106). Next, the dictionary restoration unit 204 executes a dictionary restoration process (step S108).

First, the dictionary restoration unit 204 initializes the dictionary Dic1 with the dictionary Dic used for decoding block B1, which serves as the reference block, and initializes DiffID with Δ.ID (step S120). As a result, the initialized dictionary Dic1 is as shown in FIG. 26(2), and DiffID = {3}.

Next, since DiffID contains an element (step S122/YES), the dictionary restoration unit 204 takes k = 3 from DiffID and removes k = 3 from DiffID (step S124). As a result, DiffID becomes the empty set. Since k = 3 is larger than the number of entries in the dictionary Dic1 (= 2) (step S126/NO), the dictionary restoration unit 204 adds to the dictionary Dic1 an entry in which the character string associated with the code “3” is the character string associated with the code “3” in the difference dictionary Δ (step S130). The dictionary Dic1 after step S130 is as shown in FIG. 26(3).

辞書復元部２０４は、ＤｉｆｆＩＤが空集合となっているので（ステップＳ１２２／ＮＯ）、図２６（４）に示す辞書Ｄｉｃ１を出力する（ステップＳ１３２）。 Since the DiffID is an empty set (Step S122 / NO), the dictionary restoring unit 204 outputs the dictionary Dic1 shown in FIG. 26 (4) (Step S132).

The decoding unit 205 decodes the encoded data contained in block B2 using the dictionary shown in FIG. 26(4) (step S110). The decoded data is as shown in FIG. 26(5).

ブロック取得部２０１は、次のブロックが存在するので（ステップＳ１０２／ＹＥＳ）、ブロックＢ３を読み込む（ステップＳ１０４）。差分辞書取得部２０３は、図２７（１）に示す差分辞書ΔをブロックＢ３から取得する（ステップＳ１０６）。辞書復元部２０４が、辞書復元処理を実行する（ステップＳ１０８）。 Since the next block exists (step S102 / YES), the block acquisition unit 201 reads the block B3 (step S104). The difference dictionary acquisition unit 203 acquires the difference dictionary Δ illustrated in FIG. 27A from the block B3 (step S106). The dictionary restoration unit 204 executes dictionary restoration processing (step S108).

The dictionary restoration unit 204 initializes the dictionary Dic1 with the dictionary Dic used for decoding block B2, which serves as the reference block, and initializes DiffID with Δ.ID (step S120). As a result, the initialized dictionary Dic1 is as shown in FIG. 27(2), and DiffID = {2, 3}.

辞書復元部２０４は、ＤｉｆｆＩＤに要素が存在するので（ステップＳ１２２／ＹＥＳ）、ＤｉｆｆＩＤから最小の要素ｋ＝２を取得し、ＤｉｆｆＩＤからｋ＝２を除去する（ステップＳ１２４）。その結果、ＤｉｆｆＩＤ＝｛３｝となる。ここで、具体例において、ｋ＝２が辞書Ｄｉｃ１のエントリ数（＝３）以下となっている。 Since there is an element in DiffID (step S122 / YES), the dictionary restoring unit 204 acquires the minimum element k = 2 from DiffID and removes k = 2 from DiffID (step S124). As a result, DiffID = {3}. Here, in a specific example, k = 2 is equal to or less than the number of entries (= 3) in the dictionary Dic1.

In the flowchart of FIG. 25, if the value of k is less than or equal to the number of entries in the dictionary Dic1 (step S126/YES), the dictionary restoration unit 204 overwrites the character string associated with the code “k” in the dictionary Dic1 with the character string associated with the code “k” in the difference dictionary Δ (step S128). After completing step S128, the dictionary restoration unit 204 returns to step S122 and continues processing.
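The dictionary restoration process of FIG. 25 (steps S120 to S132) can be sketched step by step as follows. This is an illustrative sketch: the function name `restore_dictionary` and the list-based table are assumptions, and the strings in the usage example are placeholders, since the contents of FIGS. 24, 26 and 27 are not reproduced in the text. Only the DiffID sets ({1, 2}, {3}, {2, 3}) follow the walkthrough.

```python
def restore_dictionary(dic, delta):
    """dic: list of strings; the entry at index i holds the string for code i + 1.
    delta: difference dictionary, code -> string.
    Returns the restored dictionary Dic1 as a new list."""
    dic1 = list(dic)                 # S120: Dic1 <- Dic
    diff_ids = set(delta)            # S120: DiffID <- set of codes in the difference dictionary
    while diff_ids:                  # S122: any element left in DiffID?
        k = min(diff_ids)            # S124: take the smallest code ...
        diff_ids.remove(k)           #       ... and delete it from DiffID
        if k <= len(dic1):           # S126: k <= |Dic1| ?
            dic1[k - 1] = delta[k]   # S128: overwrite the reused code
        else:
            # S130: append; assumes new codes are contiguous (k == |Dic1| + 1)
            dic1.append(delta[k])
    return dic1                      # S132: output Dic1


# Placeholder strings; only the code sets mirror the B1, B2, B3 walkthrough.
d1 = restore_dictionary([], {1: "a", 2: "b"})      # ["a", "b"]
d2 = restore_dictionary(d1, {3: "c"})              # ["a", "b", "c"]
d3 = restore_dictionary(d2, {2: "d", 3: "e"})      # ["a", "d", "e"]
```

Processing the codes in ascending order is what lets the append branch keep the table contiguous: every code up to the current table size is handled by overwriting before any larger code is appended.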

In the specific example, since k = 2 is less than or equal to the number of entries in the dictionary Dic1 (= 3), the dictionary restoration unit 204 executes step S128. That is, it overwrites the character string associated with the code “2” in the dictionary Dic1 with the character string associated with the code “2” in the difference dictionary Δ. The dictionary Dic1 after step S128 is as shown in FIG. 27(3).

次に、辞書復元部２０４は、ＤｉｆｆＩＤに未だ要素が存在するので（ステップＳ１２２／ＹＥＳ）、要素ｋ＝３を取得し、ＤｉｆｆＩＤからｋ＝３を除去する（ステップＳ１２４）。その結果、ＤｉｆｆＩＤは空集合となる。 Next, since there is still an element in DiffID (step S122 / YES), the dictionary restoration unit 204 acquires element k = 3 and removes k = 3 from DiffID (step S124). As a result, DiffID becomes an empty set.

取得したｋ（＝３）は、辞書Ｄｉｃ１のエントリ数（＝３）以下であるので（ステップＳ１２６／ＹＥＳ）、辞書復元部２０４はステップＳ１２８の処理を実行する。すなわち、辞書Ｄｉｃ１において符号「３」と対応付けられた文字列を、差分辞書Δにおいて符号「３」と対応付けられた文字列で上書きする。その結果、ステップＳ１２８の処理後の辞書Ｄｉｃ１は、図２７（４）となる。 Since the acquired k (= 3) is equal to or less than the number of entries (= 3) in the dictionary Dic1 (step S126 / YES), the dictionary restoring unit 204 executes the process of step S128. That is, the character string associated with the code “3” in the dictionary Dic1 is overwritten with the character string associated with the code “3” in the difference dictionary Δ. As a result, the dictionary Dic1 after the processing in step S128 is as shown in FIG. 27 (4).

Since DiffID has become the empty set (step S122/NO), the dictionary restoration unit 204 outputs the dictionary Dic1 shown in FIG. 27(5). The decoding unit 205 decodes the data contained in block B3 using the dictionary shown in FIG. 27(5). The decoded data is as shown in FIG. 27(6).

ブロックＢ３の次に、処理対象となるブロックは存在しないため（ステップＳ１０２／ＮＯ）、データ伸長装置２００は、データ伸長処理を終了する。 Since there is no block to be processed next to block B3 (step S102 / NO), the data expansion device 200 ends the data expansion processing.

以上の説明から明らかなように、本実施例に係るデータ伸長装置２００は、基準ブロックの符号化に使用された辞書と、処理対象ブロックの差分辞書との間に重複する符号が存在する場合、重複する符号に対応する基準ブロックの文字列を、差分辞書の文字列で置換する。その結果、上述したデータ圧縮方法により作成された圧縮データを伸長して、元のデータを復元することができる。 As is clear from the above description, the data decompression apparatus 200 according to the present embodiment, when there is an overlapping code between the dictionary used for encoding the reference block and the difference dictionary of the processing target block, The character string of the reference block corresponding to the overlapping code is replaced with the character string of the difference dictionary. As a result, it is possible to decompress the compressed data created by the above-described data compression method and restore the original data.

The present embodiment has been described in detail above; however, the present case is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

For example, in the present embodiment, of two consecutive blocks, the preceding block is used as the reference block and the following block as the processing target block, and compression and decompression are performed sequentially. However, the reference block may instead be the first block of the data to be compressed, and the processing target block may be any block Bi other than the first block. In other words, the data compression device 100 may always create the difference dictionary using the first block as the reference block when compressing a processing target block. When consecutive blocks are compressed as in the present embodiment, decompressing the data of block Bi, for example, requires restoring in turn the dictionaries used to encode blocks B1 through Bi-1, then restoring the dictionary of block Bi, and only then decoding the data. In contrast, when the difference dictionary is created relative to the first block, the dictionary used to encode block Bi can be restored from the difference dictionary of the first block and the difference dictionary of block Bi alone, so there is no need to restore the dictionaries of blocks B1 through Bi-1 in sequence. The time needed to decode the data of a specified block can therefore be shortened. In addition, although the data to be compressed in the present embodiment is structured text data, the same approach can be applied when the data to be compressed is plain text data without structure. Furthermore, although the present embodiment adopts a simple “value encoding method” as the static dictionary-based encoding technique, the same approach can be implemented with other static dictionary-based encoding techniques.
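The variation in which the first block always serves as the reference block can be sketched as follows. This is a hypothetical illustration: `decode_block`, the block representation, and the sample difference dictionaries (derived here relative to block B1 under the same reuse idea) are assumptions and do not appear in the source.

```python
def decode_block(blocks, i):
    """blocks: list of (delta, codes) pairs; blocks[0] is the first block and
    every later delta is assumed to be relative to the first block's dictionary.
    i: 0-based index of the block to decode."""
    base_delta, _ = blocks[0]
    dic = dict(base_delta)          # dictionary of the first block
    delta, codes = blocks[i]
    if i > 0:
        dic.update(delta)           # apply only the difference dictionary of block i
    return [dic[c] for c in codes]


# Hypothetical data: deltas of B2 and B3 are both expressed relative to B1.
blocks = [
    ({1: "SanFrancisco", 2: "WashingtonD.C.", 3: "LosAngels", 4: "NewOrleans"}, [1, 2, 3, 4]),
    ({1: "NewYork", 4: "Chicago"}, [2, 3, 1, 4]),
    ({1: "NewYork", 3: "LasVegas", 4: "MexicoCity"}, [2, 1, 3, 4]),
]
third = decode_block(blocks, 2)
# third == ["WashingtonD.C.", "NewYork", "LasVegas", "MexicoCity"]
```

Block B3 is decoded here without touching block B2, at the cost of a larger difference dictionary for B3 (three entries instead of two in this made-up example).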

なお、上記のデータ圧縮装置、及びデータ伸長装置が有する機能は、コンピュータによって実現することができる。その場合、データ圧縮装置、及びデータ伸長装置が有すべき機能の処理内容を記述したプログラムが提供される。そのプログラムをコンピュータで実行することにより、上記処理機能がコンピュータ上で実現される。処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。 Note that the functions of the data compression device and the data decompression device can be realized by a computer. In that case, a program describing the processing contents of the functions that the data compression apparatus and the data expansion apparatus should have is provided. By executing the program on a computer, the above processing functions are realized on the computer. The program describing the processing contents can be recorded on a computer-readable recording medium.

プログラムを流通させる場合には、例えば、そのプログラムが記録されたＤＶＤ（Digital Versatile Disc）、ＣＤ−ＲＯＭ（Compact Disc Read Only Memory）などの可搬型記録媒体の形態で販売される。また、プログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することもできる。 When the program is distributed, for example, it is sold in the form of a portable recording medium such as a DVD (Digital Versatile Disc) or a CD-ROM (Compact Disc Read Only Memory) on which the program is recorded. It is also possible to store the program in a storage device of a server computer and transfer the program from the server computer to another computer via a network.

The computer that executes the program stores in its own storage device, for example, the program recorded on the portable recording medium or the program transferred from the server computer. The computer then reads the program from its own storage device and executes processing according to the program. The computer can also read the program directly from the portable recording medium and execute processing according to it. Alternatively, each time a program is transferred from the server computer, the computer can execute processing according to the received program as it arrives.

In this embodiment, the data compression device and the data decompression device are described as separate devices. However, as shown in FIG. 28, a single information processing device may be configured to function as both the data compression device and the data decompression device. In addition, for example, a server computer connected to a communication network such as the Internet may serve as at least one of the data compression device and the data decompression device, and the server computer may provide a service that performs at least one of data compression and data decompression to a communication device, such as a personal computer, connected to it (ASP (Application Service Provider)).

In this embodiment, the data compression device 100 or the data decompression device 200 exchanges data with the storage device 10, the sensor device 20, and the data processing device 30 via the network 40. However, the data compression device 100 or the data decompression device 200 may instead be connected directly (locally) to each of the storage device 10, the sensor device 20, and the data processing device 30 to exchange data. Also, although only the district item is encoded in this embodiment, it goes without saying that other items can be encoded as well.

DESCRIPTION OF SYMBOLS
100 ... Data compression device
110 ... Data acquisition unit
111 ... Dictionary creation unit
112 ... Difference dictionary creation unit
113 ... Encoding unit
114 ... Output unit
200 ... Data decompression device
201 ... Block acquisition unit
202 ... Data acquisition unit
203 ... Difference dictionary acquisition unit
204 ... Dictionary restoration unit
205 ... Decoding unit

Claims

A data compression apparatus comprising:
an input unit that accepts input of text data;
a dividing unit that divides the text data into a plurality of blocks based on a predetermined rule;
a difference dictionary generation unit that, based on a reference dictionary, which is dictionary data in which character strings and codes are stored in association with each other, generates a difference dictionary, which is dictionary data that associates each character string that appears in a processing target block and is not registered in the reference dictionary with a code that the reference dictionary associates with a character string that does not appear in the processing target block;
a processing target dictionary generation unit that generates a processing target dictionary, which is dictionary data, based on the generated difference dictionary and the reference dictionary;
a compression unit that compresses the processing target block by referring to the generated processing target dictionary and replacing each character string that appears in the processing target block with the corresponding code; and
an output unit that outputs the data of the processing target block compressed by the compression unit and the generated difference dictionary.
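How the difference dictionary generation, processing target dictionary generation, and compression elements recited above fit together might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: a block is modeled as a list of value strings, reusable codes are handed out smallest first, and a fresh code is assigned when no reusable code remains (the claim does not address that case). The function names `make_difference_dictionary` and `compress_block` are hypothetical.

```python
from typing import Dict, List, Tuple

Dictionary = Dict[int, str]  # code -> character string (assumed representation)


def make_difference_dictionary(reference: Dictionary, values: List[str]) -> Dictionary:
    """Map each string new to this block onto a code whose string is unused in the block."""
    present = set(values)
    registered = set(reference.values())
    # Strings in the block that the reference dictionary does not contain,
    # in order of first appearance.
    new_strings = [v for v in dict.fromkeys(values) if v not in registered]
    # Codes whose strings do not appear in this block can be reused.
    free_codes = sorted(c for c, s in reference.items() if s not in present)
    next_code = max(reference, default=-1) + 1  # assumption: fall back to fresh codes
    diff: Dictionary = {}
    for s in new_strings:
        if free_codes:
            code = free_codes.pop(0)
        else:
            code = next_code
            next_code += 1
        diff[code] = s
    return diff


def compress_block(
    reference: Dictionary, values: List[str]
) -> Tuple[List[int], Dictionary, Dictionary]:
    """Return (codes, difference dictionary, processing target dictionary)."""
    diff = make_difference_dictionary(reference, values)
    target = {**reference, **diff}              # processing target dictionary
    encode = {s: c for c, s in target.items()}  # string -> code lookup
    codes = [encode[v] for v in values]
    return codes, diff, target


reference = {0: "Tokyo", 1: "Osaka"}
codes, diff, target = compress_block(reference, ["Tokyo", "Nagoya", "Tokyo"])
print(codes)  # [0, 1, 0]
print(diff)   # {1: 'Nagoya'}
```

In the sample, "Osaka" does not appear in the block, so its code 1 is reassigned to the new string "Nagoya", and only that single entry has to be output alongside the compressed block.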

The data compression apparatus according to claim 1, wherein the reference dictionary is a processing target dictionary generated by the processing target dictionary generation unit for the previous processing target block.

The data compression apparatus according to claim 1, wherein the reference dictionary is a processing target dictionary generated by the processing target dictionary generation unit for the first processing target block.

A data decompression apparatus comprising:
a dividing unit that divides data to be decompressed into a plurality of blocks based on a predetermined rule;
an acquisition unit that acquires a difference dictionary, which is dictionary data in which character strings and codes are stored in association with each other, from a processing target block among the plurality of blocks;
a processing target dictionary generation unit that generates a processing target dictionary, which is dictionary data, based on a reference dictionary, which is dictionary data, and the difference dictionary acquired by the acquisition unit; and
a decoding unit that decodes the processing target block by replacing each code that appears in the processing target block with the corresponding character string based on the generated processing target dictionary,
wherein the difference dictionary is dictionary data that associates each character string that appears in the processing target block before compression and is not registered in the reference dictionary with a code that the reference dictionary associates with a character string that does not appear in the processing target block before compression, and
wherein the processing target block has been compressed by referring to the processing target dictionary and replacing each character string that appears in the processing target block before compression with the corresponding code.
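The decoding side recited above can be illustrated with a short Python sketch. It assumes, for illustration only, that the difference dictionary has already been separated from the block, that dictionaries map integer codes to character strings, and that the compressed block is a list of codes. The name `decompress_block` is hypothetical.

```python
from typing import Dict, List, Tuple

Dictionary = Dict[int, str]  # code -> character string (assumed representation)


def decompress_block(
    reference: Dictionary, diff: Dictionary, codes: List[int]
) -> Tuple[List[str], Dictionary]:
    """Rebuild the processing target dictionary, then replace each code with its string."""
    target = {**reference, **diff}  # entries of the difference dictionary take precedence
    values = [target[c] for c in codes]
    return values, target


reference = {0: "Tokyo", 1: "Osaka"}
values, target = decompress_block(reference, {1: "Nagoya"}, [0, 1, 0])
print(values)  # ['Tokyo', 'Nagoya', 'Tokyo']
```

The returned processing target dictionary can serve as the reference dictionary for the next block when the previous block is used as the reference, or be discarded when every block is decoded against the first block's dictionary.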

The data decompression apparatus according to claim 4, wherein the reference dictionary is a processing target dictionary generated by the processing target dictionary generation unit for the previous processing target block.

The data decompression apparatus according to claim 4, wherein the reference dictionary is a processing target dictionary generated by the processing target dictionary generation unit for the first processing target block.

A data compression program that causes a computer to execute:
an input step of accepting input of text data;
a dividing step of dividing the text data into a plurality of blocks based on a predetermined rule;
a difference dictionary generation step of generating, based on a reference dictionary, which is dictionary data in which character strings and codes are stored in association with each other, a difference dictionary, which is dictionary data that associates each character string that appears in a processing target block and is not registered in the reference dictionary with a code that the reference dictionary associates with a character string that does not appear in the processing target block;
a processing target dictionary generation step of generating a processing target dictionary, which is dictionary data, based on the generated difference dictionary and the reference dictionary;
a compression step of compressing the processing target block by referring to the generated processing target dictionary and replacing each character string that appears in the processing target block with the corresponding code; and
an output step of outputting the data of the processing target block compressed in the compression step and the generated difference dictionary.

A data decompression program that causes a computer to execute:
a dividing step of dividing compressed data to be decompressed into a plurality of blocks based on a predetermined rule;
an acquisition step of acquiring a difference dictionary, which is dictionary data in which character strings and codes are stored in association with each other, from a processing target block among the plurality of blocks;
a processing target dictionary generation step of generating a processing target dictionary, which is dictionary data, based on a reference dictionary, which is dictionary data, and the difference dictionary acquired in the acquisition step; and
a decoding step of decoding the processing target block by replacing each code that appears in the processing target block with the corresponding character string based on the generated processing target dictionary,
wherein the difference dictionary is dictionary data that associates each character string that appears in the processing target block before compression and is not registered in the reference dictionary with a code that the reference dictionary associates with a character string that does not appear in the processing target block before compression, and
wherein the processing target block has been compressed by referring to the processing target dictionary and replacing each character string that appears in the processing target block before compression with the corresponding code.