JP2016052046A - Compression device, decompression device and storage device

Info

Publication number: JP2016052046A
Application number: JP2014177215A
Authority: JP
Inventors: 央小暮; Hiroshi Kogure; 淳松村; Atsushi Matsumura; 知也児玉; Tomoya Kodama
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2014-09-01
Filing date: 2014-09-01
Publication date: 2016-04-11

Abstract

PROBLEM TO BE SOLVED: To losslessly compress data at a high compression ratio and high throughput. SOLUTION: A compression device includes a first compression unit, a second compression unit, and a selection unit. If at least one of the partial data strings obtained by dividing an input data string into a predetermined size matches any item of dictionary registration data, the first compression unit generates first compressed data that includes the dictionary address associated with at least one such item of dictionary registration data. The second compression unit generates second compressed data, which contains no dictionary address, by compressing the input data string on the basis of at least one of (a) a comparison between the first compressed data and past first compressed data and (b) past input data strings. The selection unit obtains the compressed data of the input data string by selecting whichever of the first compressed data and the second compressed data is smaller. SELECTED DRAWING: Figure 1

Description

Embodiments relate to data compression and decompression.

In recent years, mobile terminals such as tablets and Ultrabooks have become widespread. Such terminals sometimes adopt a solid state drive (SSD) as storage because it is superior to a hard disk drive (HDD) in power consumption, shock resistance, and other respects. In addition, SSD capacities are growing, and the host interface is shifting from Serial ATA (SATA) to PCI (Peripheral Component Interconnect) Express. As a result, both the volume of data that an SSD handles and its access speed are increasing.

Lossless compression can be applied when data is read and written in order to improve the responsiveness and reliability of an SSD. Specifically, the SSD controller compresses the original data and writes the compressed data to the NAND flash memory in the SSD. The SSD controller later reads the compressed data from the NAND flash memory and decompresses it to restore the original data. Such lossless compression improves SSD responsiveness, particularly when the host suspends or resumes. It also effectively increases the over-provisioning of the NAND flash memory, which improves the lifetime and reliability of the SSD. The SSD controller therefore preferably applies lossless compression, at a high compression ratio and high throughput, to all kinds of data stored in the storage (NAND flash memory). Such data includes, for example, system files such as an operating system (OS) or programs, memory images such as hibernation files or page files, and user data.

B. Sukhwani, B. Abali, B. Brezzo, and S. Asaad, “High-Throughput, Lossless Data Compression on FPGAs,” in Field-Programmable Custom Computing Machines (FCCM), 2011 IEEE 19th Annual Int. Symp. on, IEEE, 2011, pp. 113-116.

An object of the embodiments is to losslessly compress data at a high compression ratio and high throughput.

According to an embodiment, a compression device includes a dictionary memory, a first compression unit, a second compression unit, and a selection unit. The dictionary memory stores a dictionary table that associates dictionary addresses with dictionary registration data of a predetermined size. The first compression unit compresses an input data string based on the result of comparing each of the partial data strings, obtained by dividing the input data string into the predetermined size, with the dictionary registration data. If at least one of the partial data strings matches any item of dictionary registration data, the first compression unit thereby generates first compressed data that includes the dictionary address associated with at least one such item. The second compression unit generates second compressed data, which contains no dictionary address, by compressing the input data string based on at least one of (a) a comparison between the first compressed data and past first compressed data and (b) past input data strings. The selection unit obtains the compressed data of the input data string by selecting whichever of the first compressed data and the second compressed data is smaller.

According to an embodiment, a decompression device includes a dictionary memory, a first decompression unit, a second decompression unit, and a selection unit. The dictionary memory stores a dictionary table that associates dictionary addresses with dictionary registration data of a predetermined size. The first decompression unit extracts, from compressed data, the codeword of each partial data string obtained by dividing the pre-compression data corresponding to that compressed data according to a predetermined division pattern. If the partial data string corresponds to one of the dictionary addresses, the first decompression unit treats the dictionary registration data associated with that dictionary address as the partial data string, thereby generating first decompressed data. The second decompression unit generates second decompressed data by decompressing the compressed data based on at least one of past compressed data and past decompressed data. The selection unit obtains the decompressed data of the compressed data by selecting the first decompressed data if the codec of the compressed data is a first scheme and the second decompressed data if the codec is a second scheme.

According to an embodiment, a storage device includes a storage, a compression device, and a decompression device. The compression device includes a dictionary memory, a first compression unit, a second compression unit, and a selection unit. The dictionary memory stores a dictionary table that associates dictionary addresses with dictionary registration data of a predetermined size. The first compression unit compresses an input data string based on the result of comparing each of the partial data strings, obtained by dividing the input data string into the predetermined size, with the dictionary registration data. If at least one of the partial data strings matches any item of dictionary registration data, the first compression unit thereby generates first compressed data that includes the dictionary address associated with at least one such item. The second compression unit generates second compressed data, which contains no dictionary address, by compressing the input data string based on at least one of (a) a comparison between the first compressed data and past first compressed data and (b) past input data strings. The selection unit obtains the compressed data of the input data string by selecting whichever of the first compressed data and the second compressed data is smaller. The compressed data of the input data string is written to and read from the storage. The decompression device includes a dictionary memory, a first decompression unit, a second decompression unit, and a selection unit. The dictionary memory stores a dictionary table that associates dictionary addresses with dictionary registration data of a predetermined size. The first decompression unit extracts, from compressed data read from the storage, the codeword of each partial data string obtained by dividing the pre-compression data corresponding to that compressed data according to a predetermined division pattern. If the partial data string corresponds to one of the dictionary addresses, the first decompression unit treats the dictionary registration data associated with that dictionary address as the partial data string, thereby generating first decompressed data. The second decompression unit generates second decompressed data by decompressing the compressed data read from the storage based on at least one of compressed data read from the storage in the past and past decompressed data. The selection unit obtains the decompressed data of the compressed data read from the storage by selecting the first decompressed data if the codec of that compressed data is the first scheme and the second decompressed data if it is the second scheme.

FIG. 1 is a block diagram illustrating a compression device according to the first embodiment.
FIG. 2 is an explanatory diagram of the data compression process performed by the first compression unit.
FIG. 3 is a diagram illustrating division patterns of an input data string.
FIG. 4 is an explanatory diagram of the data compression process performed by the first compression unit.
FIG. 5 is a flowchart illustrating the data compression process performed by the compression device of FIG. 1.
FIG. 6 is a block diagram illustrating a decompression device according to the first embodiment.
FIG. 7 is a flowchart illustrating the data decompression process performed by the decompression device of FIG. 6.
FIG. 8 is a block diagram illustrating a storage device according to the first embodiment.
FIG. 9 is a block diagram illustrating a modification of FIG. 1.
FIG. 10 is an explanatory diagram of the structure of the first compressed data.
FIG. 11 is a block diagram illustrating the second compression unit included in a compression device according to a third embodiment.
FIG. 12 is a diagram illustrating second compressed data generated by the second compression unit of FIG. 11.
FIG. 13 is a diagram illustrating second compressed data generated by the second compression unit of FIG. 11.
FIG. 14 is a block diagram illustrating the second compression unit included in a compression device according to a fourth embodiment.
FIG. 15 is an explanatory diagram of the data compression process performed by the compression device according to the fourth embodiment.
FIG. 16 is an explanatory diagram of the data compression process performed by the compression device according to the fourth embodiment.
FIG. 17 is an explanatory diagram of the structure of the first compressed data and the second compressed data generated by the compression device according to the fourth embodiment.
FIG. 18 is a block diagram illustrating a hardware configuration of the storage device of FIG. 8.

Embodiments are described below with reference to the drawings. Elements that are identical or similar to elements already described are given identical or similar reference numerals, and redundant description is generally omitted.

(First embodiment)
As illustrated in FIG. 1, the compression device according to the first embodiment includes a dictionary memory 110, a first compression unit 120, a second compression unit 130, and a selection unit 140. The compression device generates compressed data 11 by losslessly compressing an input data string 10 of a fixed size (for example, 8 bytes). In the following description, the fixed-size input data string 10 is sometimes called a block.

The dictionary memory 110 stores a hash function, a hash table, and a dictionary table. The hash table associates a hash value, obtained by applying the hash function to a partial data string produced by dividing the input data string 10 into a predetermined size, with a parameter called a dictionary address.

The dictionary table associates each dictionary address with dictionary registration data. Dictionary registration data is a partial data string obtained by dividing a past input data string, that is, data that has already been compressed, into the predetermined size.

A separate hash function, hash table, and dictionary table may be provided for each of a plurality of sizes. As illustrated in FIGS. 3 and 4, if the input data string 10 is 8 bytes long, hash functions, hash tables, and dictionary tables for 2 bytes, 4 bytes, and 8 bytes may each be provided in order to register the partial data strings obtained by dividing the input data string 10 into 2-byte, 4-byte, and 8-byte units, respectively.
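The per-size division described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `partial_strings` and the constant `BLOCK_SIZE` are hypothetical, and the sketch assumes the 8-byte block and the 2-, 4-, and 8-byte units used as examples in the text.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 8  # bytes; the example block size from the text


def partial_strings(
    block: bytes, sizes: Tuple[int, ...] = (2, 4, 8)
) -> Dict[int, List[bytes]]:
    """Divide one block into equal-sized partial data strings for each size."""
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected a {BLOCK_SIZE}-byte block, got {len(block)}")
    result: Dict[int, List[bytes]] = {}
    for size in sizes:
        if BLOCK_SIZE % size != 0:
            raise ValueError(f"size {size} does not divide the block evenly")
        # Consecutive, non-overlapping slices of the requested size.
        result[size] = [block[i:i + size] for i in range(0, BLOCK_SIZE, size)]
    return result


parts = partial_strings(b"ABCDEFGH")
print(parts[2])  # four 2-byte strings
print(parts[4])  # two 4-byte strings
print(parts[8])  # the whole block as one 8-byte string
```

Each size class would then be looked up in, and registered to, its own hash table and dictionary table.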

As described later, the hash table and the dictionary table stored in the dictionary memory 110 are updated using the input data string 10 so that future input data strings can be compressed efficiently.

The first compression unit 120 receives the input data string 10 from an external device (for example, a host), not shown. The first compression unit 120 obtains first compressed data by compressing the input data string 10 using the hash function, hash table, and dictionary table stored in the dictionary memory 110. The first compressed data includes the division pattern applied to the input data string 10, dictionary match information indicating whether each partial data string corresponding to that division pattern matches dictionary registration data, and the codeword of each partial data string. The first compression unit 120 outputs the first compressed data to the selection unit 140.

Specifically, the first compression unit 120 performs the data compression process illustrated in FIG. 2. The first compression unit 120 divides the input data string 10 into units of one or more sizes and searches the dictionary table stored in the dictionary memory 110 for dictionary registration data that matches each partial data string.

For example, as shown in FIG. 3, the first compression unit 120 divides the input data string 10 into 2-byte, 4-byte, and 8-byte units. Then, as shown for example in FIG. 4, the first compression unit 120 obtains a hash value by applying the 8-byte hash function to the 8-byte partial data string. The first compression unit 120 reads the dictionary address associated with this hash value from the 8-byte hash table. The first compression unit 120 then reads the dictionary registration data associated with this dictionary address from the 8-byte dictionary table and compares it with the 8-byte partial data string. The first compression unit 120 performs the same dictionary search for the 4-byte and 2-byte partial data strings. Because the dictionary searches for different sizes do not depend on one another, the first compression unit 120 can achieve high throughput by implementing them as parallel pipelines.
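The search path above (hash value, then dictionary address, then a full comparison against the registered data) can be sketched for a single size class as follows. Everything named here is an assumption made for illustration: the class `SizeClassDictionary`, the table sizes, the round-robin write pointer, and the use of CRC-32 as a stand-in hash, since the source does not name a hash function or a replacement policy. The `register` method anticipates the table update that the description attributes to the first compression unit.

```python
import zlib
from typing import List, Optional


class SizeClassDictionary:
    """Hash table plus dictionary table for one partial-string size (hypothetical layout)."""

    def __init__(self, size: int, hash_bits: int = 8, entries: int = 16) -> None:
        self.size = size
        self.hash_mask = (1 << hash_bits) - 1
        # hash value -> dictionary address (None while the slot is unused)
        self.hash_table: List[Optional[int]] = [None] * (1 << hash_bits)
        # dictionary address -> dictionary registration data
        self.dict_table: List[Optional[bytes]] = [None] * entries
        self.next_address = 0  # assumed round-robin write pointer

    def _hash(self, chunk: bytes) -> int:
        # CRC-32 is only a stand-in; the source does not specify the hash function.
        return zlib.crc32(chunk) & self.hash_mask

    def lookup(self, chunk: bytes) -> Optional[int]:
        """Return the dictionary address if the chunk is registered, else None."""
        address = self.hash_table[self._hash(chunk)]
        if address is None:
            return None
        # The full comparison rejects hash collisions and overwritten entries.
        return address if self.dict_table[address] == chunk else None

    def register(self, chunk: bytes) -> int:
        """Store the chunk at the next address and point its hash slot there."""
        address = self.next_address
        self.dict_table[address] = chunk
        self.hash_table[self._hash(chunk)] = address
        self.next_address = (address + 1) % len(self.dict_table)
        return address


dictionary = SizeClassDictionary(size=4)
dictionary.register(b"ABCD")
print(dictionary.lookup(b"ABCD"))  # 0
print(dictionary.lookup(b"WXYZ"))  # None
```

One such object per size (2, 4, and 8 bytes) would mirror the independent tables in the text; because the objects share no state, the three lookups could proceed in parallel.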

If a partial data string matches any item of dictionary registration data, the first compression unit 120 outputs the dictionary address associated with that dictionary registration data as the codeword of the partial data string. If both a first partial data string of a first size (for example, 2 bytes) and a second partial data string of a second size (for example, 4 bytes) that contains the first partial data string match dictionary registration data, the first compression unit 120 can reduce the size of the first compressed data by encoding the second partial data string, rather than the first, into a dictionary address.

On the other hand, if a partial data string matches none of the dictionary registration data, the first compression unit 120 outputs the partial data string (called dictionary mismatch data) unchanged as its codeword; that is, it passes the partial data string through.

In other words, the first compressed data includes the division pattern applied to the input data string 10, dictionary match information indicating whether each partial data string corresponding to that division pattern matches dictionary registration data, and the codeword of each partial data string (that is, a dictionary address or dictionary mismatch data).
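A minimal sketch of how the three components of the first compressed data might be assembled is shown below. The greedy policy (try the largest size first and split in half on a miss) is an assumption chosen to reflect the stated preference for larger matches; the source does not prescribe how the division pattern is selected. The names `FirstCompressedData` and `encode_block` are hypothetical, and plain Python dictionaries stand in for the hash and dictionary tables.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Codeword = Union[int, bytes]  # int = dictionary address, bytes = dictionary mismatch data


@dataclass
class FirstCompressedData:
    pattern: Tuple[int, ...]    # sizes of the partial data strings, in order
    matched: Tuple[bool, ...]   # dictionary match information, one flag per string
    codewords: Tuple[Codeword, ...]


def encode_block(block: bytes, tables: Dict[int, Dict[bytes, int]]) -> FirstCompressedData:
    """Greedy encoder: prefer the largest matching size, split on a miss (assumed policy)."""
    if len(block) != 8:
        raise ValueError("this sketch handles 8-byte blocks only")
    pattern: List[int] = []
    matched: List[bool] = []
    codewords: List[Codeword] = []

    def visit(chunk: bytes) -> None:
        size = len(chunk)
        address = tables.get(size, {}).get(chunk)
        if address is not None:
            # Dictionary hit: the codeword is the dictionary address.
            pattern.append(size)
            matched.append(True)
            codewords.append(address)
        elif size > 2:
            # Miss at this size: retry on each half.
            half = size // 2
            visit(chunk[:half])
            visit(chunk[half:])
        else:
            # Miss at the smallest size: pass the bytes through unchanged.
            pattern.append(size)
            matched.append(False)
            codewords.append(chunk)

    visit(block)
    return FirstCompressedData(tuple(pattern), tuple(matched), tuple(codewords))


tables = {4: {b"ABCD": 3}, 2: {b"GH": 7}}
result = encode_block(b"ABCDEFGH", tables)
print(result.pattern)    # (4, 2, 2)
print(result.matched)    # (True, False, True)
print(result.codewords)  # (3, b'EF', 7)
```

A real encoder would also pack these fields into a bit layout; that layout is outside the scope of this sketch.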

Furthermore, the first compression unit 120 updates the hash table and the dictionary table stored in the dictionary memory 110 using the input data string 10 so that future input data strings can be compressed efficiently. That is, the first compression unit 120 registers each partial data string obtained by dividing the input data string 10 in the dictionary table in association with a dictionary address. The first compression unit 120 then registers the hash value corresponding to that partial data string in the hash table in association with that dictionary address.

The second compression unit 130 receives the input data string 10 from an external device (for example, a host), not shown. The second compression unit 130 obtains second compressed data by compressing the input data string 10 based on at least one of (a) a comparison between the current first compressed data and past first compressed data and (b) past input data strings, that is, data that has already been compressed. The second compressed data differs from the first compressed data at least in that it contains no dictionary address. The second compression unit 130 outputs the second compressed data to the selection unit 140.

The selection unit 140 receives the first compressed data from the first compression unit 120 and the second compressed data from the second compression unit 130. The selection unit 140 obtains the compressed data 11 by selecting whichever of the first compressed data and the second compressed data is smaller. The selection unit 140 outputs the compressed data 11 to an external device (for example, a NAND flash memory), not shown.
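The selection step can be sketched as a size comparison that also records which scheme produced the output, since the decompression side must later identify the scheme. The tag values, the function name `select_compressed`, and the tie-breaking rule are all assumptions; the source states only that the smaller candidate is selected. Sizes are compared in bytes here for simplicity, whereas a hardware implementation might well compare bit counts.

```python
from typing import Tuple

CODEC_FIRST = 0   # dictionary-based scheme (hypothetical tag value)
CODEC_SECOND = 1  # scheme without dictionary addresses (hypothetical tag value)


def select_compressed(first: bytes, second: bytes) -> Tuple[int, bytes]:
    """Return (codec tag, payload) for whichever candidate is smaller."""
    # Ties go to the first scheme here; the source does not specify tie-breaking.
    if len(first) <= len(second):
        return CODEC_FIRST, first
    return CODEC_SECOND, second


codec, payload = select_compressed(b"\x01\x02\x03", b"\x09")
print(codec, payload)  # 1 b'\t'
```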

The compression device of FIG. 1 performs the data compression process illustrated in FIG. 5.
The first compression unit 120 generates the first compressed data by compressing the input data string 10 using the hash function, hash table, and dictionary table stored in the dictionary memory 110 (step S201). The first compression unit 120 updates the hash table and the dictionary table stored in the dictionary memory 110 using the input data string 10 (step S202).

The second compression unit 130 generates the second compressed data by compressing the input data string 10 based on at least one of (a) a comparison between the current first compressed data and past first compressed data and (b) past input data strings, that is, data that has already been compressed (step S203). Steps S201, S202, and S203 may be executed in an order different from that shown in FIG. 5.

The selection unit 140 obtains the compressed data 11 by selecting whichever of the first compressed data generated in step S201 and the second compressed data generated in step S203 is smaller (step S204). The data compression process of FIG. 5 ends after step S204.

The compression device of FIG. 1 can also be modified into the compression device illustrated in FIG. 9. The compression device of FIG. 9 differs from that of FIG. 1 in that it includes a dividing unit 650.

The dividing unit 650 receives an original data string 12 from an external device (for example, a host), not shown. The dividing unit 650 obtains the input data string 10 by dividing the original data string 12 into a predetermined fixed size.

The first compression unit 120 and the second compression unit 130 of FIG. 9 each receive the fixed-size input data string 10 from the dividing unit 650 and compress it.

As illustrated in FIG. 6, the decompression device according to the first embodiment includes a determination unit 310, a dictionary memory 320, a first decompression unit 330, a second decompression unit 340, and a selection unit 350. This decompression device generates decompressed data 22 of a fixed size (for example, 8 bytes) by decompressing compressed data 21 generated by the compression device according to this embodiment.

The determination unit 310 receives the compressed data 21 from an external device (for example, a NAND flash memory), not shown. The determination unit 310 determines whether the compressed data 21 was compressed by the first scheme or the second scheme. The determination unit 310 notifies the selection unit 350 of the determined scheme and outputs the compressed data 21 to the first decompression unit 330 and the second decompression unit 340.

The dictionary memory 320 stores a dictionary table. This dictionary table corresponds to the dictionary table stored in the dictionary memory 110 of the compression device of FIG. 1. That is, the dictionary table stored in the dictionary memory 320 associates each dictionary address with a partial data string (also called dictionary registration data) obtained by dividing past decompressed data into a predetermined size. A separate dictionary table may be provided for each of a plurality of sizes. As described later, the dictionary table stored in the dictionary memory 320 is updated using the decompressed data 22.

The first decompression unit 330 receives the compressed data 21 from the determination unit 310. The first decompression unit 330 obtains first decompressed data by decompressing the compressed data 21 using the dictionary table stored in the dictionary memory 320. If the compressed data 21 was compressed by the second scheme, the first decompression unit 330 cannot decompress it correctly. The first decompression unit 330 outputs the first decompressed data to the selection unit 350.

If the compressed data 21 was compressed by the first scheme, it includes the division pattern applied to the corresponding pre-compression data, dictionary match information indicating whether each partial data string corresponding to that division pattern matches dictionary registration data, and the codeword of each partial data string (that is, a dictionary address or dictionary mismatch data).

The first decompression unit 330 extracts the dictionary match information from the compressed data 21 and, based on that information, further extracts the codeword of each partial data string from the compressed data 21. The first decompression unit 330 then determines, based on the dictionary match information, whether the codeword of each partial data string is a dictionary address or dictionary mismatch data. If the codeword is a dictionary address, the first decompression unit 330 treats the dictionary registration data associated with that dictionary address in the dictionary table as the partial data string. If the codeword is dictionary mismatch data, the first decompression unit 330 treats that dictionary mismatch data as the partial data string.
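The decoding rule above can be sketched as follows, assuming the division pattern, match flags, and codewords have already been parsed out of the compressed data. The function name `decode_block` and the in-memory representation (a dictionary of address-to-bytes tables per size) are hypothetical; the source does not define the bit layout from which these fields are extracted.

```python
from typing import Dict, Sequence, Union

Codeword = Union[int, bytes]  # int = dictionary address, bytes = dictionary mismatch data


def decode_block(
    pattern: Sequence[int],
    matched: Sequence[bool],
    codewords: Sequence[Codeword],
    dict_tables: Dict[int, Dict[int, bytes]],
) -> bytes:
    """Rebuild a block from parsed first-scheme fields (hypothetical representation)."""
    out = bytearray()
    for size, is_match, codeword in zip(pattern, matched, codewords):
        if is_match:
            # The codeword is a dictionary address: substitute the registered data.
            if not isinstance(codeword, int):
                raise TypeError("a matched codeword must be a dictionary address")
            chunk = dict_tables[size][codeword]
        else:
            # The codeword is dictionary mismatch data: use it unchanged.
            if not isinstance(codeword, (bytes, bytearray)):
                raise TypeError("an unmatched codeword must be raw bytes")
            chunk = bytes(codeword)
        if len(chunk) != size:
            raise ValueError("chunk length disagrees with the division pattern")
        out += chunk
    return bytes(out)


dict_tables = {4: {3: b"ABCD"}, 2: {7: b"GH"}}
print(decode_block((4, 2, 2), (True, False, True), (3, b"EF", 7), dict_tables))  # b'ABCDEFGH'
```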

The second decompression unit 340 receives the compressed data 21 from the determination unit 310. The second decompression unit 340 obtains second decompressed data by decompressing the compressed data 21 based on at least one of past compressed data and past decompressed data. If the compressed data 21 was compressed by the first scheme, the second decompression unit 340 cannot decompress it correctly. If the compressed data 21 was compressed by the second scheme, it at least contains no dictionary address. The second decompression unit 340 outputs the second decompressed data to the selection unit 350.

The selection unit 350 is notified of the method determined by the determination unit 310, receives the first decompressed data from the first decompression unit 330, and receives the second decompressed data from the second decompression unit 340. The selection unit 350 obtains the decompressed data 22 by selecting whichever of the first and second decompressed data corresponds to the method determined by the determination unit 310. The selection unit 350 outputs the decompressed data 22 to an external device (for example, a host), not shown.

The selection unit 350 also updates the dictionary table stored in the dictionary memory 320 using the decompressed data 22. That is, the selection unit 350 registers each of the partial data strings obtained by dividing the decompressed data 22 in the dictionary table in association with a dictionary address.

Note that the selection unit 350 may have a function identical or similar to that of the determination unit 310. That is, the selection unit 350 may itself determine whether the compressed data 21 was compressed by the first method or the second method, and select whichever of the first and second decompressed data corresponds to the determined method. In this case, the determination unit 310 can be omitted.

The decompression device of FIG. 6 performs the data decompression process illustrated in FIG. 7.
The determination unit 310 determines whether the compressed data 21 was compressed by the first method or the second method (step S401). The first decompression unit 330 obtains first decompressed data by decompressing the compressed data 21 using the dictionary table stored in the dictionary memory 320 (step S402).

The second decompression unit 340 obtains second decompressed data by decompressing the compressed data 21 based on at least one of past compressed data and past decompressed data (step S403). Note that steps S402 and S403 may be executed in an order different from that shown in FIG. 7.

The selection unit 350 obtains the decompressed data 22 by selecting whichever of the first decompressed data generated in step S402 and the second decompressed data generated in step S403 corresponds to the method determined in step S401 (step S404).

The selection unit 350 updates the dictionary table stored in the dictionary memory 320 using the decompressed data 22 obtained in step S404 (step S405). The data decompression process of FIG. 7 ends after step S405.
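The flow of steps S401 to S405 can be summarized with the following Python sketch. It is not the claimed implementation: the callables `determine`, `first_unit` and `second_unit` are hypothetical stand-ins for the determination unit 310 and the two decompression units, and the dictionary update policy (append each chunk at the next free address) is an assumption made only to keep the example runnable.

```python
def run_decompression(compressed, determine, first_unit, second_unit,
                      dictionary, chunk_size=2):
    """Mirror steps S401 to S405 for one block of compressed data."""
    method = determine(compressed)                     # S401: which method was used
    first = first_unit(compressed, dictionary)         # S402: dictionary-based result
    second = second_unit(compressed)                   # S403: history-based result
    restored = first if method == 1 else second        # S404: keep the matching one
    for i in range(0, len(restored), chunk_size):      # S405: update the dictionary
        # Assumed policy: register each chunk at the next free address.
        dictionary[len(dictionary)] = restored[i:i + chunk_size]
    return restored

# Toy stand-ins: byte 0 carries the method number, the rest is the payload.
table = {}
block = bytes([2]) + b"WXYZ"
result = run_decompression(
    block,
    determine=lambda c: c[0],
    first_unit=lambda c, d: b"",      # its output is discarded for method 2
    second_unit=lambda c: c[1:],
    dictionary=table,
)
```

Both decompression units run regardless of the method, as in FIG. 7; only the selection step depends on the determination result.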

As illustrated in FIG. 8, connecting the compression device of FIG. 1 and the decompression device of FIG. 6 to a storage 500 forms a storage device that losslessly compresses and stores an input data string. In the storage device of FIG. 8, the compressed data 11 generated by the compression device of FIG. 1 is written to the storage 500, and the compressed data 21 read from the storage 500 is decompressed by the decompression device of FIG. 6. The storage device of FIG. 8 corresponds to, for example, an SSD, and the storage 500 corresponds to, for example, a NAND flash memory. This embodiment is applicable not only to SSDs but also to HDDs. For example, the storage 500 may be a magnetic disk, and the compression device of FIG. 1 and the decompression device of FIG. 6 may be incorporated in a hard disk controller.

FIG. 18 illustrates a hardware configuration of the storage device of FIG. 8. The storage device of FIG. 18 includes an SSD controller 901, an input/output I/F (interface) 904, and a NAND flash memory 905. The SSD controller 901, the input/output I/F 904, and the NAND flash memory 905 are connected to one another by a bus.

The SSD controller 901 performs various kinds of control, such as reading from and writing to the NAND flash memory 905. The SSD controller 901 includes a compression device 902 and a decompression device 903. The compression device 902 corresponds to, for example, the compression device of FIG. 1, and the decompression device 903 corresponds to, for example, the decompression device of FIG. 6.

The input/output I/F 904 exchanges data between the storage device of FIG. 18 and an external device (for example, a host), not shown. The input/output I/F 904 can be implemented using, for example, USB. The NAND flash memory 905 corresponds to the storage 500 of FIG. 8.

As described above, the compression device according to the first embodiment generates first compressed data by compressing the input data string using a dictionary in which partial data strings, obtained by dividing past input data strings (data already compressed) into a predetermined size, are registered. The device also generates second compressed data by compressing the input data string based on at least one of (a) a comparison between the current first compressed data and past first compressed data and (b) past input data strings. It then selects whichever of the first and second compressed data has the smaller data size. In other words, when the compression ratio of the first compressed data is low (for example, when most of the partial data strings obtained by dividing the input data string do not match the dictionary registration data), the device can select the second compressed data.

Therefore, this compression device can losslessly compress the input data string at a consistently high compression ratio. Compared with a conventional compression device that generates the first compressed data but not the second, it can be implemented with almost no loss of throughput and almost no additional hardware.

The decompression device according to this embodiment can decompress the compressed data generated by the compression device according to this embodiment. The storage device according to this embodiment can achieve high responsiveness and reliability by storing this compressed data.

(Second Embodiment)
In the compression device according to the second embodiment, the first compression unit 120 generates first compressed data having the structure illustrated in FIG. 10. In the example of FIG. 10, the 8-byte data "ABCDEF12" is handled as the input data string 10.

The first compression unit 120 performs a dictionary search for each of the partial data strings "ABCDEF12", "ABCD", "EF12", "AB", "CD", "EF", and "12" of the input data string 10. Suppose that the partial data strings "ABCD" and "12" each match some dictionary registration data, while the remaining partial data string "EF" matches none.

The first compression unit 120 outputs dictionary match information that indicates the division pattern applied to the input data string 10 (for example, 4 bytes | 2 bytes | 2 bytes) and whether each partial data string under that division pattern matches the dictionary registration data (for example, match | mismatch | match). Specifically, when the 8-byte input data string 10 is divided in units of 2 bytes, 4 bytes, and 8 bytes, there are 26 combinations in total of division pattern and per-string match pattern. The dictionary match information can therefore be expressed in 5 bits.
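The figure of 26 combinations can be reproduced with a short Python sketch. It rests on an assumption that the text does not state explicitly: a block is split into two halves only when it does not match the dictionary as a whole, and the smallest unit (2 bytes) is simply a match or a mismatch. The function name `count_patterns` is hypothetical.

```python
from math import ceil, log2

def count_patterns(size, min_size=2):
    """Count (division pattern, match pattern) outcomes for a block of `size` bytes."""
    if size == min_size:
        return 2                  # smallest unit: match or mismatch
    half = count_patterns(size // 2, min_size)
    # Either the whole block matches (1 outcome), or it is split and
    # each half contributes its own outcomes independently.
    return 1 + half * half

total = count_patterns(8)         # 1 + (1 + 2 * 2) ** 2
bits = ceil(log2(total))          # bits needed to index every outcome
```

Under this reading a 4-byte block has 5 outcomes and an 8-byte block has 1 + 5 × 5 = 26, which fits in 5 bits.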

The first compression unit 120 also outputs, as the codewords of the partial data strings "ABCD" and "12" (the dictionary match data), the dictionary addresses associated with the dictionary registration data that they matched. The data size of a dictionary address depends on the number of entries in the dictionary table. Specifically, if the dictionary table manages 256 items of dictionary registration data, a dictionary address can be expressed in 8 bits.

Furthermore, the first compression unit 120 outputs the partial data string "EF", which is dictionary mismatch data, unchanged as its own codeword. Because the data size of the partial data string "EF" is 2 bytes, the data size of its codeword is likewise 2 bytes (16 bits).

Thus, in the example of FIG. 10, the data size of the input data string 10 is 8 bytes (64 bits), whereas the data size of the first compressed data is about 4.6 bytes (37 bits: 5 bits of dictionary match information, two 8-bit dictionary addresses, and 16 bits of dictionary mismatch data). The space saving of the first compressed data (1 - data size after compression / data size before compression) can therefore be estimated at about 42% (= 1 - 37/64).
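The arithmetic behind the 37-bit figure can be checked with a few lines of Python. The bit widths come from the FIG. 10 example described above; the variable names are illustrative only.

```python
flag_bits = 5          # dictionary match information
address_bits = 8       # one dictionary address (256-entry table)
mismatch_bits = 16     # "EF" carried verbatim as 2 bytes

# Two matched strings ("ABCD", "12") plus one mismatched string ("EF").
compressed_bits = flag_bits + 2 * address_bits + mismatch_bits
input_bits = 8 * 8     # 8-byte input data string

space_saving = 1 - compressed_bits / input_bits
```

With these widths `compressed_bits` is 37 and `space_saving` is 0.421875, i.e. roughly 42%.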

As described above, the compression device according to the second embodiment generates first compressed data by compressing the input data string using a dictionary in which partial data strings, obtained by dividing past input data strings (data already compressed) into a predetermined size, are registered. Although the first compressed data must include the dictionary match information, it can achieve a high compression ratio when most of the partial data strings obtained by dividing the input data string match the dictionary registration data.

Note that the occurrence frequencies of the dictionary match information and the dictionary mismatch data are likely to be skewed. The first compression unit can therefore improve the expected compression ratio of the first compressed data by additionally applying variable-length coding, such as Huffman coding, to these data.

(Third Embodiment)
The compression device according to the third embodiment differs from the compression device of FIG. 1 or FIG. 6 in that it includes the second compression unit 730 illustrated in FIG. 11. The second compression unit 730 receives the first compressed data from the first compression unit 120. The second compression unit 730 obtains second compressed data by compressing the input data string 10 based on a comparison between the current first compressed data and the previous first compressed data. The second compression unit 730 outputs the second compressed data to the selection unit 140. The second compression unit 730 includes a storage unit 731, a comparison unit 732, and an encoding unit 733.

The storage unit 731 stores the previous first compressed data, which is read out by the comparison unit 732. The storage unit 731 may store only part of the previous first compressed data rather than all of it. For example, the dictionary match information and dictionary addresses of the previous first compressed data may be stored in the storage unit 731 while its dictionary mismatch data is not.

The comparison unit 732 receives the current first compressed data from the first compression unit 120 and the previous first compressed data from the storage unit 731, and compares the two. Specifically, the comparison unit 732 determines whether each information element in the current first compressed data (for example, dictionary match information, dictionary address, and dictionary mismatch data) matches the corresponding information element in the past first compressed data. The comparison unit 732 notifies the encoding unit 733 of the comparison result.

In accordance with the comparison result notified by the comparison unit 732, the encoding unit 733 generates the second compressed data by replacing, with a flag (hereinafter called a skip flag), each information element of the current first compressed data that matches the corresponding information element of the past first compressed data. An information element replaced by a skip flag is not transmitted to the decompression device.

Specifically, as illustrated in FIG. 12, suppose that the dictionary match information (Flag#1), dictionary address (Adrs#1), and dictionary mismatch data (Data#1) in the current first compressed data all match the corresponding information elements in the past first compressed data, namely the dictionary match information (Flag#0), dictionary address (Adrs#0), and dictionary mismatch data (Data#0). In this case, the encoding unit 733 generates the second compressed data by replacing the entire current first compressed data with a skip flag. For example, when initialization with a specific value (such as 0) is performed, the same value may appear repeatedly in the input data string 10. The same dictionary match information, dictionary address, and dictionary mismatch data then appear repeatedly in the first compressed data as well. The selection unit 140 can therefore achieve a high compression ratio by selecting and outputting the second compressed data.

Alternatively, as illustrated in FIG. 13, suppose that the dictionary match information (Flag#1) and dictionary address (Adrs#1) in the current first compressed data match the corresponding information elements in the past first compressed data, namely the dictionary match information (Flag#0) and dictionary address (Adrs#0). In this case, the encoding unit 733 generates the second compressed data by replacing the dictionary match information and dictionary address of the current first compressed data with a skip flag. For example, a processor in an external device (such as a host) connected to the compression device according to this embodiment may handle data in units whose size is a power of two bytes, such as 2, 4, or 8 bytes, so the input data string 10 may have a periodicity corresponding to that size. The same dictionary match information and dictionary address may then appear repeatedly in the first compressed data. The selection unit 140 can therefore achieve a high compression ratio by selecting and outputting the second compressed data.
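The two cases of FIG. 12 and FIG. 13 can be sketched in Python as below. This is a simplified illustration, not the encoding defined by the specification: the dictionary keys `flag`, `adrs` and `data`, the `SKIP` sentinel, and the choice of one skip flag per replaced element are all assumptions made for the example.

```python
SKIP = "SKIP"   # hypothetical stand-in for the skip flag

def encode_second(current, previous):
    """Replace information elements that equal their predecessor with SKIP."""
    if previous is not None and current == previous:
        return [SKIP]                       # whole block unchanged (FIG. 12 case)
    encoded = []
    for key in ("flag", "adrs", "data"):
        same = previous is not None and current[key] == previous[key]
        encoded.append(SKIP if same else current[key])
    return encoded

previous = {"flag": 7, "adrs": (0x10, 0x2A), "data": b"EF"}
identical = dict(previous)                                   # everything repeats
partly_same = {"flag": 7, "adrs": (0x10, 0x2A), "data": b"GH"}

all_skipped = encode_second(identical, previous)
partly_skipped = encode_second(partly_same, previous)        # FIG. 13 case
```

In the second call only the mismatch data is transmitted, because the match information and the dictionary addresses repeat from the previous block.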

As described above, the compression device according to the third embodiment generates the second compressed data by replacing, with a skip flag, each information element of the current first compressed data that matches the previous first compressed data. Therefore, when the same information elements appear repeatedly in the first compressed data, this compression device can achieve a high compression ratio by selecting and outputting the second compressed data.

Note that implementing the second compression unit of the compression device according to this embodiment requires, in addition to an encoding unit that performs the encoding (that is, the compression), a storage unit (hardware for storing part or all of the previous first compressed data) and a comparison unit (hardware for comparing each information element of the current first compressed data with the corresponding information element of the previous first compressed data). However, the impact of adding the storage unit and the comparison unit (cost, power consumption, circuit area, and so on) is limited.

(Fourth Embodiment)
The compression device according to the fourth embodiment differs from the compression device of FIG. 1 or FIG. 6 in that it includes the second compression unit 830 illustrated in FIG. 14. The second compression unit 830 receives the input data string 10 from an external device (for example, a host), not shown. The second compression unit 830 obtains second compressed data by compressing the input data string 10 based on the mode value, for a specific size, of the previous input data string (data already compressed). The second compression unit 830 outputs the second compressed data to the selection unit 140. The second compression unit 830 includes a storage unit 831, a comparison unit 832, a mode value calculation unit 833, and an encoding unit 834.

The storage unit 831 stores the previous input data string. The mode value calculation unit 833 reads the previous input data string from the storage unit 831, divides it into partial data strings of the specific size, and calculates the mode value by counting how often each data pattern of that size appears. The mode value calculation unit 833 notifies the comparison unit 832 of the mode value.

The comparison unit 832 receives the input data string 10 and is notified of the mode value by the mode value calculation unit 833. The comparison unit 832 compares each of the partial data strings obtained by dividing the input data string 10 into the specific size with the mode value; that is, it determines whether each partial data string matches the mode value. The comparison unit 832 notifies the encoding unit 834 of the comparison result.

The encoding unit 834 generates the second compressed data by compressing the input data string 10 in accordance with the comparison result notified by the comparison unit 832. The second compressed data includes mode value match information, which indicates whether each partial data string obtained by dividing the input data string 10 into the specific size matches the mode value, and the codewords of the partial data strings that did not match the mode value (hereinafter called mode value mismatch data). In other words, the second compressed data does not include codewords for the partial data strings that matched the mode value.

The compression device according to this embodiment performs the data compression process illustrated in FIG. 15. In the example of FIG. 15, the data size of the input data string 10 is 16 bytes and the specific size is 1 byte. By trying both the first method, based on dictionary matching, and the second method, based on mode value matching, and selecting whichever compressed data is smaller, this compression device can achieve a consistently high compression ratio. For some kinds of data, the mode value of a given block is likely to equal the mode value of the next block, so the second method can achieve a high compression ratio when such data is handled.

In the example of FIG. 15, both the first compressed data and the second compressed data include an information element (Codec) that indicates the codec (compression method). When the codec is either the first method or the second method, this information element can be expressed in 1 bit. Based on this information element, the decompression device can determine whether the compressed data was compressed by the first method or the second method. In addition, the first compressed data includes the dictionary match information (Flag), dictionary address (Adrs), and dictionary mismatch data (Data) described above, whereas the second compressed data includes the mode value match information (Match) and the mode value mismatch data (Data).

The mode value match information expresses, with 1 bit each (1/0), whether each of the 16 partial data strings obtained by dividing the 16-byte input data string 10 into 1-byte units matches the 1-byte mode value. The data size of the mode value match information is therefore 2 bytes (16 bits). The mode value mismatch data carries those of the 16 partial data strings that did not match the mode value, each with a data size of 1 byte. In the example of FIG. 15, two partial data strings did not match the mode value, so the mode value mismatch data totals 2 bytes. The second compressed data in the example of FIG. 15 thus has a total data size of about 4 bytes.
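The mode-based second method and its size accounting can be illustrated with the following Python sketch. The byte values in `previous` and `current` are made up for the example (they are not the contents of FIG. 15), and the helper names `split`, `mode_of`, `compress_mode` and `decompress_mode` are hypothetical.

```python
from collections import Counter

def split(block, size):
    """Divide a byte string into consecutive chunks of `size` bytes."""
    return [block[i:i + size] for i in range(0, len(block), size)]

def mode_of(block, size=1):
    """Most frequent `size`-byte pattern in the previous input data string."""
    return Counter(split(block, size)).most_common(1)[0][0]

def compress_mode(block, mode, size=1):
    chunks = split(block, size)
    match = [c == mode for c in chunks]          # mode value match information
    data = [c for c in chunks if c != mode]      # mode value mismatch data
    return match, data

def decompress_mode(match, data, mode):
    rest = iter(data)
    return b"".join(mode if m else next(rest) for m in match)

previous = bytes(14) + b"\x01\x02"                    # mostly zero bytes
current = bytes(7) + b"\x12" + bytes(7) + b"\x34"     # 16 bytes, two non-zero
mode = mode_of(previous)
match, data = compress_mode(current, mode)

# 1 codec bit + 16 match bits + 8 bits per mismatched byte.
total_bits = 1 + len(match) + 8 * sum(len(d) for d in data)
restored = decompress_mode(match, data, mode)
```

With two mismatched bytes the total is 33 bits, consistent with the figure of about 4 bytes given above, and `restored` equals `current`.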

Note that the specific size is not limited to 1 byte. The second compression unit 830 may also calculate mode values for several different specific sizes and select the mode value that minimizes the data size of the second compressed data.

For example, the compression device according to this embodiment may perform data compression as shown in FIG. 16. In the example of FIG. 16, the data size of the input data string 10 is 8 bytes and the specific size is 2 bytes. The input data strings 10 are supplied to the compression device in the order "00120000", "00140000", "00160000". For each of these, the compression device tries both the first method, based on dictionary matching, and the second method, based on mode value matching, and selects whichever compressed data is smaller, thereby achieving a consistently high compression ratio.

In the example of FIG. 16, the dictionary mismatch data and the mode value mismatch data are identical for the compressed data of "00140000" and "00160000". The compression device can therefore select the smaller compressed data by comparing the combined data size of the dictionary match information and dictionary address with the data size of the mode value match information.

In the example of FIG. 15, an information element (Codec) indicating the codec (compression method) is used in both the first compressed data and the second compressed data. Alternatively, as shown in FIG. 17 for example, the dictionary match information (Flag) may be extended so that it can also express this information element.

In the example of FIG. 17, both the first compressed data and the second compressed data include an information element (Flag) that indicates the codec. Because this information element (Flag) is an extended version of the dictionary match information, when the codec is the first method it additionally indicates the division pattern applied to the input data string 10 and whether each partial data string under that division pattern matches the dictionary registration data. In other words, this information element (Flag) can take N + 1 values in total: N values (for example, 26) corresponding to the dictionary match information, plus one value indicating that the codec is the second method. It can be expressed in ceil(log2(N + 1)) bits, where ceil(x) is the ceiling function that returns the smallest integer greater than or equal to x.
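The width of the extended information element can be computed directly from the formula above. The following Python lines are a small worked check; the function name `extended_flag_bits` is hypothetical.

```python
from math import ceil, log2

def extended_flag_bits(n_patterns):
    """Bits needed for N dictionary-match values plus one 'second method' value."""
    return ceil(log2(n_patterns + 1))

bits_for_26 = extended_flag_bits(26)   # N = 26 as in the second embodiment
bits_for_31 = extended_flag_bits(31)   # 32 values still fit in 5 bits
bits_for_32 = extended_flag_bits(32)   # 33 values need a sixth bit
```

For N = 26 the extended element needs 5 bits, the same width as the unextended dictionary match information, so under these numbers the separate 1-bit Codec element of FIG. 15 is no longer needed for first-method data.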

Based on this information element (Flag), the decompression device can determine whether the compressed data was compressed by the first method or the second method. In addition, the first compressed data includes the dictionary address (Adrs) and dictionary mismatch data (Data) described above, whereas the second compressed data includes the mode value match information (Match) and the mode value mismatch data (Data).

As described above, the compression device according to the fourth embodiment generates the second compressed data based on a comparison between the mode value, for a specific size, of the previous input data string and the current input data string. The second compressed data does not include codewords for those partial data strings, obtained by dividing the input data string into the specific size, that match the mode value. Therefore, when the mode value stays the same across successive input data strings, this compression device can achieve a high compression ratio by selecting and outputting the second compressed data.

Note that implementing the second compression unit included in the compression device according to this embodiment requires, in addition to an encoding unit that performs encoding (that is, compression), a storage unit (that is, hardware for saving the previous input data string), a mode value calculation unit (that is, hardware for calculating the mode value), and a comparison unit (that is, hardware for comparing the mode value with the current first input data string). However, the impact of adding the storage unit, the mode value calculation unit, and the comparison unit (on cost, power consumption, circuit area, and so on) is limited.

The codecs described in the above embodiments may be used in combination as necessary. For example, the first to fourth embodiments may be combined so that the scheme that minimizes the data size can be selected from among a scheme based on dictionary matching, a scheme that replaces part of the compressed data based on dictionary matching with a skip flag, and a scheme based on mode value matching.
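Selecting the scheme that minimizes the data size can be sketched in Python as follows. The function name `select_smallest`, the scheme labels, and the representation of candidates as a dictionary of byte strings are illustrative assumptions. Ties are resolved in favor of the candidate listed first, which is a choice of this sketch and is not stated in the specification.

```python
def select_smallest(candidates: dict) -> tuple:
    """Pick the candidate compressed data with the smallest size.

    `candidates` maps a scheme label to the compressed data produced by
    that scheme. Returns (label, compressed data).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    # min() keeps the first key among equally small candidates.
    label = min(candidates, key=lambda name: len(candidates[name]))
    return label, candidates[label]


# Hypothetical outputs of three schemes for the same input data string.
outputs = {
    "dictionary": b"\x01\x02\x03\x04",
    "skip_flag": b"\x01\x02",
    "mode_value": b"\x01\x02\x03",
}
print(select_smallest(outputs)[0])
```

Because each candidate carries an information element identifying its codec, the decompression side can tell which scheme was chosen without any side channel.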

At least part of the processing of each of the above embodiments can also be realized by using a general-purpose computer as the basic hardware. A program that implements the processing may be provided stored on a computer-readable recording medium. The program is stored on the recording medium as a file in an installable format or an executable format. Examples of the recording medium include a magnetic disk, an optical disk (CD-ROM, CD-R, DVD, etc.), a magneto-optical disk (MO, etc.), and a semiconductor memory. Any recording medium may be used as long as it can store the program and can be read by a computer. The program that implements the processing may also be stored on a computer (server) connected to a network such as the Internet and downloaded to a computer (client) via the network.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS
10 ... Input data string
11, 21 ... Compressed data
12 ... Original data string
22 ... Decompressed data
110, 320 ... Dictionary memory
120 ... First compression unit
130, 730, 830 ... Second compression unit
140, 350 ... Selection unit
310 ... Determination unit
330 ... First decompression unit
340 ... Second decompression unit
500 ... Storage
650 ... Dividing unit
731, 831 ... Storage unit
732, 832 ... Comparison unit
733, 834 ... Encoding unit
833 ... Mode value calculation unit
901 ... SSD controller
902 ... Compression device
903 ... Decompression device
904 ... Input/output I/F
905 ... NAND flash memory

Claims

A compression device comprising:
a dictionary memory that stores a dictionary table that associates dictionary addresses with dictionary registration data of a predetermined size;
a first compression unit that compresses an input data string based on a result of comparing each of the partial data strings, obtained by dividing the input data string by the predetermined size, with the dictionary registration data, thereby generating first compressed data that, if at least one of the partial data strings matches any of the dictionary registration data, includes the dictionary address associated with the at least one item of dictionary registration data;
a second compression unit that generates second compressed data, which does not include the dictionary address, by compressing the input data string based on at least one of a comparison between the first compressed data and past first compressed data, and a past input data string; and
a selection unit that obtains compressed data of the input data string by selecting whichever of the first compressed data and the second compressed data has the smaller data size.

The compression device according to claim 1, further comprising a dividing unit that obtains the input data string by dividing an original data string at a fixed size.

The compression device according to claim 1, wherein the second compression unit includes:
a storage unit that stores the previous input data string;
a calculation unit that calculates a mode value by counting the appearance frequency of each data pattern of a specific size based on the partial data strings obtained by dividing the previous input data string by the specific size;
a comparison unit that compares each of the partial data strings obtained by dividing the input data string by the specific size with the mode value; and
an encoding unit that encodes the input data string based on the result of comparing the input data string with the mode value, thereby generating the second compressed data including mode value match information, which indicates whether each partial data string obtained by dividing the input data string by the specific size matches the mode value, and code words of mode value mismatch data, which are the partial data strings that do not match the mode value.

The compression device according to claim 1, wherein the second compression unit includes:
a storage unit that stores some or all of the information elements included in the previous first compressed data;
a comparison unit that compares the first compressed data with the information elements stored in the storage unit; and
an encoding unit that generates the second compressed data by replacing, based on the result of comparing the first compressed data with the information elements stored in the storage unit, those information elements included in the first compressed data that match the information elements stored in the storage unit with a skip flag.

The compression device according to claim 4, wherein
the first compressed data includes dictionary match information, which indicates a division pattern applied to the input data string and a match/mismatch between each partial data string corresponding to the division pattern and the dictionary registration data, and a code word of each of the partial data strings, and
the code word of a partial data string is the dictionary address associated with the matching dictionary registration data if the partial data string is dictionary match data that matches any of the dictionary registration data, and is the dictionary mismatch data itself if the partial data string is dictionary mismatch data that matches none of the dictionary registration data.

The compression device according to claim 5, wherein, if the first compressed data and the previous first compressed data match in the dictionary match information and the dictionary address, the encoding unit generates the second compressed data by replacing the dictionary match information and the dictionary address included in the first compressed data with the skip flag.

The compression device according to claim 5, wherein, if the first compressed data and the previous first compressed data match in the dictionary match information, the dictionary address, and the dictionary mismatch data, the encoding unit generates the second compressed data by replacing the dictionary match information, the dictionary address, and the dictionary mismatch data included in the first compressed data with the skip flag.

The compression device according to claim 1, wherein each of the first compressed data and the second compressed data includes an information element indicating whether the codec is a first scheme or a second scheme.

The compression device according to claim 1, wherein each of the first compressed data and the second compressed data includes an information element that indicates whether the codec is a first scheme or a second scheme and that, when the codec is the first scheme, further indicates a division pattern applied to the input data string and a match/mismatch between each partial data string corresponding to the division pattern and the dictionary registration data.

A decompression device comprising:
a dictionary memory that stores a dictionary table that associates dictionary addresses with dictionary registration data of a predetermined size;
a first decompression unit that extracts, from compressed data, a code word of each partial data string obtained by dividing the pre-compression data corresponding to the compressed data according to a predetermined division pattern and that, if the code word corresponds to one of the dictionary addresses, generates first decompressed data by treating the dictionary registration data associated with that dictionary address as the partial data string;
a second decompression unit that generates second decompressed data by decompressing the compressed data based on at least one of past compressed data and past decompressed data; and
a selection unit that obtains decompressed data of the compressed data by selecting the first decompressed data if the codec of the compressed data is a first scheme and selecting the second decompressed data if the codec of the compressed data is a second scheme.

A storage device comprising a storage, a compression device, and a decompression device, wherein
the compression device includes:
a dictionary memory that stores a dictionary table that associates dictionary addresses with dictionary registration data of a predetermined size;
a first compression unit that compresses an input data string based on a result of comparing each of the partial data strings, obtained by dividing the input data string by the predetermined size, with the dictionary registration data, thereby generating first compressed data that, if at least one of the partial data strings matches any of the dictionary registration data, includes the dictionary address associated with the at least one item of dictionary registration data;
a second compression unit that generates second compressed data, which does not include the dictionary address, by compressing the input data string based on at least one of a comparison between the first compressed data and past first compressed data, and a past input data string; and
a selection unit that obtains compressed data of the input data string by selecting whichever of the first compressed data and the second compressed data has the smaller data size,
the compressed data of the input data string is written to and read from the storage, and
the decompression device includes:
a dictionary memory that stores a dictionary table that associates dictionary addresses with dictionary registration data of a predetermined size;
a first decompression unit that extracts, from the compressed data read from the storage, a code word of each partial data string obtained by dividing the pre-compression data corresponding to that compressed data according to a predetermined division pattern and that, if the code word corresponds to one of the dictionary addresses, generates first decompressed data by treating the dictionary registration data associated with that dictionary address as the partial data string;
a second decompression unit that generates second decompressed data by decompressing the compressed data read from the storage based on at least one of compressed data read from the storage in the past and past decompressed data; and
a selection unit that obtains decompressed data of the compressed data read from the storage by selecting the first decompressed data if the codec of the compressed data read from the storage is a first scheme and selecting the second decompressed data if the codec of the compressed data read from the storage is a second scheme.

The storage device according to claim 11, wherein
the storage is a NAND flash memory, and
the compression device and the decompression device are incorporated in an SSD (Solid State Drive) controller.

The storage device according to claim 11, wherein
the storage is a magnetic disk, and
the compression device and the decompression device are incorporated in a hard disk controller.