JP2022175713A

JP2022175713A - Storage device, information processing system, information processing method and program

Info

Publication number: JP2022175713A
Application number: JP2021082355A
Authority: JP
Inventors: 朋花上浦; Tomoka Kamiura
Original assignee: NEC Platforms Ltd
Current assignee: NEC Platforms Ltd
Priority date: 2021-05-14
Filing date: 2021-05-14
Publication date: 2022-11-25
Anticipated expiration: 2041-05-14
Also published as: JP7215804B2

Abstract

To provide a technology capable of improving write performance. SOLUTION: When feature data corresponding to first data, which is a management unit contained in first write data, is to be stored, feature data corresponding to second data, which is the management unit subsequent to the first data, is stored in association with the feature data corresponding to the first data. When duplication determination is to be performed on second write data, the duplication determination is performed using the feature data corresponding to the first data, the feature data corresponding to the second data is read on the basis of its association with the feature data corresponding to the first data, and the duplication determination is performed using the feature data corresponding to the second data. SELECTED DRAWING: Figure 1

Description

The present invention relates to a storage device, an information processing system, an information processing method, and a program.

Patent Literature 1 discloses a computer system that performs deduplication processing. The host computer of this computer system performs duplication determination processing on all chunks that constitute updated content and updates a content management table. At the next deduplication process, the host computer creates a deduplication list based on the latest content data. This reduces the number of chunks transmitted from the host device to the storage device.

Patent Literature 1: International Publication No. WO 2015/040711

In a storage device that performs deduplication, checking whether write data duplicates data already stored in the storage device involves dividing the write data into data of a predetermined management unit (for example, fixed-length chunks) and comparing feature data, such as a hash value calculated from the data of each management unit, with feature data already stored in the storage device. Such feature data must be stored for every piece of management-unit data in all write data, so its volume becomes enormous. As a result, identifying the feature data to use for duplication determination from among the enormous amount of feature data stored in the storage device imposes a heavy load, and improving write performance may be difficult.

Accordingly, an object of the present invention is to provide a storage device, an information processing system, an information processing method, and a program that solve the above problem.

To achieve the above object, according to a first aspect of the present invention, a storage device includes: write command processing means for associating, with each piece of data of a predetermined management unit included in a group or series of write data, feature data corresponding to that data; feature data registration means for, when storing in storage means the feature data corresponding to first data of the management unit included in first write data, storing in the storage means the feature data corresponding to second data of the management unit, which is included in the first write data and follows the first data, in association with the feature data corresponding to the first data; and determination processing means for, when performing duplication determination on second write data, performing duplication determination using the feature data corresponding to the first data, reading the feature data corresponding to the second data based on its association with the feature data corresponding to the first data, and performing duplication determination using the feature data corresponding to the second data.

To achieve the above object, according to a second aspect of the present invention, an information processing system includes a host and a storage device communicably connected to the host. The storage device includes: write command processing means for associating, with each piece of data of a predetermined management unit included in a group or series of write data, feature data corresponding to that data; feature data registration means for, when storing in storage means the feature data corresponding to first data of the management unit included in first write data, storing in the storage means the feature data corresponding to second data of the management unit, which is included in the first write data and follows the first data, in association with the feature data corresponding to the first data; and determination processing means for, when performing duplication determination on second write data, performing duplication determination using the feature data corresponding to the first data, reading the feature data corresponding to the second data based on its association with the feature data corresponding to the first data, and performing duplication determination using the feature data corresponding to the second data.

To achieve the above object, according to a third aspect of the present invention, an information processing method includes: associating, with each piece of data of a predetermined management unit included in a group or series of write data, feature data corresponding to that data; when storing the feature data corresponding to first data of the management unit included in first write data, storing the feature data corresponding to second data of the management unit, which is included in the first write data and follows the first data, in association with the feature data corresponding to the first data; and, when performing duplication determination on second write data, performing duplication determination using the feature data corresponding to the first data, reading the feature data corresponding to the second data based on its association with the feature data corresponding to the first data, and performing duplication determination using the feature data corresponding to the second data.

To achieve the above object, according to a fourth aspect of the present invention, a program causes a computer to execute: associating, with each piece of data of a predetermined management unit included in a group or series of write data, feature data corresponding to that data; when storing the feature data corresponding to first data of the management unit included in first write data, storing the feature data corresponding to second data of the management unit, which is included in the first write data and follows the first data, in association with the feature data corresponding to the first data; and, when performing duplication determination on second write data, performing duplication determination using the feature data corresponding to the first data, reading the feature data corresponding to the second data based on its association with the feature data corresponding to the first data, and performing duplication determination using the feature data corresponding to the second data.

According to the present invention, the load of identifying the feature data used for duplication determination can be reduced, and write performance can be improved.

FIG. 1 is a schematic configuration diagram showing an information processing system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the contents of a chunk address conversion table according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of the contents of a hash table according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the contents of a hash entry according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of the contents of hash value determination result information according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of the contents of subsequent hash information according to an embodiment of the present invention.
FIG. 7 is a flowchart showing the flow of write processing according to an embodiment of the present invention.
FIG. 8 is a flowchart showing details of the flow of hash value determination processing according to an embodiment of the present invention.
FIG. 9 is a flowchart showing the flow of read processing according to an embodiment of the present invention.
FIG. 10 is a diagram showing a storage device with a minimum configuration according to an embodiment of the present invention.
FIG. 11 is a flowchart showing the flow of processing in the storage device with the minimum configuration according to an embodiment of the present invention.
FIG. 12 is a schematic configuration diagram showing the configuration of a computer according to an embodiment of the present invention.

An information processing system including a storage device according to an embodiment of the present invention is described below with reference to the drawings.

<1. Configuration of information processing system>
FIG. 1 is a schematic configuration diagram showing an information processing system according to this embodiment. The information processing system S includes, for example, a disk array device 1 and a host 2. The disk array device 1 is an example of a "storage device".

The disk array device 1 is a storage device capable of storing data in a non-volatile manner. The disk array device 1 executes processing such as writing, reading, or erasing data based on control commands from the host 2. For example, when it receives a write command from the host 2, the disk array device 1 writes the write data corresponding to the write command to the storage unit M inside the disk array device 1. When it receives a read command from the host 2, the disk array device 1 reads the data stored in the storage unit M and transmits it to the host 2. The disk array device 1 can be used, for example, in a system in which multiple copies of the same data exist, such as a backup environment or a virtual machine environment. The disk array device 1 is described in detail later.

The host 2 is communicably connected to the disk array device 1 via, for example, a communication cable. The host 2 is composed of, for example, one or more host devices 21. Each host device 21 may be a server device, a personal computer, or another computer. Unless otherwise noted, "write command" and "read command" in the following description mean a write command and a read command sent from the host 2 to the disk array device 1.

<2. Disk array device>
Next, the disk array device 1 is described.
As shown in FIG. 1, the disk array device 1 includes, for example, an I/O processing unit (input/output processing unit) 11, a memory 12, a deduplication volume 13, a pool volume 14, and a management volume 15. In this embodiment, the deduplication volume 13, the pool volume 14, and the management volume 15 constitute the storage unit M. The storage unit M is an example of "storage means".

The I/O processing unit 11 manages the writing, reading, and erasing of data with respect to the deduplication volume 13 and the pool volume 14. For example, the I/O processing unit 11 receives write commands and read commands from the host 2. When the write processing for a write command or the read processing for a read command is complete, the I/O processing unit 11 returns a response to the host 2. In this embodiment, the I/O processing unit 11 includes a write command processing unit 111, a read command processing unit 112, a hash value determination processing unit 113, and a deduplication processing unit 114.

When the I/O processing unit 11 receives a write command addressed to the deduplication volume 13, the write command processing unit 111 divides the write data whose writing is requested by that command into fixed-length chunks. A chunk is an example of "data of a predetermined management unit". Write data larger than the chunk size is an example of "a group of write data".

In this embodiment, the write command processing unit 111 includes a hash value calculation unit 1111. The hash value calculation unit 1111 calculates a hash value from each chunk produced by the write command processing unit 111 and stores the calculated hash value in the memory 12. A hash value is an example of "feature data". Calculating a hash value from each chunk is an example of associating feature data with each chunk. The feature data is not limited to a hash value, however, and may be other unique information registered for each chunk. The write command processing unit 111 is an example of "write command processing means".
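The chunk division and hash calculation described above can be illustrated with a minimal sketch. The chunk size of 4096 bytes, the use of SHA-256, and the function names are assumptions made for illustration; the description only specifies fixed-length chunks and "a hash value".

```python
import hashlib

CHUNK_SIZE = 4096  # assumed value; the description only says "fixed length"


def split_into_chunks(write_data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Divide write data into fixed-length chunks (the last one may be shorter)."""
    return [write_data[i:i + chunk_size] for i in range(0, len(write_data), chunk_size)]


def feature_data(chunk: bytes) -> str:
    """Compute feature data for one chunk. SHA-256 is an assumed hash function."""
    return hashlib.sha256(chunk).hexdigest()


def hash_write_data(write_data: bytes) -> list[str]:
    """Return one hash value per chunk, in chunk order."""
    return [feature_data(chunk) for chunk in split_into_chunks(write_data)]
```

Identical chunks yield identical hash values, which is what allows a later comparison against stored feature data to detect duplicates.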

When the I/O processing unit 11 receives a read command, the read command processing unit 112 reads the data of each chunk targeted by the read command based on the chunk address conversion table 151 (see FIG. 2) and returns the read data to the host 2. The chunk address conversion table 151 is a table in which information indicating the storage location of the data of each chunk is registered. In the following, the data of each chunk may be called "chunk data", the storage location of chunk data may be called the "chunk storage location", and information indicating the chunk storage location may be called "chunk storage location information". For convenience, "chunk storage location" is sometimes used to mean "chunk storage location information" where the context makes this clear.

FIG. 2 is a diagram showing an example of the contents of the chunk address conversion table 151. The chunk address conversion table 151 is, for example, a table that stores a list of correspondences between chunk addresses stored in the deduplication volume 13 described later (deduplication LD chunk addresses) and the storage locations in the pool volume 14 (chunk storage locations) at which the chunk data corresponding to those chunk addresses is stored. "LD" means logical disk. A chunk storage location includes, for example, a chunk storage LDN indicating the logical disk number (LDN) at which the chunk is stored and a chunk storage address indicating the address at which the chunk is stored. For example, the chunk storage location includes the volume number of the pool volume 14 and an address on the pool volume 14. The chunk address conversion table 151 is generated by the deduplication processing unit 114 described later.
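As a rough illustration of this mapping, the table can be modeled as a dictionary from deduplication LD chunk address to chunk storage location. The class name, field names, and sample values below are hypothetical and are not taken from FIG. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkLocation:
    ldn: int      # chunk storage LDN (logical disk number in the pool volume)
    address: int  # chunk storage address on that logical disk


# Deduplication LD chunk address -> chunk storage location (sample values only).
chunk_address_table: dict[int, ChunkLocation] = {
    0x0000: ChunkLocation(ldn=14, address=0x1000),
    0x0001: ChunkLocation(ldn=14, address=0x2000),
    0x0002: ChunkLocation(ldn=14, address=0x1000),  # duplicate of chunk 0x0000
}


def resolve(chunk_address: int) -> ChunkLocation:
    """Look up where the data for a deduplication LD chunk address is stored."""
    return chunk_address_table[chunk_address]
```

Two chunk addresses that hold identical data resolve to the same location, so the data itself is stored only once in the pool volume.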

When a write command has been received and the write command processing unit 111 has divided the write data into chunks and calculated their hash values, the hash value determination processing unit 113 determines, for every chunk in order from the first chunk of the write range (write data), whether the hash value calculated by the hash value calculation unit 1111 is registered in the hash table 152 (see FIG. 3). It stores in the memory 12 whether the hash value is registered and, if it is registered, the position information of the hash entry 1521 (see FIG. 3) in which it is registered. In this embodiment, for a chunk whose hash value is registered, the hash value determination processing unit 113 also stores in the memory 12 the hash value of the subsequent chunk and information indicating where the hash value of the subsequent chunk is stored. The hash value determination processing unit 113 is an example of "determination processing means". This is described in detail below.

The hash value determination processing unit 113 includes, for example, a hash value determination method determination unit 1131, a hash value search unit 1132, and a subsequent hash value comparison unit 1133.

The hash value determination method determination unit 1131 refers to the memory 12 to determine whether a subsequent hash value, described later, has been acquired in advance. A "subsequent hash value" is, for a given chunk, the hash value corresponding to the chunk that follows it (for example, the chunk at the next address). In this embodiment, the subsequent hash value is acquired in advance during the processing performed on a chunk that precedes (for example, immediately precedes) the chunk being processed by the hash value determination processing unit 113 (hereinafter sometimes called the "chunk in process").

In this embodiment, when no subsequent hash value is stored in the memory 12, the hash value determination method determination unit 1131 determines that no subsequent hash value has been acquired in advance. In this case, it selects hash value determination by the hash value search unit 1132 as the determination method. When a subsequent hash value is stored in the memory 12, on the other hand, the hash value determination method determination unit 1131 determines that a subsequent hash value has been acquired in advance. In this case, it selects hash value determination by the subsequent hash value comparison unit 1133 as the determination method.
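The selection rule above reduces to a single check on whether subsequent hash information is present in memory. The following sketch uses hypothetical names (`SubsequentHashInfo`, `choose_method`) and represents "not stored in the memory 12" as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubsequentHashInfo:
    hash_value: str      # subsequent hash value acquired in advance
    entry_position: int  # position of the subsequent hash entry


def choose_method(prefetched: Optional[SubsequentHashInfo]) -> str:
    """Return "compare" when a subsequent hash value was acquired in advance,
    otherwise "search" (a full lookup in the hash table)."""
    return "compare" if prefetched is not None else "search"
```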

The hash value search unit 1132 reads the hash table 152 described later (see FIG. 3) to check whether the hash value of the chunk being processed by the hash value determination processing unit 113 (the chunk in process) is registered, and thereby determines whether the hash value is registered.

FIG. 3 is a diagram showing an example of the contents of the hash table 152. The hash table 152 is a table that stores all hash values corresponding to chunk data already stored in the disk array device 1. Because it must store a hash value for every chunk of all write data, the hash table 152 becomes enormous. For this reason, the hash table 152 employs an algorithm based on hierarchical management so that a desired hash value can be found quickly while keeping the table size as small as possible. The hash table 152 is an example of "table information".

As shown in FIG. 3, the hash table 152 includes, for example, information for searching for hash values (for example, tree search information TS) and hash entries 1521 that store information related to hash values. In the example shown in FIG. 3, the hash table 152 includes three levels of tree search information TS (TS1 to TS3).

Each piece of tree search information TS contains the position information of the tree search information TS to be read next when searching for a hash value. Reading the tree search information TS in order from the top level makes it possible to check whether a hash value is registered in the hash table 152. The lowest-level tree search information TS3 stores the position information of the hash entry 1521 in which the hash value is registered. Therefore, when the hash value is registered, the position information of the hash entry 1521 can also be obtained.
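A three-level lookup of this kind might be sketched as follows. The description does not say how each level is keyed, so this sketch assumes that TS1 and TS2 are keyed by successive two-character slices of the hash value and that TS3 maps the full hash value to a hash entry position; the nested dictionaries and function names are hypothetical.

```python
from typing import Optional


def insert(ts1: dict, hash_value: str, entry_position: int) -> None:
    """Register a hash value, creating TS2 and TS3 nodes as needed."""
    ts2 = ts1.setdefault(hash_value[0:2], {})  # TS1 points to a TS2 node
    ts3 = ts2.setdefault(hash_value[2:4], {})  # TS2 points to a TS3 node
    ts3[hash_value] = entry_position           # TS3 holds the hash entry position


def search(ts1: dict, hash_value: str) -> Optional[int]:
    """Read TS1, TS2, TS3 in order; return the hash entry position or None."""
    ts2 = ts1.get(hash_value[0:2])
    if ts2 is None:
        return None
    ts3 = ts2.get(hash_value[2:4])
    if ts3 is None:
        return None
    return ts3.get(hash_value)
```

Every full search reads one node per level, which is the per-chunk cost that the subsequent hash value mechanism aims to avoid.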

FIG. 4 is a diagram showing an example of the contents of a hash entry 1521. A hash value, a subsequent hash value, a subsequent hash entry position, and a chunk storage location are registered in the hash entry 1521. The subsequent hash entry position is the position information of the hash entry 1521 corresponding to the subsequent hash value. The chunk storage location includes, for example, a chunk storage LDN indicating the logical disk number at which the chunk is stored and a chunk storage address indicating the address at which the chunk is stored.
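The four registered fields suggest a record that is linked to its successor, similar to a singly linked list. The sketch below uses hypothetical field names and sample values, and models entry positions as dictionary keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HashEntry:
    hash_value: str
    subsequent_hash_value: Optional[str]      # hash value of the following chunk
    subsequent_entry_position: Optional[int]  # position of the following chunk's entry
    chunk_ldn: int                            # chunk storage LDN
    chunk_address: int                        # chunk storage address


# Entry position -> hash entry (sample values only).
entries = {
    0: HashEntry("AAAAAA", "BBBBBB", 1, 14, 0x1000),
    1: HashEntry("BBBBBB", "CCCCCC", 2, 14, 0x2000),
    2: HashEntry("CCCCCC", None, None, 14, 0x3000),
}


def follow_chain(entries: dict, start: int) -> list[str]:
    """Walk subsequent-entry links from a starting position and collect hash values."""
    hashes = []
    position: Optional[int] = start
    while position is not None:
        entry = entries[position]
        hashes.append(entry.hash_value)
        position = entry.subsequent_entry_position
    return hashes
```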

When the hash value of the chunk in process is registered in the hash table 152, the hash value search unit 1132 stores in the memory 12, as part of the hash value determination result information 121 (see FIG. 5), whether the hash value is registered and the position information of the hash entry 1521 in which it is registered. In addition, once it has obtained the position information of the hash entry 1521 for the chunk in process, the hash value search unit 1132 reads that hash entry 1521 to obtain the subsequent hash value and the subsequent hash entry position (that is, it acquires them in advance). The hash value search unit 1132 stores the information indicating the subsequent hash value and subsequent hash entry position obtained during the processing of the chunk in process in the memory 12 as subsequent hash information 122 (see FIG. 6).

FIG. 5 is a diagram showing an example of the contents of the hash value determination result information 121 corresponding to one piece of write data. The hash value determination result information 121 is generated for each piece of write data and stored in the memory 12. For one piece of write data, the hash value determination result information 121 registers, in association with one another, the chunk number within the write range, the hash value, whether the hash value is registered in the hash table 152, and the position information of the hash entry 1521.

FIG. 6 is a diagram showing an example of the contents of the subsequent hash information 122 corresponding to one chunk in process. The subsequent hash information 122 is generated for each chunk in process and stored in the memory 12. The subsequent hash information 122 registers, in association with each other, the subsequent hash value and the subsequent hash entry position acquired in advance during the processing of the chunk in process.

The subsequent hash value comparison unit 1133 compares the hash value of the chunk being processed by the hash value determination processing unit 113 (the chunk in process) with the subsequent hash value (the hash value included in the subsequent hash information 122) that was acquired in advance and stored in the memory 12 during the processing of the chunk immediately preceding the chunk in process. When the hash value of the chunk in process matches the subsequent hash value, the subsequent hash value comparison unit 1133 reads the hash entry 1521 for that hash value (at the subsequent hash entry position included in the subsequent hash information 122) and determines whether that hash entry 1521 is valid. If the hash entry 1521 is valid, the subsequent hash value comparison unit 1133 determines that the hash value is registered. If the hash entry 1521 is not valid, or if the two compared hash values do not match, hash value determination by the hash value search unit 1132 is performed instead.

When the hash entry 1521 is valid, the subsequent hash value comparison unit 1133 stores predetermined information in the memory 12 in the same manner as the hash value search unit 1132. That is, it stores in the memory 12, as part of the hash value determination result information 121 (see FIG. 5), whether the hash value is registered and the position information of the hash entry 1521 in which it is registered. In addition, once it has obtained the position information of the hash entry 1521 for the chunk in process, the subsequent hash value comparison unit 1133 reads that hash entry 1521 to obtain the subsequent hash value and the subsequent hash entry position. It stores the information indicating the subsequent hash value and subsequent hash entry position obtained during the processing of the chunk in process in the memory 12 as subsequent hash information 122 (see FIG. 6).
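The interplay between the fast path (comparison with the subsequent hash value acquired in advance) and the fallback (a full search) can be sketched as below. This is a simplified model under stated assumptions: a plain dictionary stands in for the tree search, entry "validity" is modeled as a boolean flag plus a check that the stored hash value still matches (the description does not define validity), and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HashEntry:
    hash_value: str
    next_hash: Optional[str]  # subsequent hash value
    next_pos: Optional[int]   # subsequent hash entry position
    valid: bool = True        # assumed validity flag


class Index:
    def __init__(self, hashes: list[str]):
        # Build entries for one earlier write; each entry links to its successor.
        self.entries = [
            HashEntry(h,
                      hashes[i + 1] if i + 1 < len(hashes) else None,
                      i + 1 if i + 1 < len(hashes) else None)
            for i, h in enumerate(hashes)
        ]
        self.by_hash = {h: i for i, h in enumerate(hashes)}  # stands in for tree search
        self.full_searches = 0

    def search(self, h: str) -> Optional[int]:
        self.full_searches += 1
        return self.by_hash.get(h)


def determine(index: Index, hashes: list[str]) -> list[Optional[int]]:
    """Return, per chunk, the position of the matching hash entry or None."""
    results = []
    prefetched = None  # (subsequent hash value, subsequent entry position)
    for h in hashes:
        pos = None
        if prefetched is not None and prefetched[0] == h:
            candidate = index.entries[prefetched[1]]
            if candidate.valid and candidate.hash_value == h:
                pos = prefetched[1]   # fast path: no full search needed
        if pos is None:
            pos = index.search(h)     # fallback: full search
        results.append(pos)
        entry = index.entries[pos] if pos is not None else None
        prefetched = ((entry.next_hash, entry.next_pos)
                      if entry is not None and entry.next_pos is not None else None)
    return results
```

In this model, rewriting a sequence that was stored earlier costs one full search for the first chunk, after which each following chunk is resolved by a single entry read.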

The deduplication processing unit 114 performs deduplicated write processing, in order from the head chunk of the write range (write data), based on the hash value determination result information 121 stored in the memory 12. For example, the deduplication processing unit 114 determines that a chunk whose hash value is recorded as registered in the hash value determination result information 121 is duplicate data, and acquires chunk storage position information from the hash entry 1521 in which the hash value is registered. The deduplication processing unit 114 then registers chunk storage position information in the chunk address conversion table 151 (see FIG. 2) based on the chunk storage position information acquired from the hash entry 1521. On the other hand, the deduplication processing unit 114 determines that a chunk whose hash value is recorded as not registered in the hash value determination result information 121 is non-duplicate data, and performs write processing of the data of that chunk. The deduplication processing unit 114 then registers chunk storage position information in the chunk address conversion table 151 according to the chunk storage position at which the chunk data was stored.

More specifically, the deduplication processing unit 114 includes a hash value registration unit 1141 and a chunk data write unit 1142.

For a non-duplicate chunk whose hash value is recorded as not registered in the hash value determination result information 121, the hash value registration unit 1141 registers the hash value, the subsequent hash value, the subsequent hash entry position, and the chunk storage position information in a new hash entry 1521 of the hash table 152. At this time, the hash value registration unit 1141 acquires the subsequent hash value and the subsequent hash entry position from the subsequent hash information 122 saved in the memory 12 by the hash value determination processing unit 113. For example, in the example shown in FIG. 8, the hash value registration unit 1141 registers BBBBBB, the hash value of chunk number 0001, as the subsequent hash value of the hash entry 1521 in which the hash value AAAAAA of chunk number 0000 is registered. The hash value registration unit 1141 is an example of a "feature data registration means". Registering the subsequent hash value together in the hash entry 1521 in which a hash value is registered is an example of storing the feature data corresponding to the second data in the storage unit M in association with the feature data corresponding to the first data.
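As an illustration only, the registration described above can be modeled in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the names `HashEntry` and `register_write`, the use of a list position as the hash entry position, and the placeholder chunk storage position are all hypothetical, and the sketch assigns entry positions to all new chunks before filling in the links so that the position of a subsequent new chunk is known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HashEntry:
    """Hypothetical model of one hash entry 1521."""
    hash_value: str
    chunk_location: int                   # chunk storage position (placeholder here)
    next_hash: Optional[str] = None       # subsequent hash value
    next_entry_pos: Optional[int] = None  # subsequent hash entry position


def register_write(table: list, index: dict, hashes: list) -> list:
    """Register the chunk hashes of one write range.

    `table` is the list of hash entries, `index` maps a hash value to its
    entry position. Returns the entry position of every chunk in the range.
    """
    positions = []
    created = []  # chunk indices whose entry was newly created by this call
    for i, h in enumerate(hashes):
        if h not in index:
            index[h] = len(table)
            # Assumption: the storage position simply mirrors the entry position.
            table.append(HashEntry(hash_value=h, chunk_location=index[h]))
            created.append(i)
        positions.append(index[h])
    # Link each new entry to the entry of the chunk that follows it in the range.
    for i in created:
        if i + 1 < len(hashes):
            entry = table[positions[i]]
            entry.next_hash = hashes[i + 1]
            entry.next_entry_pos = positions[i + 1]
        # The last chunk of the range keeps cleared (None) subsequent fields.
    return positions
```

With the hash values of the example in the text, `register_write([], {}, ["AAAAAA", "BBBBBB"])` creates an entry for AAAAAA whose subsequent hash value is BBBBBB.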

The chunk data write unit 1142 writes the data of chunks determined to be non-duplicate to the pool volume 14. The write destination here is the chunk storage position that is registered in the chunk address conversion table 151 and the hash entry 1521 described above.

The memory 12 is a volatile storage device that temporarily holds various kinds of control information (for example, the hash value determination result information 121 and the subsequent hash information 122). Control information saved in the memory 12 during write processing (for example, the hash value determination result information 121 and the subsequent hash information 122) becomes unnecessary once the write processing is complete, and is then erased from the memory 12. The area of the memory 12 that held this control information thereby becomes available for other uses.

The deduplication volume 13 is a storage area (for example, a logical disk) that is accessible by the host 2 and on which deduplication is performed. The deduplication volume 13 stores the address of each chunk of each piece of write data (deduplication LD chunk address). The deduplication volume 13 includes, for example, a plurality of deduplication volumes 131 to which different volume numbers are assigned.

The pool volume 14 is a storage area (for example, a logical disk) that stores the data held by the deduplication volume 13 in fixed-length chunks. The pool volume 14 can be read and written by the I/O processing unit 11. The pool volume 14 includes, for example, a plurality of pool volumes 141 to which different volume numbers are assigned.

The management volume 15 is a storage area (for example, a logical disk) in which the chunk address conversion table 151 and the hash table 152 are stored. The management volume 15 can be read and written by the I/O processing unit 11. The management volume 15 includes, for example, a plurality of management volumes 151 to which different volume numbers are assigned.

<3. Process Flow>
<3.1 Flow of Write Processing>
Write processing of the deduplication volume 13 will be described with reference to FIGS. 7 and 8.
FIG. 7 is a flowchart showing the flow of write processing for the deduplication volume 13. First, the I/O processing unit 11 of the disk array device 1 receives a write command and the write data corresponding to the write command from the host 2 (S500). When the I/O processing unit 11 receives the write command from the host 2, the write command processing unit 111 divides the write data for which write processing is requested by the write command into chunks (S501). The hash value calculation unit 1111 then calculates a hash value for each of the divided chunks and saves the calculated hash values in the memory 12 (S502).
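Steps S501 and S502 can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the 4096-byte chunk size is an assumption, and SHA-256 is used merely as a stand-in because the specification does not name a hash function.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size in bytes


def split_into_chunks(write_data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """S501: divide the write data into fixed-length chunks."""
    return [write_data[i:i + chunk_size]
            for i in range(0, len(write_data), chunk_size)]


def chunk_hashes(write_data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """S502: compute one hash value per chunk (SHA-256 is an assumption)."""
    return [hashlib.sha256(chunk).hexdigest()
            for chunk in split_into_chunks(write_data, chunk_size)]
```

Chunks with identical content yield identical hash values, which is what the later duplication determination relies on.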

Subsequently, the hash value determination processing unit 113 performs hash value determination processing, which determines whether each hash value saved in the memory 12 is already stored in the hash table 152 (that is, whether it is registered in the hash table 152) (S503). Details of the hash value determination processing are described with reference to FIG. 8.

FIG. 8 is a flowchart showing the detailed flow of the hash value determination processing. In this processing, the hash values corresponding to the chunks in the write range (the hash values saved in the memory 12) are determined one at a time, in order from the head chunk. First, the hash value determination method determination unit 1131 refers to the memory 12 and checks whether a subsequent hash value was acquired in the hash value determination processing performed for the chunk immediately preceding the chunk to be processed (the chunk in process) (S600). That is, the hash value determination method determination unit 1131 determines whether subsequent hash information 122 for the chunk in process has already been saved in the memory 12.

If a subsequent hash value corresponding to the chunk in process has been acquired (that is, if the subsequent hash information 122 exists; S600: YES), the hash value determination method determination unit 1131 selects hash value determination using the subsequent hash value. On the other hand, if no subsequent hash value corresponding to the chunk in process has been acquired (that is, if the subsequent hash information 122 does not exist; S600: NO), the hash value determination method determination unit 1131 selects hash value determination by hash table search. For the first chunk of the write range, no subsequent hash value has been acquired, so hash value determination by hash table search is always selected.

When the hash value determination method determination unit 1131 selects hash value determination by hash table search (S600: NO), the hash value search unit 1132 searches the hash table 152 in the management volume 15 to check whether the hash value is registered. In this embodiment, the hash value search unit 1132 searches for the hash value by reading the tree search information TS in order from the top level of the hash table 152. The tree search information TS stores information on the location to be read at the next level for the hash value being searched. Therefore, by reading down to the lowest level (the third level TS3 in the example shown in FIG. 3), the hash value search unit 1132 can determine whether the hash value is registered and can acquire the position information of the hash entry 1521 in which the hash value is registered (S601 to S603).
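The level-by-level search (S601 to S603) can be illustrated with the following Python sketch. It is a simplified model under assumptions: the class name `TreeHashTable`, the use of nested dictionaries, and the keying of each level by successive two-character slices of the hash value are all hypothetical, since the specification does not state how the tree search information TS selects the next read location. The `reads` counter stands in for the number of reads of tree search information.

```python
class TreeHashTable:
    """Hypothetical three-level lookup structure keyed by slices of the hash."""

    def __init__(self, levels: int = 3, digits_per_level: int = 2):
        self.levels = levels
        self.digits = digits_per_level
        self.root = {}
        self.reads = 0  # number of tree-level reads performed by search()

    def _keys(self, hash_value: str) -> list:
        # One key per level, taken from successive slices of the hash value.
        return [hash_value[i * self.digits:(i + 1) * self.digits]
                for i in range(self.levels)]

    def insert(self, hash_value: str, entry_pos: int) -> None:
        node = self.root
        for key in self._keys(hash_value):
            node = node.setdefault(key, {})
        node[hash_value] = entry_pos

    def search(self, hash_value: str):
        """Return the hash entry position, or None if the hash is not registered."""
        node = self.root
        for key in self._keys(hash_value):
            self.reads += 1  # one read of tree search information per level
            node = node.get(key)
            if node is None:
                return None
        return node.get(hash_value)
```

In this model a successful search always costs one read per level, which is the per-chunk cost that the subsequent hash value is intended to avoid.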

By performing the hash value search described above, the hash value search unit 1132 determines whether the hash value is registered in the hash table 152 (S604). If the hash value is registered (S604: YES), the hash value search unit 1132 reads the hash entry 1521 in which the hash value is registered (S605) and acquires the subsequent hash value and the subsequent hash entry position (S606). In this case, the hash value search unit 1132 saves the acquired subsequent hash value and subsequent hash entry position in the memory 12 as the next subsequent hash information 122.

On the other hand, when the hash value determination method determination unit 1131 selects hash value determination using the subsequent hash value (S600: YES), the subsequent hash value comparison unit 1133 compares the subsequent hash value included in the subsequent hash information 122 stored in the memory 12 with the hash value of the chunk in process (S611). The subsequent hash value comparison unit 1133 then determines whether the two compared hash values match (S612).

If the hash values match (S612: YES), the subsequent hash value comparison unit 1133 reads, from the management volume 15, the hash entry 1521 located at the subsequent hash entry position included in the subsequent hash information 122 stored in the memory 12 (S613). The subsequent hash value comparison unit 1133 then determines whether the read hash entry 1521 is valid (S614). For example, the subsequent hash value comparison unit 1133 can determine the validity of the hash entry 1521 by checking whether the hash value registered in the hash entry 1521 matches the hash value of the chunk in process.

If the hash entry 1521 is valid (S614: YES), the subsequent hash value comparison unit 1133 determines that the hash value is registered, and acquires into the memory 12 the subsequent hash value and the subsequent hash entry position registered in the hash entry 1521 (S606). That is, the subsequent hash value comparison unit 1133 saves the acquired subsequent hash value and subsequent hash entry position in the memory 12 as the next subsequent hash information 122.

On the other hand, if the hash entry 1521 is invalid (S614: NO), or if the subsequent hash value included in the subsequent hash information 122 does not match the hash value of the chunk in process (S612: NO), the hash value search unit 1132 performs the search of the hash table 152 described above (S601 to S606) and determines whether the hash value is registered.

Once it is known whether the hash value of the chunk in process is registered, the hash value determination processing unit 113 stores, in the memory 12 as part of the hash value determination result information 121, whether the hash value of the chunk in process is registered and, if it is registered, the information indicating the hash entry position (S607). When the hash value determination processing unit 113 has finished storing in the memory 12 whether a hash value is registered for every chunk in the write range, it completes the hash value determination processing (S608: YES).
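The hash value determination processing of S600 to S608 can be summarized by the following Python sketch. It is an illustration under assumptions rather than the embodiment itself: `HashEntry` and `determine_hashes` are hypothetical names, `entries` models the hash entries 1521 as a dictionary keyed by entry position, and the dictionary `index` stands in for the full tree search of the hash table 152.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HashEntry:
    """Hypothetical model of a hash entry 1521 (storage position omitted)."""
    hash_value: str
    next_hash: Optional[str] = None       # subsequent hash value
    next_entry_pos: Optional[int] = None  # subsequent hash entry position


def determine_hashes(hashes: list, entries: dict, index: dict):
    """Return (entry position or None per chunk, number of full table searches)."""
    results = []
    searches = 0
    following = None  # stand-in for the subsequent hash information 122
    for h in hashes:
        pos = None
        if following is not None and following[0] == h:   # S600: YES, S611, S612
            candidate = entries.get(following[1])          # S613
            if candidate is not None and candidate.hash_value == h:  # S614
                pos = following[1]
        if pos is None:                                    # fall back to the search
            searches += 1
            pos = index.get(h)                             # S601 to S604
        following = None
        if pos is not None:                                # S605, S606
            entry = entries[pos]
            if entry.next_hash is not None:
                following = (entry.next_hash, entry.next_entry_pos)
        results.append(pos)                                # S607
    return results, searches
```

When a write range repeats a previously registered sequence, only the head chunk needs a full search in this model; a mismatch at any chunk simply falls back to the search for that chunk.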

Next, returning to FIG. 5, the remainder of the write processing is described.
When the hash value determination processing (S503) is complete, the deduplication processing unit 114 processes the chunks in order from the head chunk of the write range according to whether their hash values are registered. Specifically, the deduplication processing unit 114 determines whether the hash value of the chunk to be processed needs to be registered (S504).

If the hash value of the chunk to be processed is registered in the hash table 152 (S504: YES; for example, chunk 0001 in FIG. 5), the deduplication processing unit 114 determines that the chunk is duplicate data. In this case, the deduplication processing unit 114 acquires chunk storage position information from the hash entry 1521 for that hash value (S505). On the other hand, if the hash value of the chunk to be processed is not registered in the hash table 152 (S504: NO; for example, chunk 0000 in FIG. 5), the deduplication processing unit 114 determines that the chunk is non-duplicate data. In this case, the hash value registration unit 1141 registers the hash value and the chunk storage position of the chunk to be processed in a new hash entry 1521 of the hash table 152 (S511). Any method may be used to decide the chunk storage position as long as the position is, for example, an unused location in the pool volume 14.

In this embodiment, if there is a subsequent chunk in the write range (in the example of FIG. 8, the chunk subsequent to chunk 0000 is chunk 0001), the hash value registration unit 1141 registers the hash value and the hash entry position of the subsequent chunk as the subsequent hash value and the subsequent hash entry position of the hash entry 1521 for the hash value of the chunk to be processed. On the other hand, if there is no subsequent chunk in the write range (chunk 0008 in the example of FIG. 8), the hash value registration unit 1141 clears the subsequent hash value and the subsequent hash entry position of the hash entry 1521 for the hash value of the chunk to be processed (S512).

When the chunk is determined to be non-duplicate data in S504, once the hash value registration unit 1141 has finished registering information in the hash entry 1521, the chunk data write unit 1142 writes the chunk data, which is non-duplicate data, to the location in the pool volume 14 corresponding to the chunk storage position of the chunk to be processed (S513). Once the chunk storage destination address information is determined, the chunk data write unit 1142 registers the chunk storage address information in the chunk address conversion table 151 in the management volume 15 (S506).

The write command processing unit 111 then determines whether registration of the chunk storage address information in the chunk address conversion table 151 has been completed for all chunks in the write range (S507). If there is a chunk for which registration of the chunk storage address information has not been completed (S507: NO), the write command processing unit 111 returns to S504 and repeats the processing described above. On the other hand, if registration of the chunk storage address information has been completed for all chunks (S507: YES), the write command processing unit 111 reports write completion to the host 2 (S508). This completes the series of processing for the write.
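The per-chunk loop of S504 to S507 can be sketched in Python as follows. This is a deliberately simplified model under assumptions: `dedup_write` is a hypothetical name, a dictionary lookup replaces the hash value determination result, the pool volume is a list whose index serves as the chunk storage position, SHA-256 is an assumed hash function, and the registration of subsequent hash values (S512) is omitted for brevity.

```python
import hashlib


def dedup_write(chunk_map: dict, hash_table: dict, pool: list,
                start_chunk: int, data: bytes, chunk_size: int = 4096) -> None:
    """Write `data` with deduplication.

    chunk_map  : stand-in for the chunk address conversion table 151
    hash_table : hash value -> chunk storage position
    pool       : stand-in for the pool volume 14
    """
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        if h in hash_table:                 # S504: YES, duplicate data
            location = hash_table[h]        # S505
        else:                               # S504: NO, non-duplicate data
            location = len(pool)            # any unused location in the pool
            hash_table[h] = location        # S511
            pool.append(chunk)              # S513
        chunk_map[start_chunk + n] = location  # S506
```

Writing three chunks of which the first and third are identical stores only two chunks in the pool, while the chunk map still holds three addresses.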

<3.2 Flow of Read Processing>
Read processing of the deduplication volume 13 will be described with reference to FIG. 9.
FIG. 9 is a flowchart showing the flow of read processing for the deduplication volume 13. First, the I/O processing unit 11 of the disk array device 1 receives a read command for the deduplication volume 13 from the host 2 (S701). When the I/O processing unit 11 of the disk array device 1 receives the read command for the deduplication volume 13 from the host 2, the read command processing unit 112 secures, in the memory 12, a work memory (work buffer) in which the read data is assembled (S702).

Subsequently, for every chunk in the read range (within the read addresses), in order from the head chunk, the read command processing unit 112 reads the chunk address conversion table 151 (S703) and acquires the chunk storage position from the chunk address conversion table 151 (S704). The read command processing unit 112 then reads the data at the corresponding chunk storage position in the pool volume 14 into the work memory (S705).

Subsequently, the read command processing unit 112 determines whether all chunks in the read range have been read (S706). If there is a chunk that has not yet been read (S706: NO), the read command processing unit 112 returns to S703 and repeats the processing described above. On the other hand, if all chunks have been read (S706: YES), the read command processing unit 112 returns the read data to the host 2 and releases the work memory (S707). This completes the series of processing for the read.
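The read flow of S702 to S707 can be sketched in Python as follows. The function name `dedup_read`, the dictionary model of the chunk address conversion table 151, and the list model of the pool volume 14 are assumptions made for illustration.

```python
def dedup_read(chunk_map: dict, pool: list, first_chunk: int, chunk_count: int) -> bytes:
    """Assemble read data for `chunk_count` chunks starting at `first_chunk`."""
    work_buffer = bytearray()                       # S702: secure the work memory
    for address in range(first_chunk, first_chunk + chunk_count):
        location = chunk_map[address]               # S703, S704
        work_buffer += pool[location]               # S705
    return bytes(work_buffer)                       # S707: return the read data
```

Because several chunk addresses may map to the same storage position, a deduplicated chunk is stored once in the pool but can appear several times in the returned data.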

<4. Advantages>
In a storage device that performs deduplication, whether write data duplicates data already stored in the storage device is checked by dividing the write data into fixed-length chunks and comparing feature data calculated from each chunk, such as a hash value (referred to here as a "hash value" for convenience), with the hash values already stored in the storage device. The hash table that holds hash values in the storage device must hold a hash value for every chunk of all write data, and therefore becomes very large. For this reason, the hash table uses an algorithm based on hierarchical management so that a desired hash value can be found quickly while the table size is kept as small as possible.

In write processing, therefore, the write data is divided into chunks, a hash value is calculated for each chunk, and it is checked whether the calculated hash value is registered in the hash table. Because the search proceeds by following the levels of the hash table, internal processing occurs multiple times for each search. The chunk size of storage with a high deduplication effect is, for example, 4 KB or 8 KB, so the host often issues write requests larger than the chunk size. For a write larger than the chunk size, a hash value search is performed for each of the chunks, and the resulting large amount of internal processing may reduce write performance.

In contrast, in this embodiment, the storage device (for example, the disk array device 1) includes a write command processing unit (for example, the write command processing unit 111), a feature data registration unit (for example, the hash value registration unit 1141), and a determination processing unit (for example, the hash value determination processing unit 113). The write command processing unit associates, with each piece of data of a predetermined management unit (for example, a chunk) included in a batch or a series of write data, feature data (for example, a hash value) corresponding to that data. When storing in a storage unit (for example, the storage unit M) the feature data corresponding to first data of the management unit included in first write data (for example, the data of chunk number 0000 in FIG. 5), the feature data registration unit stores the feature data corresponding to second data of the management unit, which is included in the first write data and follows the first data (for example, the data of chunk number 0001 in FIG. 5), in the storage unit in association with the feature data corresponding to the first data. When performing duplication determination on second write data, the determination processing unit performs duplication determination using the feature data corresponding to the first data, reads the feature data corresponding to the second data based on its association with the feature data corresponding to the first data, and performs duplication determination using the feature data corresponding to the second data. With this configuration, the load of identifying the feature data corresponding to the second data is reduced, and write performance can be improved. For example, in the embodiment described above, as shown in S611 to S613 in FIG. 6, whether a hash value is registered can be checked using the subsequent hash value, so the internal processing for the reads that search the hash table 152 (S601 to S603) can be reduced.
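As a rough, best-case illustration of this reduction, the following Python sketch counts only the reads of tree search information (S601 to S603). The function name and the example figures are assumptions: it supposes a three-level hash table and a write whose chunks all repeat a previously registered sequence, so that only the head chunk needs a full search when subsequent hash values are used. Reads of the hash entries themselves (S605, S613) are not counted.

```python
def tree_search_reads(write_bytes: int, chunk_bytes: int, tree_levels: int,
                      use_subsequent_hash: bool) -> int:
    """Best-case count of tree search information reads for one write."""
    chunks = -(-write_bytes // chunk_bytes)  # ceiling division
    # With subsequent hash values, only the head chunk is searched (best case).
    searched_chunks = min(1, chunks) if use_subsequent_hash else chunks
    return searched_chunks * tree_levels
```

For a 1 MiB write with 4 KiB chunks, the write contains 256 chunks; a per-chunk search costs 256 x 3 = 768 tree reads in this model, against 3 in the best case with subsequent hash values. Actual savings depend on how often the subsequent hash value matches.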

In this embodiment, the feature data is a hash value calculated from the data of the management unit. With this configuration, feature data corresponding to the data of the management unit can be obtained relatively easily, which helps improve write performance.

In this embodiment, the feature data registration unit registers the feature data corresponding to the second data in the same entry as the feature data corresponding to the first data. With this configuration, the feature data corresponding to the first data and the feature data corresponding to the second data can be stored in association with each other without creating a separate table. Write performance can thereby be improved while the management burden is reduced compared with managing a separate table.

Note that storing the feature data corresponding to the second data in the storage unit in association with the feature data corresponding to the first data is not limited to registering the feature data corresponding to the second data in the same entry as the feature data corresponding to the first data. It also includes, for example, registering information indicating the storage position of the feature data corresponding to the second data in the same entry as the feature data corresponding to the first data. In this case, the feature data corresponding to the second data can be acquired based on the information, obtained through the association, that indicates the storage position of the feature data corresponding to the second data.

In this embodiment, the feature data registration unit registers the feature data corresponding to the second data, together with information indicating the storage position of the feature data corresponding to the second data, in the same entry as the feature data corresponding to the first data. With this configuration, reading the entry yields the information indicating the storage position at which the feature data corresponding to the second data is stored, so validity can be determined using the feature data corresponding to the second data read from that storage position. The reliability of writes can thereby be improved.

In this embodiment, when storing the feature data corresponding to the second data in the storage unit, the feature data registration unit stores the feature data corresponding to third data of the management unit, which is included in the first write data and follows the second data, in the storage unit in association with the feature data corresponding to the second data. For example, in the embodiment described above, the hash value corresponding to the third data is registered in the entry of the hash value corresponding to the second data. When performing duplication determination on the second write data, the determination processing unit reads the feature data corresponding to the third data based on its association with the feature data corresponding to the second data, and performs duplication determination using the feature data corresponding to the third data. With this configuration, the feature data corresponding to the third data can be acquired based on its association with the feature data corresponding to the second data. That is, by repeating the same processing, the feature data corresponding to each succeeding piece of data can be acquired one after another. Write performance can thereby be improved further.

<5. Modifications>
The embodiment described above concerns write data larger than the chunk size, but it is also applicable to, for example, sequential writes in which write commands for small-sized data arrive consecutively. Write data received in a sequential write is an example of "a series of write data" and is another example of each of the "first write data" and the "second write data". When the disk array device 1 detects a sequential write from the host 2, it stores the subsequent hash value and the subsequent hash entry position in the hash entry 1521, so that hash determination using the subsequent hash value (S611 to S612 in FIG. 6) can be performed in the hash value determination of the write processing for the following address. This improves write performance.
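The sequential-write case can be sketched as follows. This is a hedged illustration only: the class `SequentialDedup`, the rule used to detect a sequential write (the new address equals the end of the previous write), and the `full_searches` counter are assumptions, and SHA-256 stands in for the unspecified hash function. The point it shows is that the link between entries is carried across separate write commands rather than within one large write.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    hash_value: str
    next_hash: Optional[str] = None      # subsequent hash value
    next_entry: Optional["Entry"] = None  # stands in for the subsequent hash entry position


class SequentialDedup:
    def __init__(self) -> None:
        self.index: Dict[str, Entry] = {}
        self.expected_lba: Optional[int] = None  # address right after the previous write
        self.prev_entry: Optional[Entry] = None  # entry used by the previous write
        self.full_searches = 0

    def write(self, lba: int, data: bytes, blocks: int = 1) -> bool:
        """Process one small write; return True if it was judged a duplicate."""
        h = hashlib.sha256(data).hexdigest()
        sequential = self.expected_lba == lba    # assumed detection rule
        prev = self.prev_entry if sequential else None
        if prev is not None and prev.next_hash == h:
            entry, duplicate = prev.next_entry, True  # judged by the subsequent hash
        else:
            self.full_searches += 1
            entry = self.index.get(h)
            duplicate = entry is not None
            if entry is None:
                entry = Entry(h)
                self.index[h] = entry
            if prev is not None and prev.next_hash is None:
                prev.next_hash, prev.next_entry = h, entry  # link across commands
        self.expected_lba = lba + blocks
        self.prev_entry = entry
        return duplicate
```

Writing three blocks to addresses 0, 1, 2 and then the same contents to addresses 10, 11, 12 would, in this sketch, need a full search only for the first block of the second run.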

FIG. 10 is a diagram showing a storage device with a minimum configuration.
FIG. 11 is a diagram showing the flow of processing in the storage device with the minimum configuration. The storage device 800 includes a write command processing unit 810, a feature data registration unit 820, and a determination processing unit 830.
The write command processing unit 810 associates, with each piece of data of a predetermined management unit included in a batch or a series of write data, feature data corresponding to that data (S902).
When storing, in a storage unit, the feature data corresponding to first data, which is a management unit included in first write data, the feature data registration unit 820 stores the feature data corresponding to second data, which is a management unit included in the first write data and following the first data, in the storage unit in association with the feature data corresponding to the first data (S901).
When performing duplication determination on second write data, the determination processing unit 830 performs duplication determination using the feature data corresponding to the first data, reads the feature data corresponding to the second data based on its association with the feature data corresponding to the first data, and performs duplication determination using the feature data corresponding to the second data (S903).
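The division of roles among the three units of the minimum configuration can be sketched as three small Python functions. The function names, the 4-byte management unit, the dictionary used as the storage unit, and SHA-256 as the feature data are all assumptions chosen to keep the example short; they are not taken from the embodiment.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

UNIT = 4  # size of the management unit in bytes (illustrative value)


def associate_features(write_data: bytes) -> List[Tuple[bytes, str]]:
    """Role of unit 810: pair each management unit with its feature data."""
    units = [write_data[i:i + UNIT] for i in range(0, len(write_data), UNIT)]
    return [(u, hashlib.sha256(u).hexdigest()) for u in units]


def register_features(store: Dict[str, Optional[str]], features: List[str]) -> None:
    """Role of unit 820: store each feature together with its successor's feature."""
    for i, feature in enumerate(features):
        following = features[i + 1] if i + 1 < len(features) else None
        store.setdefault(feature, following)  # keep the first association seen


def determine_duplicates(store: Dict[str, Optional[str]], features: List[str]) -> List[bool]:
    """Role of unit 830: judge duplicates, preferring the associated feature."""
    results: List[bool] = []
    expected: Optional[str] = None  # feature read via the previous association
    for feature in features:
        if expected is not None and expected == feature:
            hit = True              # judged from the associated feature data
        else:
            hit = feature in store  # ordinary determination against the store
        results.append(hit)
        expected = store.get(feature) if hit else None
    return results
```

For example, after registering the features of `b"AAAABBBBCCCC"`, a second write of `b"AAAAXXXXCCCC"` would be judged duplicate, not duplicate, duplicate for its three units in this sketch.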

FIG. 12 is a schematic block diagram showing the configuration of a computer of the storage device according to at least one embodiment. The computer 5 includes a CPU 6, a main memory 7, a storage 8, and an interface 9.

For example, each of the I/O processing unit 11, the memory 12, the deduplication volume 13, the pool volume 14, and the management volume 15 described above is implemented in the computer 5. The operation of the I/O processing unit 11 described above is stored in the storage 8 in the form of a program. The CPU 6 reads the program from the storage 8, loads it into the main memory 7, and executes the above processing in accordance with the program. In accordance with the program, the CPU 6 also secures, in the storage 8, storage areas corresponding to the deduplication volume 13, the pool volume 14, and the management volume 15 described above. The memory 12 is implemented by the main memory 7.

Examples of the deduplication volume 13, the pool volume 14, and the management volume 15 include an HDD (Hard Disk Drive), an SSD (Solid State Drive), a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), and a semiconductor memory. The deduplication volume 13, the pool volume 14, and the management volume 15 may be internal media directly connected to the bus of the computer 5, or external media connected to the computer 5 via the interface 9 or a communication line. When the program is distributed to the computer 5 over a communication line, the computer 5 that has received the distribution may load the program into the main memory 7 and execute the above processing. In at least one embodiment, the storage 8 is a non-transitory tangible storage medium.

The program may implement only some of the functions described above. Furthermore, the program may be a file that implements the functions described above in combination with a program already recorded in the computer system, that is, a so-called difference file (difference program).

Several embodiments of the present invention have been described, but these embodiments are examples and do not limit the scope of the invention. Various additions, omissions, substitutions, and changes may be made to these embodiments without departing from the gist of the invention.

S … Information processing system
1 … Disk array device
2 … Host
11 … I/O processing unit
12 … Memory
13 … Deduplication volume
14 … Pool volume
15 … Management module
111 … Write command processing unit
1111 … Hash value calculation unit
112 … Read command processing unit
113 … Hash value determination processing unit
1131 … Hash value determination method decision unit
1132 … Hash value search unit
1133 … Subsequent hash value comparison unit
114 … Deduplication processing unit
1141 … Hash value registration unit
1142 … Chunk data write unit
800 … Storage device
810 … Write command processing unit
820 … Feature data registration unit
830 … Determination processing unit

In order to achieve the above object, according to a first aspect of the present invention, a storage device includes: write command processing means for associating, with each piece of data of a predetermined management unit included in a batch or a series of write data, feature data corresponding to that data; feature data registration means for, when storing in storage means first feature data, which is the feature data corresponding to first data of the management unit included in first write data, storing second feature data, which is the feature data corresponding to second data of the management unit included in the first write data and following the first data, in the storage means in association with the first feature data; determination processing means for, when performing duplication determination on second write data, performing a first duplication determination that compares the first feature data with fourth feature data, which is the feature data corresponding to fourth data of the management unit included in the second write data, and, when the first feature data and the fourth feature data match in the first duplication determination and the second feature data is readable based on the association with the first feature data, reading the second feature data based on the association with the first feature data and performing a second duplication determination that compares the second feature data with fifth feature data, which is the feature data corresponding to fifth data of the management unit included in the second write data and following the fourth data; and deduplication processing means for determining whether the fourth data is duplicate data based on the result of the first duplication determination, and determining whether the fifth data is duplicate data based on the result of the second duplication determination.

In order to achieve the above object, according to a second aspect of the present invention, an information processing system includes a host and a storage device communicably connected to the host. The storage device includes: write command processing means for associating, with each piece of data of a predetermined management unit included in a batch or a series of write data, feature data corresponding to that data; feature data registration means for, when storing in storage means first feature data, which is the feature data corresponding to first data of the management unit included in first write data, storing second feature data, which is the feature data corresponding to second data of the management unit included in the first write data and following the first data, in the storage means in association with the first feature data; determination processing means for, when performing duplication determination on second write data, performing a first duplication determination that compares the first feature data with fourth feature data, which is the feature data corresponding to fourth data of the management unit included in the second write data, and, when the first feature data and the fourth feature data match in the first duplication determination and the second feature data is readable based on the association with the first feature data, reading the second feature data based on the association with the first feature data and performing a second duplication determination that compares the second feature data with fifth feature data, which is the feature data corresponding to fifth data of the management unit included in the second write data and following the fourth data; and deduplication processing means for determining whether the fourth data is duplicate data based on the result of the first duplication determination, and determining whether the fifth data is duplicate data based on the result of the second duplication determination.

In order to achieve the above object, according to a third aspect of the present invention, an information processing method includes: associating, with each piece of data of a predetermined management unit included in a batch or a series of write data, feature data corresponding to that data; when storing first feature data, which is the feature data corresponding to first data of the management unit included in first write data, storing second feature data, which is the feature data corresponding to second data of the management unit included in the first write data and following the first data, in association with the first feature data; when performing duplication determination on second write data, performing a first duplication determination that compares the first feature data with fourth feature data, which is the feature data corresponding to fourth data of the management unit included in the second write data, and, when the first feature data and the fourth feature data match in the first duplication determination and the second feature data is readable based on the association with the first feature data, reading the second feature data based on the association with the first feature data and performing a second duplication determination that compares the second feature data with fifth feature data, which is the feature data corresponding to fifth data of the management unit included in the second write data and following the fourth data; and determining whether the fourth data is duplicate data based on the result of the first duplication determination, and determining whether the fifth data is duplicate data based on the result of the second duplication determination.

In order to achieve the above object, according to a fourth aspect of the present invention, a program causes a computer to execute: associating, with each piece of data of a predetermined management unit included in a batch or a series of write data, feature data corresponding to that data; when storing first feature data, which is the feature data corresponding to first data of the management unit included in first write data, storing second feature data, which is the feature data corresponding to second data of the management unit included in the first write data and following the first data, in association with the first feature data; when performing duplication determination on second write data, performing a first duplication determination that compares the first feature data with fourth feature data, which is the feature data corresponding to fourth data of the management unit included in the second write data, and, when the first feature data and the fourth feature data match in the first duplication determination and the second feature data is readable based on the association with the first feature data, reading the second feature data based on the association with the first feature data and performing a second duplication determination that compares the second feature data with fifth feature data, which is the feature data corresponding to fifth data of the management unit included in the second write data and following the fourth data; and determining whether the fourth data is duplicate data based on the result of the first duplication determination, and determining whether the fifth data is duplicate data based on the result of the second duplication determination.

Claims

A storage device comprising:
write command processing means for associating, with each piece of data of a predetermined management unit included in a batch or a series of write data, feature data corresponding to that data;
feature data registration means for, when storing in storage means the feature data corresponding to first data of the management unit included in first write data, storing the feature data corresponding to second data of the management unit included in the first write data and following the first data, in the storage means in association with the feature data corresponding to the first data; and
determination processing means for, when performing duplication determination on second write data, performing duplication determination using the feature data corresponding to the first data, reading the feature data corresponding to the second data based on the association with the feature data corresponding to the first data, and performing duplication determination using the feature data corresponding to the second data.

The storage device according to claim 1,
wherein the feature data is a hash value calculated from the data of the management unit.

The storage device according to claim 1 or 2,
wherein the feature data registration means registers the feature data corresponding to the second data in the same entry as the feature data corresponding to the first data.

The storage device according to any one of claims 1 to 3,
wherein the feature data registration means registers the feature data corresponding to the second data, and information indicating the storage position of the feature data corresponding to the second data, in the same entry as the feature data corresponding to the first data.

The storage device according to claim 3 or 4,
wherein the storage means stores table information comprising a plurality of hierarchical levels, and position information of the entry is stored in the table information of the lowest level of the plurality of hierarchical levels.

The storage device according to any one of claims 1 to 5,
wherein, when storing the feature data corresponding to the second data in the storage means, the feature data registration means stores the feature data corresponding to third data of the management unit included in the first write data and following the second data, in the storage means in association with the feature data corresponding to the second data, and
when performing duplication determination on the second write data, the determination processing means reads the feature data corresponding to the third data based on the association with the feature data corresponding to the second data, and performs duplication determination using the feature data corresponding to the third data.

An information processing system comprising:
a host; and
a storage device communicably connected to the host,
wherein the storage device includes:
write command processing means for associating, with each piece of data of a predetermined management unit included in a batch or a series of write data, feature data corresponding to that data;
feature data registration means for, when storing in storage means the feature data corresponding to first data of the management unit included in first write data, storing the feature data corresponding to second data of the management unit included in the first write data and following the first data, in the storage means in association with the feature data corresponding to the first data; and
determination processing means for, when performing duplication determination on second write data, performing duplication determination using the feature data corresponding to the first data, reading the feature data corresponding to the second data based on the association with the feature data corresponding to the first data, and performing duplication determination using the feature data corresponding to the second data.

An information processing method comprising:
associating, with each piece of data of a predetermined management unit included in a batch or a series of write data, feature data corresponding to that data;
when storing the feature data corresponding to first data of the management unit included in first write data, storing the feature data corresponding to second data of the management unit included in the first write data and following the first data, in association with the feature data corresponding to the first data; and
when performing duplication determination on second write data, performing duplication determination using the feature data corresponding to the first data, reading the feature data corresponding to the second data based on the association with the feature data corresponding to the first data, and performing duplication determination using the feature data corresponding to the second data.

A program causing a computer to execute:
associating, with each piece of data of a predetermined management unit included in a batch or a series of write data, feature data corresponding to that data;
when storing the feature data corresponding to first data of the management unit included in first write data, storing the feature data corresponding to second data of the management unit included in the first write data and following the first data, in association with the feature data corresponding to the first data; and
when performing duplication determination on second write data, performing duplication determination using the feature data corresponding to the first data, reading the feature data corresponding to the second data based on the association with the feature data corresponding to the first data, and performing duplication determination using the feature data corresponding to the second data.