JPWO2020081512A5


Info

Publication number: JPWO2020081512A5
Application number: JP2021520585A
Authority: JP
Publication date: 2023-02-22
Anticipated expiration: 2039-10-15

Description

The embodiments described herein are directed to a technique configured to improve storage utilization for various data protection schemes, such as replication and erasure coding, of data blocks of logical volumes ("volumes") served by storage nodes of a cluster configured to perform deduplication of data blocks. Additionally, the technique is configured to ensure that each deduplicated data block complies with the data integrity guarantees of the data protection schemes while reducing the storage space consumed on the storage nodes. That is, for multiple data protection schemes applied to a data block, the storage of redundant information needed to provide the data integrity guarantees may be reduced while the same guarantees are maintained.

Each volume may be implemented as a set of data structures, such as data blocks that store the volume's data and metadata blocks that describe the volume's data. A volume may be divided into data blocks. The storage service implemented on each node includes a metadata layer having one or more metadata (slice) services configured to process and store metadata, and a block server layer having one or more block services configured to process (deduplicate) data and store it on the node's storage devices. Notably, the block services are configured to provide the maximum degree of data protection offered by the various data protection schemes, while still deduplicating data blocks across volumes despite differing data protection schemes among the volumes.
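As a rough illustration of how a volume divided into data blocks splits across the two layers, the following Python sketch keeps a per-volume list of block IDs in a stand-in for the metadata (slice) layer and stores each unique block once in a stand-in for the block layer. The 4096-byte block size, the use of SHA-256 for block IDs, and the names `Cluster`, `write_volume`, and `read_volume` are assumptions for illustration, not details taken from the source.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size; the source does not specify one


def block_id(block: bytes) -> str:
    """Content-derived block identifier (SHA-256 is an assumption here)."""
    return hashlib.sha256(block).hexdigest()


class Cluster:
    """Toy model of the metadata (slice) layer and the block layer."""

    def __init__(self) -> None:
        self.slices: dict[str, list[str]] = {}  # volume -> ordered block IDs (metadata)
        self.blocks: dict[str, bytes] = {}      # block ID -> data, stored once

    def write_volume(self, volume: str, data: bytes) -> None:
        ids = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            bid = block_id(block)
            # Deduplicated across volumes: an existing block is not stored again.
            self.blocks.setdefault(bid, block)
            ids.append(bid)
        self.slices[volume] = ids

    def read_volume(self, volume: str) -> bytes:
        # Reassemble the volume from its metadata (list of block IDs).
        return b"".join(self.blocks[bid] for bid in self.slices[volume])
```

If two volumes begin with identical 4096-byte content, the block layer holds that block only once, while each volume's metadata still lists it.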

According to the technique, data blocks are also associated with (tagged with) an indication of the corresponding protection scheme. For example, the data blocks of a volume with double replication data protection (i.e., data blocks each having one replica) may be assigned to two block services, because the R0 data block is assigned to an R0 block service and the R1 data block is assigned to a different block service for the same bin, i.e., the primary R1 block service. Illustratively, a data block may belong to a first volume with double replication data protection and also to a different, second volume with triple replication data protection. The technique ensures that there are sufficient replicas of the data block to satisfy the volume with the higher (highest) data integrity guarantee (i.e., the highest data protection scheme). Illustratively, a slice service of a node may then issue storage requests based on the replication scheme to asynchronously flush copies of the data block (e.g., R0 and R1 for double replication, or R0 through R2 for triple replication) to the block services associated with the identified storage nodes.
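The rule that a shared block must satisfy the strongest guarantee among the volumes referencing it can be sketched in Python as follows. The copy-count table, the `TaggedBlock` type, and the `bin_services` list (one block service per replica index within a bin) are hypothetical; the sketch only computes which replica goes to which block service and does not model the asynchronous flush.

```python
from dataclasses import dataclass, field

# Copies required per replication scheme: double = R0, R1; triple = R0..R2.
COPIES = {"double": 2, "triple": 3}


@dataclass
class TaggedBlock:
    block_id: str
    schemes: set[str] = field(default_factory=set)  # DPS tags of referencing volumes

    def required_copies(self) -> int:
        # Satisfy the highest data integrity guarantee among the tags.
        return max((COPIES[s] for s in self.schemes), default=0)


def storage_requests(block: TaggedBlock, bin_services: list[str]) -> list[tuple[str, str]]:
    """Return (replica label, block service) pairs such as ("R0", "bs-a").

    `bin_services` is a hypothetical ordered list for the block's bin:
    index 0 serves R0, index 1 is the primary R1 service, and so on, so
    each replica lands on a different block service.
    """
    n = block.required_copies()
    if n > len(bin_services):
        raise ValueError("bin has too few block services for the required copies")
    return [(f"R{i}", bin_services[i]) for i in range(n)]
```

A block tagged only "double" yields requests for R0 and R1; adding a "triple" tag (because a triple-replicated volume now references it) raises the count to three.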

The block services 610, 620 for the respective SSDs 270a, b (or storage devices of the external storage array 150) determine whether they have previously stored a copy of the data block. If not, the block services 610, 620 store the compressed data block associated with the block ID on the SSDs 270a, b. Note that the block storage pool of aggregated SSDs is organized by the content of the block ID (rather than by when the data was written or where it originated), thereby providing a "content addressable" distributed storage architecture for the cluster. Such a content-addressable architecture facilitates deduplication of data "automatically" (i.e., "for free") at the SSD level, except for at least two copies of each data block stored on at least two SSDs of the cluster. In other words, the distributed storage architecture utilizes a single replication of the data with inline deduplication of further copies of the data; that is, at least two copies of the data exist for redundancy purposes in the event of a hardware failure.
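The per-SSD check described above (store the compressed block only if a copy is not already present) might look like the following minimal Python sketch. A dictionary stands in for the SSD, zlib stands in for the unspecified compression, and `BlockService` and `write_replicated` are hypothetical names; writing to two services models the two copies kept for redundancy.

```python
import hashlib
import zlib


class BlockService:
    """Sketch of a per-SSD block service with content-addressable storage."""

    def __init__(self) -> None:
        self.ssd: dict[str, bytes] = {}  # block ID -> compressed block (stand-in for the SSD)

    def store(self, block_id: str, data: bytes) -> bool:
        """Store the block unless a copy is already present; True if written."""
        if block_id in self.ssd:
            return False  # already stored: deduplicated at the SSD level
        self.ssd[block_id] = zlib.compress(data)
        return True

    def read(self, block_id: str) -> bytes:
        return zlib.decompress(self.ssd[block_id])


def write_replicated(services: list[BlockService], data: bytes) -> int:
    """Send one block to two services (two SSDs); return how many wrote it."""
    bid = hashlib.sha256(data).hexdigest()
    return sum(svc.store(bid, data) for svc in services[:2])
```

Writing the same content twice results in two physical copies after the first write and no new writes on the second, which is the "two copies, otherwise deduplicated" behavior the paragraph describes.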

The embodiments described herein are directed to a technique for improving storage utilization for various data protection schemes, such as replication and erasure coding, for data blocks of logical volumes served by storage nodes of a cluster configured to perform deduplication of data blocks. Additionally, the technique is configured to ensure that each deduplicated data block complies with the data integrity guarantees of the data protection schemes while improving the use of storage space on the storage nodes. That is, for multiple data protection schemes applied to data blocks, the storage of redundant information needed to provide the data integrity guarantees can be reduced while the same guarantees are maintained.

As described above, the storage service implemented on each node includes a metadata layer having one or more metadata (slice) services configured to process and store metadata, and a block server layer having one or more block services configured to process (replicate) data and to process (deduplicate) and store the data on the node's storage devices. Notably, the block services are configured to provide the maximum degree of data protection offered by the various data protection schemes, while still deduplicating data blocks across volumes even though the data protection schemes vary among the volumes.

In response to a write request for block B, slice service 360a prepares block B for storage by the corresponding block service 610. As indicated previously, data blocks are stored in bins assigned to each block service according to the bin assignment table 470. As noted above, a data block may be assigned to a bin based on the leading bits of the block ID 506 for that data block (i.e., the bits of the bin field 508). Also as noted, the block ID may be generated based on a cryptographic hash of the data block, and the data block is then stored in the bin corresponding to the bin identifier in the bin field 508. For example, assume that block B has a block ID with a leading bit of "1" in the bin field 508 and is therefore assigned to bin 1-0, which in turn is assigned to block service 610. Note that, as a result of deduplication, a single data block may be associated with multiple volumes. Illustratively, block A is associated with both volume 1 and volume 2, as shown in slice file 607, but is stored only once in bin 1-0 to conserve storage space. In one embodiment, because blocks are stored with their block IDs 506, a block service avoids storing duplicate copies of a block by determining that a block with the same hash identifier is already stored.
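A minimal Python sketch of bin placement by the leading bits of the block ID, with duplicate suppression by hash identifier, is shown below. The 2-bit bin field, SHA-256 as the cryptographic hash, the bin-to-service table contents, and the `BinnedStore` class are assumptions; the source states only that the leading bits of the block ID select the bin and that the bin assignment table maps bins to block services.

```python
import hashlib

BIN_BITS = 2  # hypothetical bin-field width (must be 1..8 for this sketch)


def block_id(data: bytes) -> bytes:
    """Block ID from a cryptographic hash of the block (SHA-256 assumed)."""
    return hashlib.sha256(data).digest()


def bin_of(bid: bytes, bits: int = BIN_BITS) -> int:
    """Bin number = leading `bits` bits of the block ID (the bin field)."""
    return bid[0] >> (8 - bits)


class BinnedStore:
    def __init__(self, bin_table: dict[int, str]) -> None:
        self.bin_table = bin_table  # bin number -> block service (hypothetical table)
        self.bins: dict[int, dict[bytes, bytes]] = {b: {} for b in bin_table}
        self.refs: dict[bytes, set[str]] = {}  # block ID -> referencing volumes

    def write(self, volume: str, data: bytes) -> tuple[str, bool]:
        """Place a block in its bin; return (block service, newly stored?)."""
        bid = block_id(data)
        b = bin_of(bid)
        self.refs.setdefault(bid, set()).add(volume)
        stored = self.bins[b]
        if bid in stored:
            return self.bin_table[b], False  # same hash already stored: skip
        stored[bid] = data
        return self.bin_table[b], True
```

Writing the same block on behalf of two volumes stores it once in a single bin while recording both volumes as referents, mirroring the block A example in slice file 607. With a 1-bit bin field, a block ID whose leading bit is 1 maps to bin 1, as in the block B example.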

In general, garbage collection can be employed in a manner that does not violate the data integrity guarantees of any DPS. For example, if the same block is stored in a double-replicated volume and a triple-replicated volume, at least three copies of that block must exist. That is, data integrity guarantees a level of failure redundancy for the DPS (e.g., no data loss in the event of k failures). A slice service may specify the DPS of the data for all read and write operations. Specifying the DPS may enable the block services to perform additional error checking. The technique described herein illustratively prioritizes storing unencoded copies of data blocks over storing encoded parity copies of the blocks; such prioritization provides customers with improved degraded-read performance for data stored in volumes with a replication-based DPS. In one embodiment, a write group is created for a single DPS, which simplifies encoding, garbage collection, and bin synchronization while reducing the DPS information that must be stored with the encoded blocks. Also, maximizing storage space or performing deduplication across write groups with different numbers of encoded copies increases write amplification during garbage collection.
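For replication-based schemes, the garbage-collection constraint above (a block referenced by both a double- and a triple-replicated volume must keep at least three copies) reduces to simple arithmetic, sketched below in Python. The function names and the copy table are hypothetical, and erasure-coded schemes and write groups are not modeled.

```python
# Copies needed per replication DPS; n copies tolerate k = n - 1 failures.
COPIES = {"double": 2, "triple": 3}


def required_copies(referencing_schemes: list[str]) -> int:
    """Copies that must remain so every referencing volume's guarantee holds."""
    return max((COPIES[s] for s in referencing_schemes), default=0)


def reclaimable_copies(current_copies: int, referencing_schemes: list[str]) -> int:
    """How many copies garbage collection may delete without violating any DPS."""
    return max(current_copies - required_copies(referencing_schemes), 0)


def tolerated_failures(copies: int) -> int:
    """Failures survivable with plain replication (k = copies - 1)."""
    return max(copies - 1, 0)
```

A block with three copies referenced by both volumes has nothing reclaimable; once the triple-replicated volume drops its reference, one copy can be collected, and once no volume references the block, all copies can be.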