CN116235140A - Block storage method and system for simplifying data in data deduplication - Google Patents


Info

Publication number: CN116235140A
Application number: CN202080105821.5A
Authority: CN (China)
Prior art keywords: data, block, incoming, stored, chunk
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: Zvi Schneider, Assaf Natanzon
Current Assignee: Huawei Technologies Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
                    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
                        • G06F3/0601 Interfaces specially adapted for storage systems
                            • G06F3/0602 Interfaces specially adapted to achieve a particular effect
                                • G06F3/0608 Saving storage space on storage systems
                            • G06F3/0628 Interfaces making use of a particular technique
                                • G06F3/0638 Organizing or formatting or addressing of data
                                    • G06F3/064 Management of blocks
                                        • G06F3/0641 De-duplication techniques
                            • G06F3/0668 Interfaces adopting a particular infrastructure
                                • G06F3/0671 In-line storage system
                                    • G06F3/0673 Single storage device
                                        • G06F3/0674 Disk device

Abstract

A block storage method stores incoming I/O write requests in the form of data blocks in a block memory while reducing the amount of write amplification in the block memory. The method includes dividing the incoming I/O write request into a first incoming chunk and a second incoming chunk using a content-based variable chunking method. The method also includes calculating a first hash value of the first incoming chunk and comparing the first hash value to the hash values in a map. If there is a match, the method includes identifying the matching pre-stored chunk and storing at least one pointer to a pre-stored first data block of that chunk, enabling efficient data deduplication.

Description

Block storage method and system for simplifying data in data deduplication
Technical Field
The present invention relates generally to the field of data protection and backup, and more particularly, to a block storage method, a block storage device, and a block storage system for compacting data in data deduplication.
Background
In general, data backup is used to protect and recover data in the event of data loss in a primary storage system (e.g., a host server). For security reasons, a separate backup or storage system is widely used to store a backup of the data present in the primary storage system. Over time, the storage space of such a system is gradually consumed by changed or new data, which can occupy a large amount of space in a conventional storage system. This is undesirable because it reduces the performance of the storage system; moreover, the cost of data storage, together with all associated costs, including the cost of storage hardware, remains a burden. Data deduplication is a process widely used by storage systems to eliminate duplicate or redundant data without affecting the fidelity of the original data. In a storage system, data is typically kept on block devices, where a block device is a computer data storage device that supports reading and optionally writing data in fixed-size blocks, sectors, or clusters. Conventional block storage systems (or devices) typically use only fixed-size deduplication. Such systems usually have an initial online phase that uses entries in a cache to perform some fixed-size deduplication, followed by offline processing that sometimes applies more powerful data reduction techniques, such as differential compression. The main problem is that offline data reduction requires considerable computational processing power (i.e., heavy CPU usage) and also causes read and write amplification, since the data must be processed again after it has been written, which is undesirable.
Currently, many techniques are available for data reduction, such as fixed-size data deduplication. Fixed-size deduplication divides stored data into fixed-size aligned blocks, for example blocks of 4 KB, 8 KB, or 16 KB. For each aligned block, a strong hash value is generated, and an aligned block to be written is considered identical to a block already stored in the storage system if the two have the same hash value. Fixed-size deduplication is reasonably reliable for small data blocks; for large data blocks, however, the deduplication rate is not efficient enough (i.e., it exhibits poor deduplication efficiency). Yet another conventional technique for data reduction is known as similarity compression or differential compression. Differential compression uses an offline process in which data is deduplicated after it has been written to the storage system, so it increases write amplification and requires substantial CPU resources to perform the deduplication. In addition, when reading data compressed with differential compression, decompression must be applied to a larger data block, which also costs more CPU cycles and increases the time to complete the operation. In some cases, conventional block storage systems use a two-phase approach to data reduction: they attempt data deduplication and similarity compression in the online phase and try again later in the offline phase. Online deduplication removes some redundancy before the data is written to the storage system; however, the online phase relies on fixed-size deduplication techniques.
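The fixed-size scheme described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 8 KB block size, the SHA-256 digest, and the plain dictionary standing in for the storage system's index are all assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # fixed 8 KB aligned blocks (one of the sizes named in the text)

def fixed_size_dedup(data: bytes, store: dict) -> list:
    """Split data into fixed-size aligned blocks, hash each block with a
    strong hash, and store only blocks whose hash has not been seen before.
    Returns the list of per-block hashes (the logical layout)."""
    layout = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()  # strong hash per aligned block
        if digest not in store:
            store[digest] = block  # new content: store the block once
        layout.append(digest)      # duplicates are kept only as references
    return layout
```

Two identical 8 KB blocks thus occupy the space of one, which is exactly the behavior that degrades when duplicate data is not block-aligned, motivating the variable chunking described later.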
Furthermore, offline processing requires substantial CPU resources and causes read and write amplification, since the data must be processed again after it has been written.
Thus, in light of the foregoing discussion, there is a need to overcome the above-described drawbacks associated with conventional methods and systems for data deduplication.
Disclosure of Invention
The present invention seeks to provide a block storage method, a block storage device, and a block storage system for reducing data in data deduplication. The present invention seeks to address the inefficiency of existing data deduplication techniques, namely how to reduce write amplification and computation while achieving higher compression ratios for data deduplication in a storage system. It is an object of the present invention to provide a solution that at least partly overcomes the problems encountered in the prior art and provides a block storage method, device, and system with improved performance and efficiency in terms of data reduction, wherein the amount of write amplification is significantly reduced.
The object of the invention is achieved by the solution provided in the attached independent claims. Advantageous implementations of the invention are further defined in the dependent claims.
In one aspect, the present invention provides a block storage method for storing incoming I/O write requests in a block memory in the form of data blocks. The block memory comprises previously stored data, whose hash values have been calculated according to a content-based variable chunking method, and a mapping between the hash values and pointers to the logical addresses of the corresponding pre-stored chunks of the previously stored data. The method includes the step of dividing the incoming I/O write request into a first incoming chunk and a second incoming chunk using the content-based variable chunking method. The method further comprises calculating a first hash value of the first incoming chunk and comparing it to the hash values in the map. If there is a match, the method further comprises identifying a matching pre-stored chunk based on the first hash value and storing in the block memory at least one pointer to a pre-stored first data block of the matching pre-stored chunk that has at least partially the same data as a first incoming data block of the first incoming chunk, instead of storing the first incoming data block in the block memory. If there is no match, the method includes storing the first incoming chunk in the block memory, and storing the first hash value in the map together with a pointer to the logical address of the first incoming chunk.
The method of the invention performs efficient data deduplication on incoming I/O write requests and improves performance and efficiency in terms of data reduction. The method reduces the amount of write amplification and the computation required by the block memory. When the size of an incoming I/O write request exceeds a defined threshold, a variable-length deduplication technique is applied to it; in other words, large I/O write requests above the threshold are handled separately from small I/O write requests below it. Variable-length deduplication provides better performance, higher deduplication rates, and less write amplification for large I/O write requests. The first hash value of the first incoming chunk is compared with the hash values in the map to check whether the first incoming chunk already exists in the block memory. Thus, data deduplication is achieved by comparing the first hash value with the hash values in the map, without first storing the first incoming chunk in the block memory.
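The match-or-store decision described in this aspect can be sketched per chunk. This is an illustrative simplification under stated assumptions: `hash_map` and `chunk_store` are hypothetical stand-ins for the patent's map and block memory, a list index plays the role of a logical address, and SHA-256 is one possible strong hash.

```python
import hashlib

def process_incoming_chunk(chunk: bytes, hash_map: dict, chunk_store: list):
    """One iteration of the claimed flow: hash the incoming chunk, look the
    hash up in the map, and either record a pointer to the pre-stored chunk
    (match) or store the chunk and register its hash (no match).
    hash_map: hash -> logical address; chunk_store: stored chunks by address."""
    h = hashlib.sha256(chunk).hexdigest()
    if h in hash_map:
        # Deduplication hit: only a pointer is stored, not the data itself.
        return ("pointer", hash_map[h])
    addr = len(chunk_store)        # logical address assigned to the new chunk
    chunk_store.append(chunk)      # store the chunk in the block memory
    hash_map[h] = addr             # register the hash -> address mapping
    return ("stored", addr)
```

Note that the duplicate chunk never touches the store, which is how the method avoids the write amplification of offline deduplication.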
In one implementation, if the pre-stored first data block has exactly the same data as the first incoming data block, the method further includes storing a pointer to the pre-stored first data block instead of storing the first incoming data block.
In case the data in the first incoming data block is identical to the data in the pre-stored first data block, only one pointer needs to be allocated to the pre-stored first block in the block memory. The pointer indicates the physical storage location of the pre-stored first block on the block memory, thus preventing further repetition (i.e., reducing redundancy) of the first incoming data block in the block memory.
In one implementation, if the pre-stored first data block has partially the same data as the first incoming data block, the method includes storing one pointer to the pre-stored first data block and one pointer to an adjacent block having the same data as the remaining portion of the first incoming data block, instead of storing the first incoming data block.
In another case where the data in the first incoming data block is only partially identical to the pre-stored first data block (i.e., some portions of the data are identical, and some portions are different or similar but different), two pointers may be allocated instead of storing the actual first incoming data block, which reduces the amount of data to be stored in the block memory.
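The two-pointer case above can be illustrated with a small extent abstraction. The `Extent` type and `reference_partial_match` helper are hypothetical names introduced for this sketch; the patent does not prescribe this representation.

```python
from typing import NamedTuple

class Extent(NamedTuple):
    """A pointer-based reference into already-stored data: `ptr` identifies
    a pre-stored block, `offset`/`length` select the matching byte range."""
    ptr: int
    offset: int
    length: int

def reference_partial_match(block_len: int, match_len: int,
                            ptr_a: int, ptr_b: int) -> list:
    """Instead of storing an incoming data block whose first `match_len`
    bytes equal data in pre-stored block `ptr_a`, store two pointers: one
    into `ptr_a` and one into the adjacent block `ptr_b` holding the rest."""
    return [Extent(ptr_a, 0, match_len),
            Extent(ptr_b, 0, block_len - match_len)]
```

The two extents together cover the full incoming block, so reads can be served by following the pointers, while the block's bytes are never written a second time.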
In one implementation, where the first incoming chunk includes, in addition to one or more complete fixed-size incoming data blocks, a first incomplete data block whose remaining portion lies in the adjacent incoming chunk, the method includes identifying a second pre-stored data block in the matching pre-stored chunk that has the same data as the first portion of the first incomplete data block. The method also includes identifying the adjacent incoming chunk of the first incoming chunk and obtaining a second hash value for it. If the block memory contains a match for the second hash value, the method includes identifying the pre-stored chunk corresponding to the second hash value, identifying within it a third pre-stored data block that has the same data as the remainder of the first incomplete data block, and storing a pointer to the second pre-stored data block and a pointer to the third pre-stored data block instead of storing the first incomplete data block. If the block memory contains no match for the second hash value, the method includes storing a pointer to the second pre-stored data block together with the remainder of the first incomplete data block, instead of storing the first incomplete data block.
Each incoming chunk (i.e., a variable chunk) has an arbitrary data size. The method can provide efficient data compaction even if the first incoming partition has incomplete data blocks, e.g. the first incomplete data block. Thus, data deduplication is achieved by comparing hash values in the maps without the need to store the entire first incoming chunk and the adjacent incoming chunks in chunk memory.
In one implementation, the block memory is divided into at least a first portion, in which the size of I/O write requests is expected to be above a threshold, and a second portion, in which the size of I/O write requests is expected to be below the threshold. The method includes checking the size of the incoming I/O write request and performing the steps above when the size is above the threshold. If the size is below the threshold, the method includes performing fixed-size deduplication on the first data block.
The method uses fixed-size deduplication and variable-size deduplication simultaneously, thereby improving performance and efficiency in terms of data reduction. When the size of the I/O write request is within a defined threshold, a fixed size deduplication method is performed on the incoming I/O write request. Since the incoming I/O write request is below the defined threshold, the write amplification and computation are very small, and therefore a fixed-size deduplication method is implemented. However, when the I/O write request is greater than a defined threshold, the method may implement variable size deduplication. Variable size deduplication performed on large incoming I/O write requests reduces the amount of write amplification and computation performed on incoming I/O write requests. Variable size deduplication further increases the compression ratio, thereby increasing the overall performance of the block memory.
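The routing decision described above reduces to a size check against the threshold. The 256 KB value is taken from the example later in the text; the function name is an assumption for this sketch.

```python
THRESHOLD = 256 * 1024  # example threshold from the text: 256 KB

def route_write(request: bytes) -> str:
    """Route an incoming I/O write request: requests above the threshold go
    through variable-size (content-defined) deduplication, smaller requests
    through fixed-size deduplication, where write amplification is already
    negligible."""
    return "variable" if len(request) > THRESHOLD else "fixed"
```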
In one implementation, the expected size of I/O write requests is determined based on statistics, stored in the system, of the sizes of I/O write requests in conjunction with their addresses in the block storage system.
Statistics of the sizes of I/O write requests and of their addresses in the block storage system, stored in the system, enable an appropriate threshold to be determined for deciding whether an incoming I/O write request is large or small. Thus, it can be determined whether a fixed-size or variable-size deduplication technique should be applied to the incoming I/O write request.
In one implementation, the content-based variable chunking method is a Rabin hash method.
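A simplified content-defined chunker in the spirit of the Rabin method is sketched below. A real Rabin fingerprint uses polynomial arithmetic over GF(2); the base, modulus, window size, mask, and chunk-size limits here are illustrative assumptions, chosen only to show why cut points follow the content rather than absolute offsets.

```python
WINDOW = 48                  # sliding-window width in bytes (assumed)
PRIME, MOD = 257, 1 << 32    # polynomial base and modulus (illustrative)
MASK = (1 << 13) - 1         # boundary condition fires ~once per 8 KB of content
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024  # variable chunk size limits

def chunk_boundaries(data: bytes):
    """Yield end offsets of content-defined chunks: a boundary is declared
    wherever the rolling hash of the last WINDOW bytes satisfies the mask
    condition, so cut points move with the content, which is what makes the
    scheme resistant to byte shifting."""
    out = pow(PRIME, WINDOW, MOD)   # weight of the byte leaving the window
    h, start = 0, 0
    for i, b in enumerate(data):
        h = (h * PRIME + b) % MOD           # roll the new byte in
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * out) % MOD  # roll the old byte out
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
            yield i + 1                     # chunk covers data[start:i+1]
            start = i + 1
    if start < len(data):
        yield len(data)                     # trailing (possibly short) chunk
```

Because the hash depends only on the last WINDOW bytes, inserting bytes early in a stream shifts the data but leaves most later cut points anchored to the same content, unlike fixed-size alignment.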
In another aspect, the present invention provides a block storage device. The block storage device includes a disk for storing I/O write requests. The apparatus includes logic circuitry to perform the method for each incoming I/O write request.
The block storage device of the present aspect achieves all the advantages and effects of the method of the present invention.
In another aspect, the present invention provides a block storage system. The block storage system includes a block storage device and a control system including logic circuitry.
The block storage system of this aspect achieves all the advantages and effects of the method of the invention.
In another aspect, the invention provides a control system for controlling writing of data to a block storage device, the control system being arranged to perform the method for each incoming I/O write request.
The control system of this aspect achieves all the advantages and effects of the method of the invention.
In one aspect, a computer program product is used in a control system for controlling writing of data to a block storage device. The computer program product comprises computer readable instructions which, when executed in the control system, cause the control system to perform the method.
The computer program product of this aspect achieves all the advantages and effects of the method of the invention.
It should be noted that all devices, elements, circuits, units and modules described in this application may be implemented in software or hardware elements or any type of combination thereof. All steps performed by the various entities described in this application, as well as the functions described to be performed by the various entities, are intended to indicate that the respective entities are adapted to or for performing the respective steps and functions. Although in the following description of specific embodiments, specific functions or steps performed by external entities are not reflected in the description of specific detailed elements of the entity performing the specific steps or functions, it should be clear to a skilled person that these methods and functions may be implemented by corresponding hardware or software elements or any combination thereof. It will be appreciated that features of the invention are susceptible to being combined in various combinations without departing from the scope of the invention as defined by the accompanying claims.
Additional aspects, advantages, features and objects of the invention will become apparent from the accompanying drawings and detailed description of illustrative implementations which are explained in connection with the following appended claims.
Drawings
The foregoing summary, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings exemplary constructions of the invention. However, the invention is not limited to the specific methods and instrumentalities disclosed herein. Moreover, those skilled in the art will appreciate that the drawings are not drawn to scale. Wherever possible, like elements are designated by like numerals.
Embodiments of the invention will now be described, by way of example only, with reference to the following figures, in which:
FIG. 1 is a flow chart of a block storage method for reducing data in data deduplication provided by an embodiment of the present invention;
FIG. 2A is a diagram of a block storage system with a block storage device and a control system provided by an embodiment of the present invention;
FIG. 2B is a diagram of exemplary variable length deduplication provided by embodiments of the present invention;
FIGS. 3A, 3B, 3C, 3D and 3E are exemplary illustrations of various operations for data deduplication and data reduction provided by embodiments of the present invention;
FIGS. 4A, 4B, and 4C collectively illustrate a flowchart of a method for reducing data in data deduplication of incoming I/O write requests in a block storage device, provided by another embodiment of the present invention.
In the drawings, the underlined numbers are used to denote items where the underlined numbers are located or items adjacent to the underlined numbers. The non-underlined numbers relate to items identified by lines associating the non-underlined numbers with the items. When a number is not underlined and has an associated arrow, the number without the underline is used to identify the general item to which the arrow points.
Detailed Description
The following detailed description illustrates embodiments of the invention and the manner in which the embodiments may be implemented. While some modes for carrying out the invention have been disclosed, those skilled in the art will recognize that other embodiments for carrying out or practicing the invention may also exist.
FIG. 1 is a flow chart of a block storage method 100 for reducing data in data deduplication, provided by an embodiment of the present invention. The method 100 is performed in a block storage device such as that depicted in FIG. 2A. The method 100 includes steps 102 to 114.
The method 100 is for storing incoming I/O write requests in the form of blocks of data in a block memory. Block memory refers to hardware memory that supports reading and writing data in fixed-size blocks, sectors, or clusters; a block memory is described in detail in FIG. 2A. An incoming I/O write request (input/output write request) refers to a block write command. The block memory stores incoming I/O write requests in the form of data blocks. Typically, a data block corresponds to a particular number of bytes of physical storage space on the block memory (of a disk), e.g., a 4 kilobyte (kB) data block, an 8 kilobyte data block, etc. A data block may also be referred to as a logical block. In one example, a data block may be the smallest unit of data (i.e., a defined amount of data) used by a database. For example, incoming I/O write requests may be stored in the block memory in the form of data blocks of a defined (or fixed) size (e.g., 8 kB). The block storage device receives an incoming I/O write request and stores it across one or more of the data blocks held in the block memory. In one example, the incoming I/O write request may be backup data received from a primary storage system (e.g., a host server) during a backup. In a block memory, a write is a block write specified by an offset and a length of data. When a user requests data from the block memory, the block storage device can retrieve and reassemble the data from the relevant data blocks by offset and present the requested data to the user; for example, 8 data blocks starting from offset 8100 are retrieved.
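The offset-and-length access pattern just described can be shown in miniature. This is a toy model: a `bytes` buffer stands in for the device, and the block size is the 8 kB example from the text.

```python
BLOCK_SIZE = 8 * 1024  # example fixed block size from the text (8 kB)

def read_blocks(device: bytes, offset_block: int, count: int) -> bytes:
    """Read `count` consecutive fixed-size blocks starting at logical block
    `offset_block`, mirroring a block read specified by offset and length."""
    start = offset_block * BLOCK_SIZE
    return device[start:start + count * BLOCK_SIZE]
```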
The block memory comprises previously stored data, whose hash values have been calculated according to a content-based variable chunking method, and a mapping between the hash values and pointers to the logical addresses of the corresponding pre-stored chunks of the previously stored data. Previously stored data may refer to backup data that was previously backed up (i.e., stored) in the block memory (e.g., from a primary storage system). Upon receipt, that data is split into smaller portions (referred to as chunks) using the content-based variable chunking method, which splits data into variable-length chunks. For example, data may be split into three chunks of 7 kilobytes, 19 kilobytes, and 28 kilobytes. In practice, chunks are typically byte-aligned rather than block-aligned. Unlike fixed-length chunking methods, content-based variable chunking is resistant to byte shifting, which improves the reliability of detecting repeated chunks between previously stored data and incoming I/O write requests. Such chunks are stored in the block memory in the form of data blocks of a defined size (e.g., fixed-size 8 kB data blocks).
In one example, the hash value is generated by the block storage device for each chunk of previously stored data using a hash algorithm. A hash value is a fixed-size value, generated from original data of arbitrary size using a hash function (i.e., a hash algorithm). Examples of hash algorithms include, but are not limited to, SHA-2 and MD5. Advantageously, the hash values make it possible to identify the corresponding chunks of previously stored data in the block memory, thereby preventing further duplication of chunk data in the block memory. The block storage device assigns a respective logical address (or virtual address) to each pre-stored chunk of previously stored data. The logical address serves as a reference to the physical address used to access the block in the block memory of the block storage device. In addition, a pointer is assigned to the respective location of each chunk of previously stored data; each pointer identifies the physical location of a chunk (which may span one or more data blocks) in the block memory. Further, in one example, the mapping refers to a data structure, generated by the block storage device, between the hash value of each chunk and a pointer to the logical address assigned to that chunk in the block memory. The mapping provides access to each chunk in the block memory and enables efficient deduplication of future incoming I/O write requests. It should be appreciated that the mapping may not be a simple hash table but may, in practice, be a complex data structure that supports locating pre-stored chunks based on hash lookups (in practice, the fact that large writes arrive carrying many variable chunks close to one another is what makes an efficient data structure possible).
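The per-chunk bookkeeping just described (hash, logical address, physical pointer) can be sketched as a record type. The `ChunkRecord` name and its fields are assumptions for this illustration; as the text notes, a production mapping would be a more elaborate structure than a plain dictionary.

```python
from dataclasses import dataclass

@dataclass
class ChunkRecord:
    logical_addr: int   # logical (virtual) address assigned to the chunk
    physical_ptr: int   # pointer to the chunk's physical location on disk
    length: int         # chunk length in bytes (chunks are variable-sized)

# Map from strong hash to chunk record; a dict is only an illustration of
# the hash-lookup structure the text describes.
chunk_map = {}

def register_chunk(h: str, logical_addr: int, physical_ptr: int, length: int):
    """Record the hash -> (logical address, physical pointer, length)
    mapping for a newly stored pre-stored chunk."""
    chunk_map[h] = ChunkRecord(logical_addr, physical_ptr, length)
```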
According to one embodiment, the expected size of I/O write requests is determined based on statistics, stored in the system, of the sizes of I/O write requests in conjunction with their addresses in the block memory. In particular, the block storage device gathers statistics of I/O write request sizes and their relative logical block addresses in the block memory. These statistics indicate which storage areas (or portions) are expected to receive larger I/O write requests and which are expected to receive relatively smaller ones. Thereafter, the block memory is divided into at least a first portion, in which the size of I/O write requests is expected to be above the threshold, and a second portion, in which the size is expected to be below the threshold. In this case, the method 100 includes a step of checking the size of the incoming I/O write request: steps 102 to 114 of the method 100 are performed if the size is above the threshold, and fixed-size deduplication of the first data block is performed if the size is below the threshold. The block memory is divided into fixed-size portions (i.e., memory areas); for example, the first portion and the second portion may each be 100 MB. In addition, a threshold is determined (or set) to decide whether an incoming I/O write request is large or small. For example, with a threshold of 256 KB, incoming I/O write requests larger than 256 KB may be considered large, and incoming I/O write requests smaller than the threshold may be considered relatively small.
For incoming I/O write requests that map to the second portion (i.e., when the size of the incoming I/O write request is below the threshold), the request is divided into fixed-size aligned blocks (e.g., 8 KB) and fixed-size deduplication is applied by the block storage device. Since the size of incoming I/O write requests to the second portion is below the threshold, write amplification is minimal or negligible even with fixed-size deduplication. Furthermore, for these portions (or memory regions), the block storage device applies a differential compression scheme for further compression and data reduction. A fixed-size deduplication example is further described in FIG. 2A. For incoming I/O write requests that map to the first portion (i.e., when the size of the incoming I/O write request is above the threshold), the data reduction operations described in steps 102 to 114 of the method 100 are applied; these operations correspond to variable-size deduplication.
In step 102, the method 100 includes dividing the incoming I/O write request into a first incoming chunk and a second incoming chunk using a content-based variable chunking method. Each incoming chunk may be of any size; the incoming I/O write request is divided into a plurality of variable-length (i.e., variable-size) chunks, such as the first incoming chunk and the second incoming chunk, using a content-based variable chunking algorithm. In other words, the sizes of the first and second incoming chunks differ from each other, and each is larger than a given fixed-size data block. For example, each fixed-size data block may be 8 kilobytes, while the sizes of the first and second incoming chunks are arbitrary. The content-based variable chunking method processes incoming I/O write requests before they are written to the block memory, thereby reducing write amplification and computation in the block memory. After content-based variable chunking is performed, each incoming chunk (e.g., the first and second incoming chunks) holds an amount of data that may comprise one or more incomplete data blocks (e.g., smaller than 8 KB) and one or more complete fixed-size incoming data blocks (i.e., perfectly aligned blocks equal in size to the fixed-size data blocks, e.g., 8 KB). It should be appreciated that a single I/O write request may contain far more than two chunks; the first and second incoming chunks are used for explanation purposes.
According to one embodiment, the content-based variable chunking method is a Rabin hash method. In other words, a known partitioning algorithm (e.g., Rabin hashing) is used to partition incoming I/O write requests into variable-size blocks whose average partition size is significantly larger than a fixed-size data block, e.g., larger than an 8KB data block.
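To illustrate the idea of content-based variable chunking, the following sketch uses a toy running hash in place of a true Rabin fingerprint; the hash, mask width, and size bounds are assumptions for demonstration, not the patented algorithm:

```python
def content_defined_chunks(data: bytes, min_size: int = 16,
                           avg_bits: int = 6, max_size: int = 256):
    """Cut a chunk boundary where the low `avg_bits` bits of a running
    hash are zero, so boundaries depend on content, not position."""
    mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF  # toy hash, not a real Rabin fingerprint
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk
    return chunks
```

Because boundaries are content-defined, inserting bytes near the front of the data shifts only nearby boundaries, so most downstream chunks still hash to the same values.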
In step 104, the method 100 further includes calculating a first hash value of the first incoming chunk. In practice, hashes of all incoming chunks may be calculated, wherein the search for the first hash position may also depend on other hashes. The first hash value of the first incoming chunk is calculated by the chunk store device using a hash function. The first hash value refers to a fixed-size value representing the original data of the first incoming chunk. Examples of hash functions include, but are not limited to, SHA-1, SHA-2, or MD5. Advantageously, the first hash value identifies the first incoming chunk and can be compared with the hash values of previously stored data, preventing duplication of the first incoming chunk in the chunk store.
In step 106, the method 100 further includes comparing the first hash value to the hash value in the map (i.e., the data structure). The first hash value of the first incoming chunk is searched in a data structure, called a map, and it is checked whether the first hash value can be found in the data structure of the chunk store. In the event a match is found (i.e., the first hash value is found in the data structure), control moves to step 108. However, in the event that no match is found, control moves to step 112.
In step 108, the method 100 further includes identifying a matching pre-stored chunk based on the first hash value. In case a match with the first hash value is found in the data structure, this means that the data corresponding to the first incoming chunk has already been stored in the chunk store. Thus, the pre-stored chunk is identified in the chunk store to prevent storing the first incoming chunk, thereby reducing duplication of data in the chunk store. In another example, the block storage device identifies the pre-stored chunk by comparing the first hash value with previously calculated hash values of previously stored data stored in the map.
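Steps 104-108 can be sketched as a hash computation plus a map lookup; SHA-256 is one instance of the SHA-2 family the text mentions, and modeling the map as a plain dictionary is an assumption of this sketch:

```python
import hashlib

def chunk_hash(chunk: bytes) -> str:
    """Fixed-size value representing the chunk's data (step 104)."""
    return hashlib.sha256(chunk).hexdigest()

def find_pre_stored(chunk: bytes, hash_map: dict):
    """Return the pre-stored location if the hash is in the map (steps
    106 and 108), or None when no match is found (leading to step 112)."""
    return hash_map.get(chunk_hash(chunk))
```
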
In step 110, the method 100 further includes storing in the block memory at least one pointer to a pre-stored first data block of the pre-stored partitioned data that has at least partially the same data as a first incoming data block of the first incoming partitioned data, instead of storing the first incoming data block in the block memory. In one example, a first incoming partition may include data equal to one incomplete data block (e.g., 4 KB) that is not fully aligned to a given block (e.g., 8 KB), and two full fixed-size incoming data blocks that are fully aligned to consecutive blocks (each 8 KB). In this case, if the first hash value of the first incoming chunk is present in the current map, each of the two fully aligned, full fixed-size incoming data blocks is stored by the block storage device as a corresponding pointer block. The pointer block does not store actual data, but only pointers to one or more consecutive blocks in a first portion of the block memory where the data exists (e.g., pointers to two pre-stored data blocks in the pre-stored partitioned data). Fig. 3D depicts one example. On the other hand, if the pre-stored first data block has only partially the same data as the first incoming data block (i.e., an incomplete data block that is not fully aligned with or contained in a given data block), then the block storage device stores in the block memory at least one pointer to the pre-stored first data block in the pre-stored partitioned data, rather than storing the first incoming data block (i.e., the new incoming data portion). The pointer locates the pre-stored first data block on the block memory. Thus, the method 100 prevents data duplication without first storing the first incoming data block in the block memory, which reduces write amplification and significantly shortens computation time.
Furthermore, in some cases, if the pre-stored first data block has only partially the same data as the first incoming data block, the block storage device may store another pointer to an adjacent block having the same data as the rest of the first incoming data block.
According to one embodiment, if the pre-stored first data block has exactly the same data as the first incoming data block, a pointer to the pre-stored first block is stored instead of the first incoming data block. In one case, if the data in the first incoming data block exactly matches the data in the pre-stored first data block, the block storage device need only allocate and store a pointer to the pre-stored first block to locate the address of the pre-stored first block in the block memory.
According to one embodiment, if the pre-stored first data block has only partially the same data as the first incoming data block, a pointer to the pre-stored first data block and a pointer to an adjacent block having the same data as the remaining portion of the first incoming data block are stored instead of the first incoming data block. In another case, the data in the first incoming data block may partially match the data in the pre-stored first data block and partially match a data block adjacent to the pre-stored first data block. In other words, only a portion (not all) of the data of the first incoming data block is identical to the data in the pre-stored first data block, while the remaining data portion of the first incoming data block may be identical to a pre-stored second data block adjacent to the pre-stored first data block. The block storage device then allocates two pointers for locating the data on the block memory: one pointer is assigned to the pre-stored first data block and the other pointer is assigned to the data block adjacent to the pre-stored first data block (i.e., the pre-stored second data block), so as to physically locate the pre-stored first block and the pre-stored second block on the block memory without storing the first incoming data block.
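The one-pointer and two-pointer cases above can be sketched arithmetically; the pointer record layout (block index, offset, length) is an assumption made for illustration:

```python
def pointers_for_block(incoming_len: int, offset_in_block: int,
                       first_idx: int, block_size: int = 8192):
    """Pointer records replacing storage of an incoming data block whose
    data begins `offset_in_block` bytes into pre-stored block `first_idx`."""
    if offset_in_block == 0 and incoming_len == block_size:
        return [(first_idx, 0, block_size)]  # exact match: a single pointer
    head = block_size - offset_in_block      # bytes available in the first block
    records = [(first_idx, offset_in_block, min(incoming_len, head))]
    if incoming_len > head:                  # remainder lives in the adjacent block
        records.append((first_idx + 1, 0, incoming_len - head))
    return records
```
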
In step 112, the method 100 further comprises: the first incoming chunk is stored in the chunk memory, and the first hash value in the map (i.e., the data structure) is stored in the chunk memory along with a pointer to the logical address of the first incoming chunk. If there is no match in the data structure to the first hash value, it indicates that there is no first incoming chunk in the chunk store. Thus, the first incoming partition may be stored in the block memory. Further, the first hash value is generated by the block storage device for the first incoming chunk for future identification of the first incoming chunk in the block memory, thereby preventing duplication of the first data chunk in the block memory. In addition, the block storage device assigns a logical address (or virtual address) to the first incoming block, which serves as a reference to access the physical address of the block in the block memory. In addition, the block storage device allocates pointers to logical addresses of the first incoming partition. The pointer may locate and verify the first incoming chunk on the chunk store. In one example, the data structure establishes a correspondence between the hash value, the first incoming chunk, and a pointer to a logical address allocated to the first incoming chunk in the chunk store. The mapping provides access to the first incoming chunk in the chunk store using the corresponding hash value and eliminates unnecessary data-to-data comparisons in future deduplication operations, thereby reducing complexity.
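Step 112, together with the duplicate check of steps 106-108, can be sketched as a small store class; the in-memory list and dictionary stand in for the block memory and the map, and are assumptions of this sketch:

```python
import hashlib

class ChunkStore:
    """Minimal model of the chunk store: block memory plus hash map."""

    def __init__(self):
        self.blocks = []    # stands in for the block memory
        self.hash_map = {}  # the "map": hash -> logical address

    def write_chunk(self, chunk: bytes) -> int:
        h = hashlib.sha256(chunk).hexdigest()
        if h in self.hash_map:        # steps 106-110: duplicate, reuse pointer
            return self.hash_map[h]
        addr = len(self.blocks)       # step 112: store chunk and map entry
        self.blocks.append(chunk)
        self.hash_map[h] = addr
        return addr
```

Writing the same chunk twice stores it once and returns the same logical address both times.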
According to one embodiment, the method 100 further comprises the following steps: if the first incoming partition includes a first incomplete data block in addition to one or more complete fixed-size incoming data blocks, where a remaining portion of the first incomplete data block is stored in an adjacent incoming partition of the first incoming partition, a second pre-stored data block in the first pre-stored data partition is identified, the second pre-stored data block having the same data as the first portion of the first incomplete data block. The size of the first incomplete data block may be smaller than a given fixed-size data block. If the remaining portion of the first incomplete data block (e.g., the remaining 4 kilobytes of the first incomplete data block) is found to be stored in an adjacent incoming partition (e.g., the second incoming partition or block 2), a second pre-stored data block among the first pre-stored data blocks is identified. This is done to determine whether the first incomplete data block has already been stored in the block memory and to prevent duplication of the data of the first incoming partition.
The method 100 also includes identifying an adjacent incoming chunk of the first incoming chunk and obtaining a second hash value of the adjacent incoming chunk. The block storage device identifies an adjacent incoming partition of a first incoming block having remaining data of a first incomplete data block. Identifying adjacent incoming chunks prevents duplication of data in the chunk store. The second hash value of the adjacent incoming chunk is obtained to compare the second hash value of the adjacent incoming chunk with the hash value of the previously stored data to prevent duplication of the adjacent incoming chunk in the chunk store.
The method 100 further comprises: if there is a match in the block memory with the second hash value, a second pre-stored chunk corresponding to the second hash value is identified, a third pre-stored data block in the second pre-stored chunk having the same data as the rest of the first incomplete data block is identified, and a pointer to the second pre-stored data block and a pointer to the third pre-stored data block are stored instead of storing the first incomplete data block. If the second hash value is found in the data structure of the block memory, this means that the remaining data of the first incomplete data block already exists in the block memory, i.e., data corresponding to the adjacent incoming partition has already been stored. The second pre-stored chunk includes the remaining data of the first incomplete data block stored in the adjacent incoming partition.
In one example, the first incomplete data block and the remaining data of the first incomplete data block are not stored in the block memory, since there already exists a data portion corresponding to the first incomplete data block and the remaining data of the first incomplete data block in the previously stored data in the block memory. Instead, the block storage device assigns a pointer to a second pre-stored data block of the first pre-stored data block, the second pre-stored data block including data corresponding to the data of the first incomplete data block. Furthermore, the block memory allocates another pointer to a third pre-stored data block of the second pre-stored data block, the third pre-stored data block comprising data corresponding to the remaining data of the first incomplete data block stored in the adjacent incoming block. The pointer eliminates the need to store actual data and significantly reduces write amplification and associated computation.
If there is no match with the second hash value in the block memory, a pointer to the second pre-stored data block is stored together with the remainder of the first incomplete data block, instead of storing the first incomplete data block. The absence of a match in the block memory with the second hash value indicates that the remaining data of the first incomplete data block has not been previously stored in the block memory. In this case, the block storage device allocates a pointer to the second pre-stored data block to locate the second pre-stored data block in the block memory. Furthermore, the remaining data of the first incomplete data block is stored in the block memory. In one example, the remaining data of the first incomplete data block may be stored by the block storage device in more than one fixed-size data block. In one example, optionally, the remaining data of the first incomplete data block may be compressed as a difference from the start of the current block.
Thus, the method 100 is capable of performing a fixed-size deduplication method for incoming I/O write requests when the size of the I/O write request is within a predetermined threshold, and a variable-size deduplication method when the I/O write request is greater than the predetermined threshold. Incoming I/O write requests within the predetermined threshold do not involve adverse write amplification and computation; therefore, the fixed-size deduplication method can be implemented without a performance penalty. However, using the variable-size deduplication method on large incoming I/O write requests reduces the write amplification and computation performed on those requests to prevent duplication. The variable-size deduplication method further increases the deduplication rate, thereby reducing data duplication in the block memory.
FIG. 2A is a block diagram of various exemplary components of a block storage system provided by an embodiment of the present invention. Fig. 2A is described in conjunction with fig. 1A and 1B. Referring to FIG. 2A, a block storage system 200A is shown. The block storage system 200A includes a block storage device 202 and a control system 214. The block storage device 202 includes a disk 204 having a block memory 206. The block storage device 202 also includes logic circuitry 208 and a first network interface 210. The block memory 206 includes a first portion 212A and a second portion 212B. The control system 214 includes logic circuitry 214A, a second network interface 214B, and memory 214C.
The block storage device 202 refers to a secondary storage device for backup. The block storage device 202 may comprise suitable logic, circuitry, interfaces, and/or code that may be operable to process incoming I/O write requests. In one implementation, the block storage device 202 is part of a storage area network (storage area network, SAN) or cloud-based storage environment. Other examples of block storage devices 202 include, but are not limited to, secondary storage servers, block memory-based computing devices in a computer cluster (e.g., a massively parallel computer cluster), block memory-based electronic devices, or supercomputers. The block storage device 202 is communicatively coupled to a control system 214.
The block storage device 202 includes a disk 204 for storing I/O write requests. Disk 204 herein refers to the hardware or physical memory of the block storage device 202. Disk 204 is used to store I/O write requests and instructions executable by the logic circuitry 208. In one example, disk 204 includes the block memory 206 that defines the storage space for data blocks. The disk 204 may also include other known components for reading and writing data, such as magnetic heads, etc. (not shown for simplicity). Examples of implementations of disk 204 may include, but are not limited to, hard disk drives (HDDs), solid-state drives (SSDs), backup storage disks, block storage units, or other computer storage media. Disk 204 may store an operating system and/or other program products (including one or more operating algorithms) to operate the block storage device 202. The block memory 206 is typically used for high-performance applications requiring consistent input/output performance and low latency, such as storage area network (SAN) environments, cloud-based memory, virtual machine file systems, and the like. Data is stored in fixed-size blocks in the block memory 206, referred to as blocks or data blocks.
Logic circuitry 208 is used to execute instructions stored in disk 204 to control the storage of incoming I/O write requests in the form of blocks of data in block storage device 202. Examples of logic circuitry 208 include, but are not limited to, microprocessors, microcontrollers, complex instruction set computing (complex instruction set computing, CISC) microprocessors, reduced instruction set (reduced instruction set, RISC) microprocessors, very long instruction word (very long instruction word, VLIW) microprocessors, central processing units (central processing unit, CPUs), state machines, data processing units, and other processors or control circuits. Further, logic 208 may refer to one or more separate processors, processing devices, processing units that are part of a machine (e.g., block storage device 202).
The first network interface 210 is an arrangement of interconnected programmable and/or non-programmable components for facilitating data transfer between one or more electronic devices. The first network interface 210 may support a communication protocol for an internet small computer system interface (Internet small computer systems interface, iSCSI), fibre channel, or fibre channel over ethernet (fibre channel over Ethernet, FCoE) protocol. The first network interface 210 may also support communication protocols for all or part of a peer-to-peer network, a hybrid peer-to-peer network, a local area network (local area network, LAN), a metropolitan area network (metropolitan area network, MAN), a wide area network (wide area network, WAN), a public network such as a global computer network known as the internet, a private network, or any other communication system or systems at one or more locations. In addition, the first network interface 210 supports wired or wireless communications that may be performed by any number of known protocols including, but not limited to, internet protocol (Internet protocol, IP), wireless access protocol (wireless access protocol, WAP), frame relay or asynchronous transfer mode (asynchronous transfer mode, ATM). In addition, the first network interface 210 may also employ and support any other suitable protocol using voice, video, data, or a combination thereof.
The first portion 212A and the second portion 212B are logically divided portions of the block memory 206. The first portion 212A is used for I/O write requests whose size is expected to be above the threshold; for example, I/O write requests exceeding 256 kilobytes in size. The second portion 212B is used for I/O write requests whose size is expected to be below the threshold; for example, I/O write requests smaller than 256 kilobytes in size.
The control system 214 may comprise suitable logic, circuitry, and/or interfaces that may be operable to control writing data to the block storage device 202. In one implementation, control system 214 is used to execute control instructions stored in memory 214C to control the writing of incoming I/O write requests. Control system 214 is part of block storage system 200A. For example, control system 214 may be a backup server that controls the backup of data from a primary storage system (e.g., a host server) to a secondary storage system (e.g., block storage device 202). In another example, control system 214 may alternatively be a primary storage system (e.g., a host server) that controls writing in a secondary storage system (e.g., block storage device 202). In yet another example, control system 214 may be a control device communicatively coupled to block storage device 202, for example, via a communication medium (e.g., iSCSI, fibre channel, or FCoE) to control writing data to block storage device 202.
The control system 214 includes logic circuitry 214A, a second network interface 214B, and memory 214C. Logic circuitry 214A of the control system 214 is configured to execute instructions stored in memory 214C to control writing of incoming I/O write requests to the disk 204 of the block storage device 202. Examples of logic circuitry 214A include, but are not limited to, microprocessors, microcontrollers, complex instruction set computing (CISC) microprocessors, reduced instruction set (RISC) microprocessors, very long instruction word (VLIW) microprocessors, central processing units (CPUs), state machines, data processing units, and other processors or control circuits. The implementation examples of the second network interface 214B are the same as or similar to those of the first network interface 210 of the block storage device 202. Examples of implementations of the memory 214C of the control system 214 may include, but are not limited to, electrically erasable programmable read-only memory (EEPROM), random access memory (RAM), read-only memory (ROM), hard disk drives (HDDs), flash memory, solid-state disks (SSDs), and/or CPU cache memory.
In operation, the logic 208 of the block storage device 202 is operable to divide the block memory into at least a first portion 212A in which the size of the I/O write request is expected to be above a threshold and a second portion 212B in which the size of the I/O write request is expected to be below the threshold. In this case, the logic circuitry 208 is to examine the size of the incoming I/O write request, perform various data reduction operations (e.g., as described in steps 102-114 of the method 100 in FIG. 1) using variable size deduplication if the size is above a threshold, and perform fixed size deduplication of the first data block if the size is below the threshold. For incoming I/O write requests that reach the portion mapped to second portion 212B (i.e., when the size of the incoming I/O write request is below a threshold), logic circuitry 208 applies fixed size deduplication. For incoming I/O write requests that reach the portion mapped to the first portion 212A (i.e., when the size of the incoming I/O write request is above a threshold), the logic circuitry 208 applies variable size deduplication.
The logic 208 is also configured to divide the incoming I/O write request into a first incoming chunk and a second incoming chunk (in practice, there may be more chunks of any size) using a content-based variable chunk method. The incoming I/O write request is divided into a plurality of variable length (i.e., size) partitions, such as a first incoming partition and a second incoming partition, using a content-based variable partitioning algorithm. The logic 208 is also operable to calculate a first hash value of the first incoming chunk. The first hash value of the first incoming chunk is calculated using a hash function.
Logic circuitry 208 is also configured to compare the first hash value to the hash value in the map (i.e., data structure). The first hash value of the first incoming chunk is searched in the data structure and it is checked whether the first hash value can be found in the data structure (possibly with other hashes in the data structure while the first hash value is searched in the chunk store).
The logic 208 is also operable to identify a matching pre-stored chunk based on the first hash value. In case a match with the first hash value is found, this means that the data corresponding to the first incoming chunk has been stored in the chunk memory.
The logic circuitry 208 is also to store at least one pointer to a pre-stored first data block of the pre-stored partitioned data that has at least partially the same data as a first incoming data block of the first incoming partitioned data in the block memory, instead of storing the first incoming data block in the block memory. The pointer may locate the pre-stored first data block on the block memory. If the pre-stored first data block has exactly the same data as the first incoming data block, the logic 208 is configured to store a pointer to the pre-stored first block instead of storing the first incoming data block. In one case, the logic 208 only assigns a pointer if the data in the first incoming data block exactly matches the data in the pre-stored first data block. If the pre-stored first data block has the same data as the first incoming data block portion, the logic circuitry 208 is operable to store one pointer to the pre-stored first data block and one pointer to an adjacent block having the same data as the remainder of the first incoming data block, instead of storing the first incoming data block. In another case, the data in the first incoming data block may partially match the data in the pre-stored first data block and partially match a data block adjacent to the pre-stored first data block. Logic circuitry 208 then allocates two pointers to physically locate the pre-stored first block on the block memory.
The logic circuitry 208 is also to store the first incoming chunk in the chunk memory and store the first hash value in the map in the chunk memory along with a pointer to the logical address of the first incoming chunk. If there is no match in the data structure to the first hash value, it indicates that there is no first incoming chunk in the chunk store. Thus, the first incoming partition may be stored in the block memory.
Further, if the first incoming partition includes an amount of data equal to the first incomplete data block in addition to the one or more full fixed-size incoming data blocks, wherein a remaining portion of the first incomplete data block is stored in an adjacent incoming partition of the first incoming partition, the logic circuitry 208 is further configured to identify a second pre-stored data block in the first pre-stored data partition. The second pre-stored data block has the same data as the first portion of the first incomplete data block. Logic circuitry 208 is also used to identify an adjacent incoming chunk of the first incoming chunk and to obtain a second hash value of the adjacent incoming chunk. If there is a match in the block memory with the second hash value, the logic circuitry 208 is further operable to: identify a second pre-stored chunk corresponding to the second hash value, identify a third pre-stored data block in the second pre-stored chunk having the same data as the remainder of the first incomplete data block, and store a pointer to the second pre-stored data block and a pointer to the third pre-stored data block instead of the first incomplete data block. If there is no match in the block memory with the second hash value, the logic circuitry 208 is further operable to store a pointer to the second pre-stored data block together with the remainder of the first incomplete data block, instead of storing the first incomplete data block.
A control system 214 for controlling writing of data to the block storage device 202, the control system 214 being arranged to perform the method 100 (described in fig. 1) for each incoming I/O write request. In operation, logic 214A of control system 214 is used to control writing data to block storage device 202. In one aspect, the logic 214A of the control system 214 performs various operations such as those described in fig. 1 and 2A (e.g., the operations of the logic 208 of the block storage device 202). In other words, the various embodiments, operations, and variations disclosed above apply to control system 214 with modifications.
A computer program product for use in a control system 214, the control system 214 for controlling writing data to a block storage system 200A, the computer program product comprising computer readable instructions that, when executed in the control system 214, cause the control system 214 to perform the method 100. The computer program product may include, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. In yet another aspect, the present invention provides a computer program adapted to be executed by a device (e.g., block storage device 202 or control system 214) to perform method 100. In another aspect, the present invention provides a computer program product comprising a non-transitory computer readable storage medium having computer readable instructions executable by a processor to perform the method 100. Examples of implementations of the non-transitory computer-readable storage medium include, but are not limited to, electrically erasable programmable read-only memory (EEPROM), random Access Memory (RAM), read-only memory (ROM), hard Disk Drives (HDD), flash memory, secure Digital (SD) cards, solid State Disks (SSD), computer-readable storage media, and/or CPU cache memory.
Fig. 2B is an exemplary illustration of exemplary variable length deduplication provided by embodiments of the present invention. Referring to FIG. 2B, two exemplary I/O write requests are shown, for example a first I/O write request 218 and a second I/O write request 220. Both the first I/O write request 218 and the second I/O write request 220 undergo content-based variable partitioning (as shown by representative block 216). For example, the first I/O write request 218 is divided into five blocks of data of variable size based on its content (denoted as A, C, D, B and D in the first I/O write request 218, and as h_A, h_C, h_D, h_B and h_D in the representative block 216). Similarly, the second I/O write request 220 is divided into three blocks of data of variable size based on its content (denoted as D, C and B in the second I/O write request 220, and as h_D, h_C and h_B in the representative block 216). Note that blocks D, C and B in the second I/O write request 220 are identical to data blocks D, C and B in the first I/O write request 218.
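The Fig. 2B scenario can be reproduced with a short sketch that retains each distinct chunk once and replaces repeats with references; the list-of-indices representation is an assumption for illustration:

```python
def dedup_streams(chunk_streams):
    """Keep one copy of each distinct chunk across all requests and
    return, per request, the indices referencing the retained copies."""
    retained, refs, seen = [], [], {}
    for stream in chunk_streams:
        out = []
        for chunk in stream:
            if chunk not in seen:         # first occurrence: retain it
                seen[chunk] = len(retained)
                retained.append(chunk)
            out.append(seen[chunk])       # later occurrences: reference only
        refs.append(out)
    return retained, refs
```
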
According to one embodiment, both the first I/O write request 218 and the second I/O write request 220 include duplicate data blocks B, C and D. Further, the first I/O write request 218 includes two duplicate copies of the data block D. When the first I/O write request 218 and the second I/O write request 220 are backed up in secondary memory (i.e., block memory 206), deduplication is applied so that duplicate data blocks of the first I/O write request 218 and the second I/O write request 220 are not stored in the block memory 206. Thus, only data blocks A, B, C and D are retained (backed up) in the block memory 206, and duplicate data blocks are deleted. The above example describes a typical use of variable length deduplication. However, it should be appreciated that the variable-size deduplication applied (or performed) by the block storage device 202 in the present invention differs in that all data is ultimately stored in fixed-size blocks, and variable-size deduplication (also simply referred to as deduplication) only serves to locate data within other chunks.
Fig. 3A, 3B, 3C, 3D, and 3E are exemplary illustrations of various operations provided by embodiments of the present invention for data deduplication and data reduction. Fig. 3A-3E are described in connection with the elements of fig. 1, 2A and 2B. Referring to FIG. 3A, a fixed-size blocking arrangement 300A of an incoming I/O write request 302 using a fixed-size deduplication technique is shown. When the incoming I/O write request 302 is below a defined threshold, a fixed-size deduplication technique is performed. For example, where a threshold of defined size (e.g., 256 KB) is set, incoming I/O write requests 302 are partitioned into fixed-size aligned partitions (e.g., 8 KB) and fixed-size deduplication is applied by the chunk store device 202. Further, a hash value is generated for each data chunk of the incoming I/O write request 302 and compared to the hash value of the previously stored data in the chunk store device 202. If a match is found between the hash value of the previously stored data in the block storage device 202 and the hash value generated for the data block of the incoming I/O write request 302, this means that there is already a data block of the incoming I/O write request in the block storage device 202. Thus, pointers to pre-stored data blocks are stored, rather than storing the data again. However, if no match is found between the hash value of the previously stored data in the chunk store device 202 and the hash value generated for each chunk of data of the incoming I/O write request 302, this means that there is no data of the incoming I/O write request 302 in the chunk store device 202. Thus, incoming I/O write requests 302 are stored in the block storage device 202. In addition, pointers are allocated and stored as metadata to the data blocks of the incoming I/O write request 302 to map the data blocks on the block memory 206.
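The fixed-size path of Fig. 3A can be sketched as follows; the dictionary `store` stands in for the block memory plus its hash map, and using hashes themselves as block pointers is an assumption of this sketch:

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # 8 KB aligned partitions, as in the example

def fixed_size_dedup(request: bytes, store: dict):
    """Partition a request into aligned fixed-size blocks; store a block
    only when its hash is new, otherwise keep only a pointer to it."""
    layout = []
    for off in range(0, len(request), BLOCK_SIZE):
        block = request[off:off + BLOCK_SIZE]
        h = hashlib.sha256(block).hexdigest()
        if h not in store:
            store[h] = block   # new data: write the block itself
        layout.append(h)       # metadata pointer mapping this offset to the block
    return layout
```
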
Referring to FIG. 3B, a variable-size chunking arrangement 300B of an incoming I/O write request 304 partitioned using a content-based variable chunking technique is shown. The content-based variable chunking technique is performed when the incoming I/O write request 304 is above the threshold. For example, where a threshold of a defined size (e.g., 256 KB) is set, the incoming I/O write request 304 is partitioned into variable-size chunks by applying content-based variable chunking. In content-based variable chunking, the incoming I/O write request 304 is divided into variable-size data chunks. For example, the incoming I/O write request 304 is divided into a first incoming partition 304A, a second incoming partition 304B, and a third incoming partition 304C, whose sizes differ from one another. Further, the chunk sizes of the first incoming partition 304A, the second incoming partition 304B, and the third incoming partition 304C are each larger than a given fixed-size data block. For example, if the size of a fixed-size data block is 8 kilobytes, the chunk sizes of the first incoming partition 304A, the second incoming partition 304B, and the third incoming partition 304C are different from one another and each greater than 8 kilobytes. After content-based variable chunking is performed, each incoming partition (e.g., the first incoming partition 304A, the second incoming partition 304B, and the third incoming partition 304C) includes one or two incomplete data blocks (e.g., smaller than 8 KB) and one or more full fixed-size incoming data blocks (i.e., blocks that are perfectly aligned with, and equal in size to, the fixed-size data blocks, e.g., 8 KB).
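Content-based variable chunking of the kind described here can be sketched with a toy rolling hash. The boundary rule, the hash, and the minimum/maximum chunk sizes below are illustrative assumptions, not the patent's algorithm; claim 7 mentions a Rabin hash, which a production implementation would use instead.

```python
def content_chunks(data, divisor=1 << 12, min_size=2048, max_size=65536):
    """Content-defined chunking sketch: cut a chunk where a rolling hash
    of the most recent bytes hits a fixed pattern, so boundaries follow
    the content rather than fixed byte offsets."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF   # toy rolling hash
        size = i - start + 1
        if (size >= min_size and h % divisor == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])          # trailing partial chunk
    return chunks
```

Because the cut points depend only on local content, inserting bytes near the start of a stream shifts at most a few nearby chunk boundaries, which is what makes variable chunking resilient to unaligned writes.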
Referring to fig. 3C, an exemplary alignment of a first incoming partition 304A with respect to a set of fixed-size data blocks 306 (e.g., in the block memory) is shown. In this example, the first incoming partition 304A includes a first incomplete data block 308A, three full fixed-size incoming data blocks 308B, 308C, and 308D, and a second incomplete data block 308E. The first incomplete data block 308A and the second incomplete data block 308E are smaller than a given fixed-size data block. For example, the three full fixed-size incoming data blocks 308B, 308C, and 308D may each be 8 KB in size, while the first incomplete data block 308A and the second incomplete data block 308E are smaller than 8 KB, such as 2142 bytes and 3968 bytes, respectively (note that sizes are considered in bytes, i.e., sizes are byte-aligned rather than kilobyte-aligned). The block storage device 202 is used to calculate a first hash value of the first incoming partition 304A. Advantageously, the first hash value identifies the first incoming partition 304A: the data structure can be searched for the first hash value and, if it is found, duplication of the first incoming partition 304A in the block memory 206 can be avoided.
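The alignment of a variable-size chunk against the fixed block grid can be computed directly. The sketch below reproduces the example above (a chunk that splits into a 2142-byte head, three full 8 KB blocks, and a 3968-byte tail); the starting offset 6050 is an assumption chosen so the split comes out exactly as in the figure.

```python
BLOCK = 8 * 1024  # fixed block size (8 KB)

def align_chunk(offset, length, block=BLOCK):
    """Split a chunk located at byte `offset` into (head partial,
    [full aligned blocks], tail partial) against the fixed block grid."""
    end = offset + length
    first_full = -(-offset // block) * block       # round offset up
    last_full = (end // block) * block             # round end down
    head = (offset, first_full - offset) if offset % block else None
    tail = (last_full, end - last_full) if end % block else None
    fulls = [(o, block) for o in range(first_full, last_full, block)]
    return head, fulls, tail

head, fulls, tail = align_chunk(6050, 2142 + 3 * BLOCK + 3968)
# head is 2142 bytes, then three full 8 KB blocks, then a 3968-byte tail
```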
Referring to FIG. 3D, an exemplary illustration is shown of the case in which the block memory 206 contains a match for the first hash value of the first incoming partition 304A. FIG. 3D shows a first pre-stored data partition 310 and the first incoming partition 304A. In this case, the first pre-stored data partition 310 is stored in the block memory 206 in the form of four data blocks of the defined size followed by a partial block: a pre-stored first data block 310A, a pre-stored second data block 310B, a pre-stored third data block 310C, a pre-stored fourth data block 310D, and a terminal partial block 310E (e.g., 1525 bytes in size).
Where the pre-stored first data block 310A contains the same data as one portion of the incoming data block 308B of the first incoming partition 304A, and the pre-stored second data block 310B contains the same data as the remainder of the incoming data block 308B, the block storage device 202 may allocate two pointers 312A and 312B that physically locate the pre-stored first data block 310A and the pre-stored second data block 310B on the block memory 206. The pointer 312A is assigned to the pre-stored first data block 310A, and the other pointer 312B is assigned to the data block adjacent to it (i.e., the pre-stored second data block 310B), so that the incoming data block 308B need not be stored.
Alternatively, where the first hash value of the first incoming partition 304A exists in the current mapping (i.e., the data structure), each of the full fixed-size incoming data blocks that are fully included in the alignment (e.g., the incoming data blocks 308B, 308C, and 308D) is stored by the block memory 206 as a corresponding pointer block. A pointer block stores no actual data; it stores only pointers to one or more consecutive data blocks present in the block memory 206 (e.g., pointers to two pre-stored data blocks of a pre-stored data partition, such as the first pre-stored data partition 310).
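A pointer block for such an unaligned match can be sketched as follows; the tuple layout is an assumption made for the sketch, since the patent only requires that a pointer block hold pointers to consecutive pre-stored blocks rather than data. Each full incoming block whose matched data starts part-way through a pre-stored block needs two pointers, as with pointers 312A and 312B above.

```python
def pointer_blocks(match_offset, num_blocks, block=8192):
    """For `num_blocks` full incoming blocks whose matched data begins
    `match_offset` bytes into the pre-stored stream, emit one pointer
    block per incoming block: (source block, byte shift, next source
    block or None). A non-zero shift means the incoming block's data
    spans two consecutive pre-stored blocks."""
    shift = match_offset % block
    first = match_offset // block
    out = []
    for i in range(num_blocks):
        src = first + i
        if shift == 0:
            out.append((src, 0, None))          # aligned: one source
        else:
            out.append((src, shift, src + 1))   # unaligned: two sources
    return out
```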
Referring to fig. 3E, an exemplary illustration is shown of a situation in which the first incoming partition 304A includes a first incomplete data block 308A that is not fully aligned with (i.e., not fully included in) a data block of the set of fixed-size data blocks 306. In one example, the remaining portion of the data of the first incomplete data block 308A may be stored in an adjacent incoming partition 316 of the first incoming partition 304A. FIG. 3E shows the first pre-stored data partition 310 and a second pre-stored data partition 314. The second pre-stored data partition 314 includes pre-stored data blocks, such as a pre-stored first data block 314A, a pre-stored second data block 314B, a pre-stored third data block 314C, a pre-stored fourth data block 314D, and a pre-stored fifth data block 314E. Also shown is the adjacent incoming partition 316 of the first incoming partition 304A. The adjacent incoming partition 316 includes incoming data blocks, such as a first incoming data block 316A, a second incoming data block 316B, and a third incoming data block 316C.
According to one embodiment, the first incomplete data block 308A and its remaining data are not stored in the block memory 206, because the previously stored data in the block memory 206 already contains data portions corresponding to the first incomplete data block 308A and to its remaining data (i.e., the data corresponding to the third incoming data block 316C of the adjacent incoming partition 316). Instead, the block storage device 202 allocates a pointer 318A to the pre-stored first data block 310A of the first pre-stored data partition 310, which includes data corresponding to the data of the first incomplete data block 308A. In addition, the block storage device 202 assigns another pointer 318B to the pre-stored fifth data block 314E of the second pre-stored data partition 314, which includes data corresponding to the remaining data of the first incomplete data block 308A stored in the adjacent incoming partition 316. The pointers 318A and 318B eliminate the need to store the actual data and significantly reduce write amplification and the associated computation.
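The decision made for the incomplete block can be condensed to a small sketch. Here `index` stands in for the hash-to-chunk mapping, and the pointer values ("310A", "314E") are illustrative labels, not real addresses.

```python
def resolve_incomplete(head_pointer, neighbor_hash, index):
    """If the neighbouring chunk's hash is already known, the remainder
    of the incomplete block is pre-stored too, so two pointers replace
    the data entirely (pointers 318A/318B above); otherwise only the
    head pointer is available and the remainder must be written."""
    if neighbor_hash in index:                       # match found
        return {"write_data": False,
                "pointers": [head_pointer, index[neighbor_hash]]}
    return {"write_data": True, "pointers": [head_pointer]}
```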
Fig. 4A, 4B, and 4C collectively illustrate a flowchart of a method 400 for data reduction in data deduplication of incoming I/O write requests in a block storage device, according to another embodiment of the present invention. Fig. 4A, 4B, and 4C are described in connection with the elements of fig. 1, 2A, 2B, and 3A-3E. The method 400 is performed by the block storage device 202 (of FIG. 2A) and includes steps 402 through 432.
In step 402, the method 400 includes checking the size of the incoming I/O write request 302, the block memory 206 being divided into at least a first portion 212A in which the size of I/O write requests is expected to be above a threshold and a second portion 212B in which the size of I/O write requests is expected to be below the threshold. In other words, variable chunking is performed either because the region (e.g., the first portion 212A) receives large I/O write requests most of the time, or because the particular I/O write request itself is large.
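Claim 6 determines the expected I/O size per region from statistics of past write sizes at those addresses. One way to sketch that bookkeeping (the region size and the running-average policy are assumptions of the sketch):

```python
from collections import defaultdict

class RegionStats:
    """Track the average write size per address region so that regions
    which usually receive large I/O can be routed to variable chunking."""
    def __init__(self, region_size=1 << 30, threshold=256 * 1024):
        self.region_size = region_size
        self.threshold = threshold
        self.stats = defaultdict(lambda: [0, 0])   # region -> [bytes, count]

    def record(self, address, size):
        entry = self.stats[address // self.region_size]
        entry[0] += size
        entry[1] += 1

    def expects_large(self, address):
        total, count = self.stats[address // self.region_size]
        return count > 0 and total / count >= self.threshold
```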
In step 404, the method 400 further includes determining whether the size of the incoming I/O write request 302 is above or below the threshold. In the event that the size of the incoming I/O write request (e.g., the incoming I/O write request 302) is not above the threshold, control moves to step 406. However, in the event that the size of the incoming I/O write request (e.g., the incoming I/O write request 304) is above the threshold, control moves to step 408. In step 406, the method 400 further includes performing fixed-size deduplication on the first data block.
In step 408, the method 400 further includes dividing the incoming I/O write request 302 into a first incoming partition 304A and a second incoming partition 304B using a variable content-based partitioning method. In step 410, the method 400 further includes calculating a first hash value of the first incoming partition 304A.
In step 412, the method 400 further includes determining whether the first incoming partition 304A includes a first incomplete data block 308A. In the event that the first incoming partition 304A does not include the first incomplete data block 308A, control moves to step 414. However, in the event that the first incoming partition 304A includes a first incomplete data block 308A, control moves to step 422. In step 414, the method 400 further includes comparing the first hash value to the hash value in the map. In the event that no match is found between the first hash value and the hash value in the map, control moves to step 416. However, in the event that a match is found between the first hash value and the hash value in the map, control moves to step 418.
In step 416, the method 400 further comprises storing the first incoming partition 304A in the block memory 206 and storing the first hash value in the map in the block memory 206, together with a pointer to the logical address of the first incoming partition. In step 418, the method 400 further includes identifying a matching pre-stored chunk based on the first hash value.
In step 420, the method 400 further includes storing, in the block memory 206, at least one pointer 312A to a pre-stored first data block 310A of the first pre-stored data partition 310 that has at least partially the same data as a first incoming data block of the first incoming partition 304A, instead of storing the first incoming data block in the block memory 206. In step 422, the method 400 further includes identifying a second pre-stored data block in the first pre-stored data partition 310 that has the same data as the first portion of the first incomplete data block 308A.
In step 424, the method 400 further includes identifying an adjacent incoming partition 316 of the first incoming partition 304A and obtaining a second hash value of the adjacent incoming partition 316. In step 426, the method 400 further includes determining whether there is a match in the block memory 206 with the second hash value. In the event that no match to the second hash value is found in the block memory 206, control moves to step 428. However, in the event that a match to the second hash value is found in the block memory 206, control moves to step 430.
In step 428, the method 400 further includes storing a pointer 318A to the second pre-stored data block together with the remainder of the first incomplete data block 308A, instead of storing the first incomplete data block 308A. In step 430, the method 400 further includes identifying the second pre-stored data partition 314 corresponding to the second hash value, identifying a third pre-stored data block of the second pre-stored data partition 314 that has the same data as the remainder of the first incomplete data block 308A, and storing a pointer 318A to the second pre-stored data block and a pointer 318B to the third pre-stored data block, instead of storing the first incomplete data block 308A. Alternatively, in some cases, each portion (e.g., a given portion of the first incomplete data block 308A) may be represented by two pointers.
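Steps 402-432 condense to the following control-flow sketch. The hash map, the SHA-256 digest, and the stand-in chunker are assumptions made for the sketch; the real method additionally handles incomplete blocks and neighbour lookups as described above.

```python
import hashlib

THRESHOLD = 256 * 1024   # size threshold from the example
BLOCK = 8 * 1024         # fixed block size

class ChunkStore:
    """Condensed control flow of method 400: small writes take the
    fixed-size path, large writes are chunked by content and
    deduplicated chunk by chunk against the hash map."""
    def __init__(self):
        self.blocks = []   # stored chunk payloads
        self.index = {}    # hash -> slot of the pre-stored chunk

    def write(self, data):
        if len(data) < THRESHOLD:                    # steps 404/406
            return [("fixed", data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK)]
        refs = []
        for chunk in self._split_by_content(data):   # step 408
            h = hashlib.sha256(chunk).hexdigest()    # step 410
            if h in self.index:                      # steps 414/418/420
                refs.append(("ptr", self.index[h]))  # pointer, no data
            else:                                    # step 416
                self.index[h] = len(self.blocks)
                self.blocks.append(chunk)
                refs.append(("data", self.index[h]))
        return refs

    @staticmethod
    def _split_by_content(data, target=64 * 1024):
        # stand-in for real content-defined chunking
        return [data[i:i + target] for i in range(0, len(data), target)]
```

Writing the same large payload twice stores its unique chunks once; the second write produces only pointers.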
Modifications may be made to the embodiments of the invention described above without departing from the scope of the invention, as defined in the appended claims. Expressions such as "comprising", "combining", "having", "is/are", and the like, used to describe and claim the present invention, are intended to be interpreted in a non-exclusive manner, i.e., to allow items, components, or elements that are not explicitly described to also be present. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance, or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments, and does not preclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the invention which are, for clarity, described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately, in any suitable combination, or in any other described embodiment of the invention.

Claims (11)

1. A block storage method (100) for storing incoming I/O write requests in the form of blocks of data in a block memory (206), the block memory (206) comprising previously stored data, a hash value of which has been calculated according to a content-based variable chunking method, the block memory (206) further comprising a mapping between the hash value and a pointer to a logical address of a corresponding pre-stored chunk of the previously stored data, the method (100) comprising the steps of:
-dividing the incoming I/O write request into a first incoming partition and a second incoming partition using the content-based variable partitioning method;
- calculating a first hash value of the first incoming partition;
- comparing the first hash value with the hash values in the map;
- if there is a match:
  - identifying a matching pre-stored chunk based on the first hash value;
  - storing at least one pointer to a pre-stored first data block of the pre-stored partitioned data having at least partially the same data as a first incoming data block of the first incoming partition in the block memory (206), instead of storing the first incoming data block in the block memory (206);
- if there is no match:
  - storing the first incoming partition in the block memory (206) and storing the first hash value in the map in the block memory (206), together with a pointer to a logical address of the first incoming partition.
2. The method (100) according to claim 1, comprising the steps of: if the pre-stored first data block has exactly the same data as the first incoming data block, a pointer to the pre-stored first data block is stored instead of storing the first incoming data block.
3. The method (100) according to claim 1 or 2, comprising the steps of: if the pre-stored first data block has the same data as the first incoming data block portion, a pointer to the pre-stored first data block and a pointer to an adjacent block having the same data as the remaining portion of the first incoming data block are stored instead of storing the first incoming data block.
4. The method (100) according to any of the preceding claims, comprising the following steps if the first incoming partition includes a first incomplete data block in addition to one or more complete fixed-size incoming data blocks, the remainder of the first incomplete data block being stored in an adjacent incoming partition of the first incoming partition:
-identifying a second pre-stored data block of the first pre-stored data block, the second pre-stored data block having the same data as the first portion of the first incomplete data block;
-identifying an adjacent incoming partition of the first incoming partition and obtaining a second hash value of the adjacent incoming partition;
-if there is a match in the block memory (206) with the second hash value, identifying a second pre-stored data chunk corresponding to the second hash value, identifying a third pre-stored data chunk of the second pre-stored data chunk having the same data as the rest of the first incomplete data chunk, and storing a pointer to the second pre-stored data chunk and a pointer to the third pre-stored data chunk instead of storing the first incomplete data chunk;
-if there is no match with the second hash value in the block memory (206), storing a pointer to the second pre-stored data block with the rest of the first incomplete data block instead of storing the first incomplete data block.
5. The method (100) according to any one of the preceding claims, wherein the block memory (206) is divided into at least a first portion (212A) in which the size of the I/O write request is expected to be above a threshold value and a second portion (212B) in which the size of the I/O write request is expected to be below the threshold value, comprising the steps of: checking the size of the incoming I/O write request, performing the steps of any of the preceding claims only if the size is above the threshold, and performing fixed-size deduplication on the first data block if the size is below the threshold.
6. The method (100) of claim 5, wherein the expected size of the I/O write request is determined based on statistics of the size of the I/O write request stored in the block memory (206) in combination with an address of the I/O write request in the block memory (206).
7. The method (100) according to any of the preceding claims, wherein the content-based variable chunking method is a Rabin hash method.
8. A block storage device (202) comprising a disk (204) for storing I/O write requests, the block storage device (202) comprising logic circuitry (208) for performing the method (100) of any of the preceding claims for each incoming I/O write request.
9. A block storage system (200A) comprising a block storage device (202) according to claim 8 and a control system (214) comprising the logic circuit (214A).
10. A control system (214) for controlling writing of data to a block storage device (202), the control system (214) being arranged to perform the method (100) according to any of claims 1 to 7 for each incoming I/O write request.
11. A computer program product for use in a control system (214), the control system (214) for controlling writing of data to a block storage system (200A), the computer program product comprising computer readable instructions which, when executed in the control system (214), cause the control system (214) to perform the method (100) according to any one of claims 1 to 7.
CN202080105821.5A 2020-10-01 2020-10-01 Block storage method and system for simplifying data in data deduplication Pending CN116235140A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/077440 WO2022069041A1 (en) 2020-10-01 2020-10-01 Block storage method and system for reducing data in data deduplication

Publications (1)

Publication Number Publication Date
CN116235140A true CN116235140A (en) 2023-06-06

Family

ID=72840480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080105821.5A Pending CN116235140A (en) 2020-10-01 2020-10-01 Block storage method and system for simplifying data in data deduplication

Country Status (2)

Country Link
CN (1) CN116235140A (en)
WO (1) WO2022069041A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103917960A (en) * 2011-08-19 2014-07-09 株式会社日立制作所 Storage apparatus and duplicate data detection method
US9465808B1 (en) * 2012-12-15 2016-10-11 Veritas Technologies Llc Deduplication featuring variable-size duplicate data detection and fixed-size data segment sharing

Also Published As

Publication number Publication date
WO2022069041A1 (en) 2022-04-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination