WO2015100639A1 - De-duplication method, apparatus and system - Google Patents

Publication number
WO2015100639A1
Authority
WO
WIPO (PCT)
Prior art keywords
stripe
data
group
data block
stored
Prior art date
Application number
PCT/CN2013/091170
Other languages
French (fr)
Chinese (zh)
Inventor
薛迎春
邵长庚
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201380002564.2A (patent CN104205097B)
Priority to PCT/CN2013/091170 (WO2015100639A1)
Publication of WO2015100639A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques

Definitions

  • The present invention relates to the field of storage, and in particular to deduplication technology.

Background

  • Deduplication is a commonly used technology. If multiple copies of the same data need to be stored, only one copy is stored, and data that duplicates it is not stored again. In other words, the repeated data is deleted, hence the name deduplication.
  • A file can be split into data blocks, with the data block as the basic unit of deduplication.
  • Each data block can be fingerprinted, and the fingerprint is strongly correlated with the content of the data block.
  • If the fingerprints of two data blocks are the same, we can conclude that the contents of the two data blocks are the same.
  • Deduplication also reduces data security. Because only one copy of the data is kept, if that copy is damaged by a storage system failure, the security of the data may be greatly reduced or the data may be permanently lost.

Summary of the invention

  • The invention can improve data security.
  • The present invention provides a data block processing method for a controller, where the method includes: when multiple data blocks have the same fingerprint, querying the addresses of the data blocks according to the fingerprint, and searching for the stripe groups occupied by the data blocks according to the data block addresses; checking the Redundant Array of Independent Disks (RAID) stripe state of the stripe groups of the data blocks, and retaining the data block in a stripe-consistent stripe group according to the check result. Retaining the data block in a stripe-consistent stripe group according to the check result includes at least one of the following: if there is a stripe-consistent stripe group and there are stripe groups that are not stripe consistent, keeping the data block in the stripe-consistent stripe group and deleting the data blocks in the remaining stripe groups; if there is no stripe-consistent stripe group and there is a degraded stripe group, repairing the degraded stripe group into a stripe-consistent stripe group by means of the RAID algorithm.
  • The present invention provides a data block processing apparatus, the apparatus comprising: a fingerprint matching module, configured to compare fingerprints of data blocks in a storage device; an address finding module, configured to, when multiple data blocks have the same fingerprint, query the addresses of the multiple data blocks according to the fingerprint, and search for the stripe groups occupied by the multiple data blocks according to the data block addresses; and a consistency check module, configured to check the Redundant Array of Independent Disks (RAID) stripe state of the stripe groups of the multiple data blocks, and retain the data block in a stripe-consistent stripe group according to the check result. Retaining the data block in a stripe-consistent stripe group according to the check result includes at least one of the following: if there is a stripe-consistent stripe group and there are stripe groups that are not stripe consistent, keeping the data block in the stripe-consistent stripe group and deleting the data blocks in the remaining stripe groups; if there is no stripe-consistent stripe group, and there
  • The present invention provides a data block processing method for use in a controller, where the method includes: querying a fingerprint database of the data blocks in a storage device; when a stored data block in the storage device has the same fingerprint as a data block to be stored, detecting the Redundant Array of Independent Disks (RAID) stripe state of the stripe group in which the stored data block is located; and storing the data block according to the detection, including: if the stripe state of the stripe group is stripe consistent, not storing the data block to be stored; or, if the stripe state of the stripe group is stripe degraded, either storing the data block to be stored and deleting the stored data block, or repairing the stored data block from the degraded stripe group according to the RAID algorithm; or, if the stripe state of the stripe group is stripe inconsistent, either storing the data block to be stored and deleting the stored data block, or, if no data error has occurred in the data stripe units, the
  • The present invention provides a data block processing apparatus, the apparatus comprising: a query module 61, configured to query a fingerprint database of the data blocks in a storage device; and a consistency check module 62, configured to, when a stored data block in the storage device has the same fingerprint as a data block to be stored, detect the Redundant Array of Independent Disks (RAID) stripe state of the stripe group in which the stored data block is located, and store the data block according to the detection, including: if the stripe state of the stripe group is stripe consistent, not storing the data block to be stored; or, if the stripe state of the stripe group is stripe degraded, either storing the data block to be stored and deleting the stored data block, or repairing the stored data block from the degraded stripe group according to a RAID algorithm; or, if the stripe state of the stripe group is stripe inconsistent, storing the data block to be stored and deleting the stored data block, or, according to a
  • With the solution of the invention, when a data block is deduplicated, the stability of the stripe group in which the data block is located can be checked; when the stability is insufficient, the data block is re-stored or the stripe group is repaired, which improves the stability of the stripe group holding the data block and therefore data security.

Drawings

  • FIG. 1 is an example of a topology diagram of an application scenario of the present invention
  • FIG. 2 is a flowchart of a data block processing method according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a data block processing method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an embodiment of a data block processing apparatus
  • FIG. 5 is a schematic diagram of an embodiment of a data block processing apparatus.

Detailed description

  • A storage system usually consists of a controller and a storage device.
  • The controller is equivalent to a computer: it includes a processor and memory, manages the storage device, and provides interfaces to the host and to the storage device.
  • The storage device provides physical storage space.
  • The storage device may be composed of, for example, Solid State Disks (SSDs) and Serial Attached SCSI (SAS) disks.
  • The controller and storage device can be physically separate devices, or the storage device can be integrated into the controller.
  • When the storage device is integrated in the controller, the data interaction between the controller and the storage device becomes a data interaction inside the controller, and the controller can also be referred to as a storage server.
  • The solution provided by the embodiments of the present invention also applies when the controller provides only management and does not carry the data transfer, so that data is exchanged between the storage device and the host without passing through the controller.
  • Deduplication can be performed online (on-line) or offline (off-line).
  • The online mode achieves higher space utilization of the storage device; the offline mode writes data faster.
  • In online mode, the controller receives a write request that carries a new data block.
  • The controller checks whether the same data block already exists in the storage device. If not, it stores the new data block in the storage device; if so, the new data block is not stored again, and an index relationship is established between the LU that owns the new data block and the existing data block. This index relationship can be, for example, a pointer; when the new data block needs to be read later, the existing data block can be read through the pointer.
  • In offline mode, the data block is first stored in the storage device regardless of whether the storage device already stores the same data block.
  • The deduplication operation is then performed periodically, or when the storage device is idle.
  • Only one copy of the duplicate data blocks is retained, freeing the storage space occupied by the other copies. The LUs that pointed to those data blocks are pointed to the retained copy.
  • A storage system can use a Redundant Array of Independent Disks (RAID) to improve its data security.
  • The protection that RAID can provide is limited. When the reliability of a RAID stripe is reduced, the security of the data stored in it is reduced.
  • In RAID technology, a stripe consists of several stripe units (SUs). The SUs that make up one stripe can belong to different physical memories. Stripes are also called strips. SUs belonging to the same stripe can have the same amount of storage space. For ease of management, SUs belonging to the same stripe can also have the same offset, meaning that they sit at the same location in different memories. For RAID5 or RAID6, for example, SUs can be divided into data SUs and check SUs: a data SU stores service data, and a check SU stores check data for the service data. A check SU can also be called a redundant SU. An integer number of stripes can form a logical unit (LU), which serves as a host-oriented logical storage unit. Conventionally, a logical unit is also referred to by its Logical Unit Number (LUN), and the present invention follows this convention.
  • Controller 1 and storage device 2 are connected to form a storage system.
  • The controller 1 is composed of a processor 11 and a cache 12. Computer instructions are stored in the cache 12, and the processor 11 executes them to perform the corresponding operations on the storage device and carry out the present invention.
  • The storage device 2 is composed of a plurality of memories 21, each of which provides one stripe unit (SU) to form a stripe 211.
  • The host sends data to the controller 1, and the controller stores the data in stripes of the storage device.
  • A data block occupies an integer number of stripes, and controller 1 can deduplicate the data at the granularity of the data block.
  • An error in or loss of the data in an SU is called an SU fault.
  • The RAID algorithm can be used to recover the data in a faulty SU from the data stored in the non-faulty SUs. This process of restoring data is called stripe repair.
  • Some RAID algorithms can recover from a single SU fault, and some can recover from a larger number of SU faults.
  • The number of faulty SUs that the RAID algorithm can recover is called the number of SU faults the stripe tolerates. For example, RAID5 tolerates one SU fault, while RAID6 tolerates two.
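  • As a hedged illustration of stripe repair, the sketch below assumes a single-parity (RAID5-style) stripe whose check SU is the byte-wise XOR of its data SUs. The function names `xor_bytes`, `compute_parity` and `recover_single_su` are hypothetical and do not come from the patent; a faulty SU is modelled as `None`.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length stripe units."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(data_sus: List[bytes]) -> bytes:
    """Check SU of a single-parity stripe: XOR of all data SUs."""
    return reduce(xor_bytes, data_sus)


def recover_single_su(sus: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Rebuild the one faulty SU (marked None) from the surviving SUs.

    `sus` holds every SU of the stripe, data SUs and the check SU alike.
    Because the XOR of all SUs is zero, the missing SU equals the XOR
    of the survivors, whether it was a data SU or the check SU.
    """
    missing = [i for i, su in enumerate(sus) if su is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one faulty SU")
    repaired = list(sus)
    if missing:
        survivors = [su for su in sus if su is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired
```

  • With two or more faulty SUs this sketch refuses to repair, which mirrors the statement that RAID5 tolerates only one SU fault; a RAID6-style code would need a second, independent check SU.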
  • The state of a stripe is one of: stripe consistent; stripe degraded; stripe inconsistent; stripe failed.
  • The reliability of a stripe decreases across these four states in that order.
  • Stripe consistent is the normal state: the data in all SUs of the stripe is normal. That is, the data of every SU can be read, and the check data calculated from the data SUs is the same as the data stored in the check SU.
  • Stripe inconsistent means that the data in every SU of the stripe can be read, but the check data calculated from the data SUs differs from the data stored in the check SU.
  • The cause of the inconsistency may be an error in the data SU data, an error in the check SU data, or errors in both. Because the check SU stores redundant data, when only the check SU data is erroneous and the data SU data is not, no data is considered lost, and the check SU data can be recalculated from the data SU data.
  • Stripe degraded means that the stripe contains a faulty SU, but the data in the faulty SU can be recovered from the remaining SUs of the stripe.
  • Stripe degradation can be subdivided into multiple levels: the more faulty SUs, the lower the reliability. For example, with RAID6, both one SU fault and two SU faults count as stripe degraded, but the data in a stripe with two SU faults is less secure than in a stripe with one.
  • Stripe failed means that the stripe contains faulty SUs whose data cannot be recovered from the remaining SUs of the stripe, that is, some data in the stripe is permanently lost.
  • An SU fault means that there is a logical or physical error in the storage space of the SU, so that the data in the SU cannot be read or cannot be read completely.
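  • The four states can be told apart mechanically. The following is a minimal sketch, assuming a single-parity stripe with an XOR check SU and modelling an unreadable SU as `None`; the names `StripeState`, `xor_parity` and `classify_stripe` are illustrative assumptions, not terms from the patent.

```python
from enum import IntEnum
from typing import List, Optional

TOLERATED_FAULTS = 1  # assumption: RAID5-style single parity


class StripeState(IntEnum):
    """Stripe states ordered from most to least reliable, as listed above."""
    CONSISTENT = 0
    DEGRADED = 1
    INCONSISTENT = 2
    FAILED = 3


def xor_parity(data_sus: List[bytes]) -> bytes:
    """Recompute the check data as the byte-wise XOR of the data SUs."""
    out = bytearray(len(data_sus[0]))
    for su in data_sus:
        for i, b in enumerate(su):
            out[i] ^= b
    return bytes(out)


def classify_stripe(data_sus: List[Optional[bytes]],
                    check_su: Optional[bytes]) -> StripeState:
    """Classify one stripe; None marks a faulty (unreadable) SU."""
    faults = sum(su is None for su in data_sus) + (check_su is None)
    if faults > TOLERATED_FAULTS:
        return StripeState.FAILED        # faulty SUs cannot be recovered
    if faults > 0:
        return StripeState.DEGRADED      # recoverable from remaining SUs
    if xor_parity(data_sus) != check_su:
        return StripeState.INCONSISTENT  # all readable, check data differs
    return StripeState.CONSISTENT
```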
  • A data block is the basic unit of deduplication.
  • A data block is stored in one or more stripes, and the stripes that store the same data block are called a stripe group.
  • A stripe group consists of one stripe or multiple stripes.
  • The data of a LUN consists of the data blocks that the LUN points to.
  • The embodiments of the present invention take into account the reliability level of the stripe in which a data block is located.
  • In offline deduplication, when there are multiple identical data blocks, the reliability level of the stripe in which each data block is located is checked; the data block in the most reliable stripe is retained, and the data blocks in the remaining stripes are deleted.
  • If none of the stripes is stripe consistent, that is, every stripe has suffered some loss of reliability, the data block can be written into a stripe-consistent stripe and the data in the remaining stripes deleted.
  • In online deduplication, if a data block identical to the data block to be stored is already stored in the storage device, the reliability level of the stripe holding the stored data block is detected. If its state is not stripe consistent, the data block to be stored is written into a stripe-consistent stripe, and the stored data block is deleted.
  • The reliability level of the least reliable stripe is used as the reliability level of the entire stripe group.
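  • The rule for a stripe group can be sketched in a few lines. The enum below is an assumed encoding in which a larger value means lower reliability, so the state of the group is simply the maximum over its stripes; `StripeState` and `stripe_group_state` are hypothetical names.

```python
from enum import IntEnum
from typing import Iterable


class StripeState(IntEnum):
    """Assumed encoding: larger value means lower reliability."""
    CONSISTENT = 0
    DEGRADED = 1
    INCONSISTENT = 2
    FAILED = 3


def stripe_group_state(stripe_states: Iterable[StripeState]) -> StripeState:
    """A stripe group is only as reliable as its least reliable stripe."""
    return max(stripe_states)
```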
  • The reliability level of the stripe in which the data block is located is thus also considered, and the data block is stored in a highly reliable stripe, which improves the security of the data block.
  • FIG. 2 is a flowchart of a data block processing method that performs offline deduplication.
  • the method includes the following steps.
  • Step 31: The controller compares the fingerprints of the data blocks in the storage system.
  • Each data block has a fingerprint.
  • The fingerprint can be calculated from the data content of the data block using an algorithm such as MD5 or SHA 128; the result of the calculation serves as the fingerprint of the data block.
  • Data blocks with the same fingerprint are the same data block.
  • The fingerprints of the data blocks in the storage device are stored in the fingerprint library of the controller.
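  • A minimal sketch of step 31 follows, using Python's standard `hashlib`. The choice of SHA-256 as the default is an assumption made for the example (MD5 can be selected by name); the functions `fingerprint` and `find_duplicates` and the address-keyed dictionary are hypothetical stand-ins for the fingerprint library.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(block: bytes, algorithm: str = "sha256") -> str:
    """Fingerprint of a data block: hex digest of its content."""
    return hashlib.new(algorithm, block).hexdigest()


def find_duplicates(blocks: Dict[int, bytes]) -> Dict[str, List[int]]:
    """Group block addresses by fingerprint and keep only the fingerprints
    shared by more than one block (the deduplication candidates)."""
    library: Dict[str, List[int]] = defaultdict(list)
    for address, content in blocks.items():
        library[fingerprint(content)].append(address)
    return {fp: addrs for fp, addrs in library.items() if len(addrs) > 1}
```

  • For example, `find_duplicates({0: b"aaa", 4096: b"bbb", 8192: b"aaa"})` returns a single entry whose address list is `[0, 8192]`.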
  • Step 32: When multiple data blocks have the same fingerprint, query the addresses of all the data blocks that have that fingerprint, and search for the stripe groups occupied by the data blocks according to the data block addresses.
  • The controller stores a mapping table that records each fingerprint together with the storage addresses of the data blocks it represents.
  • The controller can find the stripe group that stores a data block based on the address of the data block.
  • Data blocks are stored in stripe groups, and data blocks and stripe groups are in one-to-one correspondence.
  • The data block address can be expressed as an offset within the LU, which can be converted into a physical address; the location information of the stripe group in which the data block is located can be a physical address.
  • Step 33: Check the Redundant Array of Independent Disks (RAID) stripe state of the stripe groups of the data blocks, and retain the data block in a stripe-consistent stripe group according to the check result.
  • Retaining the data block in a stripe-consistent stripe group includes at least one of the following: if a stripe-consistent stripe group already exists alongside stripe groups that are not stripe consistent, the data block in the stripe-consistent stripe group remains unchanged, and the data blocks in the remaining stripe groups are deleted.
  • The stripe state is checked for each stripe group. The state of a stripe group is determined by the least reliable stripe in the group.
  • Repair in the embodiments of the present invention means that, when some SUs in a stripe have data errors or SU faults, the data in the normal SUs is used with the RAID check algorithm to recalculate the data of the faulty or erroneous SUs. The data in the normal SUs and the recalculated data are then written anew into a stripe group of the storage device. The stripe state of the repaired stripe group is stripe consistent, so the data block in it is more secure than in a degraded or inconsistent stripe group. As for where the repaired data block is written, a new stripe group can be allocated; if the original stripe group can still be read and written normally, the repaired data can also be written to the original stripe group, overwriting its data.
  • One policy provided by the embodiments of the present invention is: when the stripe state of any stripe group is stripe consistent, the data block in that stripe group is reliable, and the other stripe groups need no stripe state check and can be deleted directly.
  • Stripe groups that are not stripe consistent are those in any other state, such as stripe inconsistent, stripe degraded, or stripe failed. Statistically, over many deduplication operations, the prior art may cause data loss or reduced data security.
  • The embodiments of the present invention can improve data reliability as compared with the prior art.
  • The embodiments of the present invention further provide another strategy: if there is no stripe-consistent stripe group but there is a degraded stripe group, the degraded stripe group is repaired into a stripe-consistent stripe group by the RAID algorithm, and the data blocks in the remaining stripe groups are deleted.
  • Otherwise, the data block kept after deduplication might remain in the degraded stripe group, or in another stripe group with low stability.
  • Repairing the degraded stripe group ensures that the data block finally retained is stored in a stripe-consistent stripe group, which increases data security.
  • The two strategies are independent of each other and can be applied separately; each statistically improves the data security of the storage device. A device or controller that implements this method can therefore support both strategies, or only one of them.
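  • The two strategies can be sketched as a single planning function. This is an illustration under stated assumptions, not the patented implementation: `StripeState` and `plan_offline_dedup` are hypothetical names, stripe groups are identified by plain strings, and when several candidates qualify the first one found is kept.

```python
from enum import IntEnum
from typing import Dict, List, Optional, Tuple


class StripeState(IntEnum):
    CONSISTENT = 0
    DEGRADED = 1
    INCONSISTENT = 2
    FAILED = 3


def plan_offline_dedup(groups: Dict[str, StripeState]
                       ) -> Tuple[Optional[str], bool, List[str]]:
    """Plan deduplication for stripe groups holding blocks with one fingerprint.

    Returns (group to keep, whether it must be repaired first,
    groups whose data blocks are deleted).
    """
    consistent = [g for g, s in groups.items() if s is StripeState.CONSISTENT]
    degraded = [g for g, s in groups.items() if s is StripeState.DEGRADED]
    if consistent:
        keep, repair = consistent[0], False  # first strategy: keep as is
    elif degraded:
        keep, repair = degraded[0], True     # second strategy: repair, then keep
    else:
        return None, False, []               # neither strategy applies
    return keep, repair, [g for g in groups if g != keep]
```

  • When neither strategy applies, the sketch deletes nothing, leaving the decision to whichever further policy the controller supports.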
  • The embodiments of the present invention further provide an optional strategy: if all the stripe groups are failed stripe groups, the stripe-consistent stripes within those stripe groups are deduplicated and retained, and the data in the failed stripes is deleted. Like the first two strategies, this strategy is independent. A device or controller that implements this method can support any one, any two, or all three of these strategies.
  • A stripe group consists of stripes, and the embodiments of the present invention can save the data in some of those stripes.
  • The salvaged stripes may be enough to form a complete stripe group. Even if they are not, retaining the data in these stripes is still worthwhile: for example, data written in the future may be combined with the retained stripes to form a complete stripe group. This strategy therefore avoids or reduces data loss in the data block.
  • The measures for retaining stripes include: if there are stripe-consistent stripes, they are deduplicated among themselves; if there is a degraded stripe and no stripe-consistent stripe stores the same data, the RAID algorithm is used to repair the degraded stripe; if there is an inconsistent stripe whose data SUs contain no error, and no stripe-consistent or degraded stripe stores the same data, the RAID algorithm is used to repair that stripe.
  • Step 34: Point the LUs that pointed to the stripe groups of the data blocks to the stripe-consistent stripe group.
  • The LU is managed by the controller and is provided for use by the host.
  • The controller records the stripe group that each LUN points to.
  • The data blocks in the stripe groups form the data of the LUN.
  • When the host reads data, the data block stored in a LUN can be found through the pointing relationship between the LUN and the stripe group.
  • After deduplication, some data blocks have been deleted, and the retained data block is shared by these LUNs. The LUs that pointed to the stripe groups of the deleted data blocks must therefore be changed to point to the stripe-consistent stripe group.
  • If a stripe-consistent stripe group already existed when step 33 was executed, the LUs point to that existing stripe group; if there was none and a stripe-consistent stripe group was generated by repair, the LUs point to the stripe group generated by the repair.
  • The mapping table recorded in the controller, which relates each fingerprint to the storage address of the data block it represents, is also updated: the storage address of the data block is updated to point to the stripe-consistent stripe group. In the next deduplication, this correspondence can be used to find the stripe group in which the data block is located and to check the stripe state of that stripe group again.
  • Step 35: The reference count of the data block may also be updated; the reference count increases by the number of deleted data blocks.
  • The controller records the number of times each data block is referenced; the reference count describes the number of LUs that point to the data block. When the reference count is 0, no LU needs the data block, and it can be deleted.
  • Steps 34 and 35 may be performed in either order, or at the same time.
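  • Steps 34 and 35 amount to simple bookkeeping, sketched below. The dictionaries `lu_pointers` (LU to stripe group) and `ref_counts` (stripe group to reference count) and the function `repoint_and_count` are assumptions made for the example; because data blocks and stripe groups correspond one to one, the stripe group id stands in for the data block.

```python
from typing import Dict, List


def repoint_and_count(lu_pointers: Dict[str, str],
                      ref_counts: Dict[str, int],
                      kept_group: str,
                      deleted_groups: List[str]) -> None:
    """Redirect LUs from deleted stripe groups to the kept one (step 34)
    and raise the kept block's reference count by the number of
    deleted data blocks (step 35). Both dictionaries are updated in place."""
    for lu, group in lu_pointers.items():
        if group in deleted_groups:
            lu_pointers[lu] = kept_group
    ref_counts[kept_group] = ref_counts.get(kept_group, 0) + len(deleted_groups)
    for group in deleted_groups:
        ref_counts.pop(group, None)  # deleted blocks are no longer tracked
```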
  • FIG. 3 is a flowchart of a data block processing method that performs online deduplication.
  • The method includes the following steps.
  • The method can be performed by a controller, in particular by a processor of the controller executing computer instructions in the cache.
  • Step 41: Query the fingerprint database of the data blocks in the storage device. When a stored data block in the storage device has the same fingerprint as the data block to be stored, detect the Redundant Array of Independent Disks (RAID) stripe state of the stripe group in which the stored data block is located.
  • Before storing the data block to be stored in the storage device, the controller first calculates its fingerprint and then checks whether that fingerprint already exists in the fingerprint database. If it does not exist, the data block has not been stored before and needs to be stored in the storage device. If it exists, the data block has already been stored, and the controller further judges whether the data block to be stored needs to be stored again.
  • Step 42: Check the RAID stripe state of the stripe group in which the stored data block is located. If the state is not stripe consistent, a stripe-consistent stripe group needs to be generated. Specifically, the stored data block can be replaced by the data block to be stored, in which case the stripe group into which the data block to be stored is written is stripe consistent; or, if the stripe group in which the stored data block is located can be repaired, that stripe group is repaired. If the stripe state is stripe consistent, the data block to be stored need not be stored, and the stored data block need not be changed.
  • The policies are described in detail below; the embodiments of the present invention may include any one or more of them.
  • A controller applying the embodiments of the present invention may support several policies at the same time, or only one of them.
  • If the stripe state of the stripe group is stripe degraded, either the data block to be stored is stored and the stored data block is deleted, or the stored data block is repaired from the degraded stripe group according to a RAID algorithm.
  • The data in the faulty SU can be repaired by the RAID algorithm.
  • Policy C: if the stripe state of the stripe group is stripe inconsistent, the data block to be stored is stored and the stored data block is deleted. Alternatively, when it is determined that no data SU is faulty, that is, when all the faulty SUs are check SUs, the stripe group is repaired.
  • The specific repair method is to recalculate the data of the check SU according to the RAID algorithm, and then write the data of the original data SUs together with the recalculated check SU data into a stripe group of the storage device. This can be the stripe group of the original data SUs or a newly allocated stripe group.
  • To determine whether any data SU is faulty, calculate the fingerprint of the data in the data SUs. If it is the same as the fingerprint of the data block to be stored, no data SU has been corrupted. The fingerprint can be calculated without using the check data.
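  • The online policies can be condensed into one decision function. The sketch below is illustrative only: `Action`, `data_sus_intact` and `decide_online` are hypothetical names, SHA-256 is an assumed fingerprint algorithm, the `prefer_repair` flag models the controller's choice between the two alternatives each policy allows, and treating a failed stripe group like any other state that is not stripe consistent (replace the stored block) is an assumption drawn from step 42.

```python
import hashlib
from enum import Enum, IntEnum
from typing import List


class StripeState(IntEnum):
    CONSISTENT = 0
    DEGRADED = 1
    INCONSISTENT = 2
    FAILED = 3


class Action(Enum):
    SKIP_WRITE = "keep the stored block, do not store the new one"
    REPAIR = "repair the stripe group of the stored block"
    REPLACE = "store the new block and delete the stored one"


def data_sus_intact(data_sus: List[bytes], new_block: bytes) -> bool:
    """Compare the fingerprint of the stored data SUs with that of the
    block to be stored; the check SU is not needed for this test."""
    stored = hashlib.sha256(b"".join(data_sus)).digest()
    return stored == hashlib.sha256(new_block).digest()


def decide_online(state: StripeState, data_sus: List[bytes],
                  new_block: bytes, prefer_repair: bool = True) -> Action:
    """Choose what to do when the new block's fingerprint is already stored."""
    if state is StripeState.CONSISTENT:
        return Action.SKIP_WRITE
    if state is StripeState.DEGRADED:
        return Action.REPAIR if prefer_repair else Action.REPLACE
    if (state is StripeState.INCONSISTENT and prefer_repair
            and data_sus_intact(data_sus, new_block)):
        return Action.REPAIR  # only the check SU is wrong: recompute it
    return Action.REPLACE
```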
  • Step 43: Point the LU to which the data block to be stored belongs to the stripe-consistent stripe group.
  • The storage device now stores a stripe-consistent stripe group, and the data block in it is the same as the data block to be stored.
  • If the stored data block was left unchanged, the LU points to the stripe group in which the stored data block is located; if the stored data block was replaced with the data block to be stored, the LU points to the stripe group in which the data block to be stored resides after being written to the storage device; if the stripe group of the stored data block was repaired, the LU points to the repaired stripe group.
  • The mapping table recorded in the controller, which relates each fingerprint to the storage address of the data block it represents, is also updated: the storage address of the data block is updated to point to the stripe-consistent stripe group. In the next deduplication, this correspondence can be used to find the stripe group in which the data block is located and to check its stripe state again.
  • Step 44: Optionally, update the reference count of the data block by increasing the reference count of the stored data block by 1.
  • The data block processing apparatus 5 includes a fingerprint matching module 51, an address finding module 52, a consistency check module 53, and an index module 54.
  • A counting module 55 can also be included.
  • The fingerprint matching module 51 is configured to compare the fingerprints of the data blocks in the storage system.
  • Each data block has a fingerprint.
  • The fingerprint can be calculated from the data content of the data block using an algorithm such as MD5 or SHA 128; the result of the calculation serves as the fingerprint of the data block.
  • Data blocks with the same fingerprint are the same data block.
  • The fingerprints of the data blocks in the storage device can be stored in the fingerprint library of the controller.
  • The address finding module 52 is configured to, when multiple data blocks have the same fingerprint, query the addresses of the data blocks according to the fingerprint, and search for the stripe groups occupied by the data blocks according to the data block addresses.
  • The controller stores a mapping table that records each fingerprint together with the storage address of the data block it represents.
  • The controller can find the stripe group that stores a data block based on the address of the data block.
  • Data blocks are stored in stripe groups, and data blocks and stripe groups are in one-to-one correspondence.
  • The consistency check module 53 is configured to check the redundant array of independent disks (RAID) stripe state of the stripe group of each data block, and to retain, according to the check result, the data block in a stripe group whose state is stripe-consistent.
  • The consistency check module 53 can also be used to repair a stripe group. Retaining the data block in a stripe-consistent stripe group according to the check result includes at least one of the following: if a stripe-consistent stripe group exists and a stripe group that is not stripe-consistent also exists, the data block in the stripe-consistent stripe group is kept unchanged and the data blocks in the remaining stripe groups are deleted; if no stripe-consistent stripe group exists and a degraded stripe group exists, the degraded stripe group is repaired by the RAID algorithm into a stripe-consistent stripe group, and the data blocks in the remaining stripe groups are deleted.
  • The stripe state is checked for each stripe group. The state of a stripe group is the state of the least reliable stripe in the stripe group.
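A minimal sketch of how these two rules might be combined during offline deduplication is given below. The `StripeState` enum, the `group_state` and `plan_offline_dedup` functions, and the tuple they return are hypothetical names chosen for illustration; the ordering of the states follows the reliability order given in the description (consistent, degraded, inconsistent, failed).

```python
from enum import IntEnum

class StripeState(IntEnum):
    # Higher value means less reliable.
    CONSISTENT = 0
    DEGRADED = 1
    INCONSISTENT = 2
    FAILED = 3

def group_state(stripe_states):
    """A stripe group takes the state of its least reliable stripe."""
    return max(stripe_states)

def plan_offline_dedup(groups):
    """groups: dict of group id -> list of StripeState for duplicate blocks.

    Returns (keep, repair_first, delete), or None when neither a consistent
    nor a degraded stripe group exists.
    """
    states = {gid: group_state(s) for gid, s in groups.items()}
    consistent = [g for g, st in states.items() if st == StripeState.CONSISTENT]
    degraded = [g for g, st in states.items() if st == StripeState.DEGRADED]
    if consistent:
        keep, repair_first = consistent[0], False   # keep a consistent copy
    elif degraded:
        keep, repair_first = degraded[0], True      # repair, then keep it
    else:
        return None
    delete = [g for g in groups if g != keep]
    return keep, repair_first, delete

C, D, F = StripeState.CONSISTENT, StripeState.DEGRADED, StripeState.FAILED
print(plan_offline_dedup({"g1": [C, D], "g2": [C], "g3": [F]}))
```

In the example, `g1` counts as degraded because one of its stripes is degraded, so `g2` is kept and the copies in `g1` and `g3` are scheduled for deletion.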
  • In the embodiments of the present invention, repair means that when some SUs in a stripe have data errors or have failed, the data in the normal SUs is used with the RAID check algorithm to recalculate the data of the failed or erroneous SUs, and the data of the normal SUs together with the recalculated data is rewritten into a stripe group of the storage device. The stripe state of the repaired stripe group is stripe-consistent, so the data block in it is safer than in a degraded or inconsistent stripe group. As for where the repaired data block is written, a new stripe group can be allocated; if the original stripe group can still be read and written normally, the repaired data can instead be written back to overwrite the data in the original stripe group.
  • One policy provided by the embodiments of the present invention is: when the stripe state of any stripe group is stripe-consistent, the data block in that stripe group is reliable, and the data blocks in the other stripe groups can be deleted directly without checking their stripe states.
  • Stripe groups that are not stripe-consistent include those in any state other than stripe-consistent, such as stripe-inconsistent, stripe-degraded, or stripe-failed. Statistically, over many deduplication operations the prior art may cause data loss or reduced data security, whereas the embodiments of the present invention can improve data reliability compared with the prior art.
  • The embodiments of the present invention further provide another policy: if no stripe-consistent stripe group exists and a degraded stripe group exists, the degraded stripe group is repaired by the RAID algorithm into a stripe-consistent stripe group, and the data blocks in the remaining stripe groups are deleted.
  • The deduplicated data block may be stored in a degraded stripe group or in another unstable stripe group.
  • With this policy, the degraded stripe group can be repaired so that the data block finally retained is stored in a stripe-consistent stripe group, which increases data security.
  • The two policies are independent of each other and either can be applied; each statistically improves the data security of the storage device. A device or controller that implements this method can therefore support both policies or only one of them.
  • The embodiments of the present invention further provide an optional policy: if all the stripe groups are failed stripe groups, the data in the stripe-consistent stripes of those stripe groups is retained after deduplication, and the data in the failed stripes is deleted. Like the first two policies, this policy is independent: a device or controller that implements this method can support any one, any two, or all three of the policies.
  • A stripe group consists of stripes, and the embodiments of the present invention can save the data in some of those stripes.
  • The salvaged stripes may be able to form a complete stripe group. Even if they are not enough to form a complete stripe group, retaining the data in them is still meaningful; for example, data written in the future can be combined with the retained stripes to form a complete stripe group. This policy therefore avoids or reduces loss of data in the data block.
  • The measures for retaining stripes include: if stripe-consistent stripes exist, deduplication is performed among the stripe-consistent stripes; if a degraded stripe exists and no stripe-consistent stripe stores the same data as the degraded stripe, the degraded stripe is repaired with the RAID algorithm; if an inconsistent stripe exists whose data has no error, and no stripe-consistent stripe or degraded stripe stores the same data as the inconsistent stripe, that stripe is repaired with the RAID algorithm.
  • The indexing module 54 is connected to the consistency check module 53 and is configured to point the LUNs that point to the stripe groups of the data blocks to the stripe-consistent stripe group.
  • A LUN is managed by the controller and is provided for use by the host.
  • The controller records the stripe groups that each LUN points to.
  • The data blocks in those stripe groups form the data of the LUN.
  • When the host reads data, the data block stored in the LUN can be found through the LUN and the stripe group.
  • After deduplication, some data blocks are deleted and the retained data block is shared by these LUNs. The LUNs that pointed to the stripe groups of the deleted data blocks therefore need to be changed to point to the stripe-consistent stripe group.
  • If a stripe-consistent stripe group originally exists, the indexing module 54 points these LUNs to that stripe group; if no stripe-consistent stripe group exists and one is generated by repair, the indexing module 54 points these LUNs to the stripe group generated by the repair.
  • The indexing module 54 further updates the mapping table, recorded in the controller, between fingerprints and the storage addresses of the data blocks they represent, changing the stored address of the data block to point to the stripe-consistent stripe group. At the next deduplication, this correspondence can be used to find the stripe group in which the data block is located and to confirm the stripe state of that stripe group again.
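The index update performed after deduplication can be sketched as below. The function name `redirect_luns` and both dictionaries are hypothetical, and the sketch simplifies by letting each LUN point to a single stripe group, whereas a real LUN would reference many.

```python
def redirect_luns(lun_to_group, fingerprint_to_group, fp, kept_group, deleted_groups):
    """Point every LUN that referenced a deleted stripe group at the kept one.

    Also updates the fingerprint mapping so that the next deduplication run
    finds the kept stripe group. Returns the number of LUNs that were moved.
    """
    deleted = set(deleted_groups)
    moved = 0
    for lun, group in lun_to_group.items():
        if group in deleted:
            lun_to_group[lun] = kept_group  # value update only, keys unchanged
            moved += 1
    fingerprint_to_group[fp] = kept_group
    return moved

luns = {"lun0": "g1", "lun1": "g2", "lun2": "g3"}
fp_map = {"fp-a": "g1"}
moved = redirect_luns(luns, fp_map, "fp-a", kept_group="g2", deleted_groups=["g1", "g3"])
print(moved, luns, fp_map)
```

The returned count equals the number of deleted copies in this simplified model, which is the amount by which the retained block's reference count would grow.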
  • The counting module 55 is connected to the consistency check module 53 and is configured to update the reference count of the retained data block; the count increases by the number of deleted data blocks.
  • The controller records the reference count of each data block, which describes the number of LUNs that point to the data block. A reference count of 0 means that no LUN needs the data block, and the data block can be deleted.
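The reference counting rule can be illustrated with a short sketch. The `RefCounter` class and its method names are assumptions made for the example; the patent only states that the count grows by the number of deleted copies and that a block with a count of 0 can be deleted.

```python
class RefCounter:
    """Hypothetical per-block reference counts, keyed by fingerprint."""

    def __init__(self):
        self.refs = {}

    def add_block(self, fp):
        """A newly stored block is referenced by exactly one LUN."""
        self.refs[fp] = 1

    def absorb_duplicates(self, fp, deleted_copies):
        """After deduplication the kept block gains one reference per deleted copy."""
        self.refs[fp] += deleted_copies

    def release(self, fp):
        """A LUN stops using the block; return True if the block may be deleted."""
        self.refs[fp] -= 1
        if self.refs[fp] == 0:
            del self.refs[fp]
            return True
        return False

counter = RefCounter()
counter.add_block("fp-a")
counter.absorb_duplicates("fp-a", 2)   # two duplicate copies were deleted
print(counter.refs["fp-a"])            # 3
```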
  • FIG. 5 is a schematic diagram of a data block processing device that can perform online deduplication.
  • The data block processing device 6 includes a query module 61, a consistency check module 62, and an indexing module 63.
  • a counting module 64 can also be included.
  • The query module 61 is configured to query the fingerprint library of the data blocks in the storage device and determine whether the storage device holds a stored data block that has the same fingerprint as the data block to be stored. Before storing the data block to be stored in the storage device, the controller first calculates its fingerprint and then checks whether that fingerprint already exists in the fingerprint library. If it does not exist, the data block has not been stored and needs to be stored in the storage device. If it exists, the data block has already been stored, and it is further determined whether the data block to be stored needs to be stored again.
  • The consistency check module 62 is configured to detect the RAID stripe state of the stripe group in which the stored data block is located. If the stripe group is not stripe-consistent, a stripe-consistent stripe group needs to be produced. Specifically, the stored data block can be replaced by the data block to be stored, in which case the stripe group into which the data block to be stored is written is stripe-consistent; or, if the stripe group of the stored data block can be repaired, that stripe group is repaired. If the stripe state is stripe-consistent, the data block to be stored does not need to be stored and the stored data block does not need to be changed.
  • The consistency check module 62 can therefore also repair a stripe group or a stripe.
  • the processing strategy of the consistency check module 62 is specifically described below.
  • the embodiment of the present invention may include any one or more of the following strategies.
  • the controller of the embodiment of the present invention may have the function of supporting multiple policies at the same time, or may only support one of the policies.
  • If the stripe state of the stripe group is stripe-degraded, either the data block to be stored is stored and the stored data block is deleted, or the stored data block is repaired from the degraded stripe group according to the RAID algorithm.
  • In a degraded stripe, the data in the faulty SU can be recovered by the RAID algorithm.
  • Policy C: if the stripe state of the stripe group is stripe-inconsistent, the data block to be stored is stored and the stored data block is deleted. Alternatively, when it is determined that no data SU is faulty, that is, all the faulty SUs are check SUs, the stripe group is repaired.
  • The specific repair method is to recalculate the data of the check SUs according to the RAID algorithm, and then write the data of the original data SUs and the recalculated check data into a stripe group of the storage device. This can be the stripe group in which the original data SUs are located, or a newly allocated stripe group.
  • To determine whether a data SU is faulty, the fingerprint of the data in the data SUs is calculated. If it is the same as the fingerprint of the data block to be stored, the data SUs have not been corrupted. The fingerprint can be calculated without using the check data.
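The online decision described by these policies can be sketched as a small dispatch function. Everything named here is hypothetical: the state strings, the action strings, the `prefer_repair` switch, the use of SHA-256 as the fingerprint, and the assumption that a data block is simply its data SUs concatenated in order.

```python
import hashlib

def data_sus_intact(data_sus, new_block):
    """True if the stored data SUs still hash to the new block's fingerprint."""
    stored = b"".join(data_sus)  # assumed layout: block = data SUs in order
    return hashlib.sha256(stored).digest() == hashlib.sha256(new_block).digest()

def online_action(state, data_ok=False, prefer_repair=True):
    """Choose what to do with a to-be-stored block whose fingerprint already exists."""
    if state == "consistent":
        return "skip_store"                      # the stored copy is reliable
    if state == "degraded":
        return "repair_stored" if prefer_repair else "store_new_delete_old"
    if state == "inconsistent":
        if prefer_repair and data_ok:
            return "recompute_check_su"          # only the check SUs are wrong
        return "store_new_delete_old"
    raise ValueError(f"no policy described for state {state!r}")

new_block = b"ABCDEFGH"
ok = data_sus_intact([b"ABCD", b"EFGH"], new_block)
print(online_action("inconsistent", data_ok=ok))  # recompute_check_su
```

The fingerprint comparison touches only the data SUs, which matches the remark that the check data is not needed to decide whether the data SUs are intact.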
  • The indexing module 63 is configured to point the LUN to which the data block to be stored belongs to the stripe-consistent stripe group.
  • The storage device then stores a stripe-consistent stripe group.
  • The data block stored in that stripe group is the same as the data block to be stored.
  • If the stored data block is left unchanged, the LUN points to the stripe group in which the stored data block is located; if the stored data block is replaced with the data block to be stored, the LUN points to the stripe group in which the data block to be stored is located after it is written to the storage device; if the stripe group of the stored data block is repaired, the LUN points to the repaired stripe group.
  • The indexing module 63 is further configured to update the mapping table, recorded in the controller, between fingerprints and the storage addresses of the data blocks they represent, changing the stored address of the data block to point to the stripe-consistent stripe group. At the next deduplication, this correspondence can be used to find the stripe group in which the data block is located and to confirm its stripe state again.
  • The counting module 64 is configured to update the reference count of the data block by increasing the reference count of the stored data block by 1. The counting module 64 is optional.
  • An embodiment of the present invention may further provide a data block processing system that includes the data block processing device 5 and a storage device, where the storage device is configured to store data blocks.
  • The data block processing device can be a controller, or software or hardware integrated in the controller.
  • An embodiment of the present invention may further provide a data block processing system that includes the data block processing device 6 and a storage device, where the storage device is configured to store data blocks.
  • The data block processing device can be a controller, or software or hardware integrated in the controller.
  • Aspects of the present invention, or possible implementations of those aspects, may be embodied as a system, a method, or a computer program product.
  • Accordingly, they may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and the like), or an embodiment combining software and hardware aspects, all of which are collectively referred to herein as "circuits", "modules", or "systems".
  • In addition, they may take the form of a computer program product, that is, computer-readable program code stored in a computer-readable medium.
  • The computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium.
  • The computer-readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing, such as random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, and portable read-only memory (CD-ROM).
  • A processor in a computer reads the computer-readable program code stored in the computer-readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, of the flowcharts, and produce an apparatus that implements the functional actions specified in each block, or combination of blocks, of the block diagrams.

Abstract

Provided is a data processing technology, which is used for data de-duplication and fingerprint detection on a plurality of data blocks. When the fingerprints of the plurality of data blocks are the same, the stability of a stripe group where the data blocks are located is further detected, and the data blocks with high stability are retained, or when the reliability of the stripe group where the data blocks are located is not high, data recovery is carried out on the stripe group where the data blocks are located. The reliability of the stripe group where the data blocks are located may be increased, and data security is improved.

Description

De-duplication method, apparatus and system

Technical Field

The present invention relates to the storage field, and in particular to a deduplication technology.

Background
In the storage field, deduplication (de-duplicate) is a frequently used technique for saving storage space: when multiple identical copies of data need to be stored, only one copy is stored, and the remaining copies that duplicate it are not stored. In other words, the duplicate data is deleted, so the technique is also called duplicate data deletion.
In terms of granularity, a file can be split into data blocks, with the data block as the basic unit of deduplication. In that case each data block can be given a fingerprint, which is strongly correlated with the content of the data block. When two data blocks have the same fingerprint, it can be concluded that their contents are the same; by performing deduplication, only one of them is stored in the storage system and the other is not stored.
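As a minimal illustration of block-level deduplication by fingerprint, the sketch below hashes each block and stores only the first copy per fingerprint. The function names and the choice of SHA-256 from Python's `hashlib` are assumptions made for the example, not details fixed by this description.

```python
import hashlib

def fingerprint(block: bytes) -> str:
    """Return a content-derived fingerprint for one data block."""
    return hashlib.sha256(block).hexdigest()

def deduplicate(blocks):
    """Keep one copy per fingerprint.

    Returns (store, index): store maps fingerprint -> the single retained
    block; index lists, per input block, the fingerprint it resolves to.
    """
    store = {}
    index = []
    for block in blocks:
        fp = fingerprint(block)
        store.setdefault(fp, block)  # the first copy is stored, later ones are not
        index.append(fp)
    return store, index

store, index = deduplicate([b"alpha", b"beta", b"alpha"])
print(len(store), index[0] == index[2])  # 2 True
```

Three blocks go in, two are stored, and the first and third inputs resolve to the same retained copy.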
However, deduplication also reduces data security: if the only remaining copy is damaged by a storage system failure, data security may be greatly reduced or the data may be permanently lost.

Summary of the Invention

The present invention can improve data security.
In a first aspect, the present invention provides a data processing method for use in a controller. The method includes: when a plurality of data blocks have the same fingerprint, querying the addresses of the data blocks according to the fingerprint, and finding the stripe groups occupied by the data blocks according to the data block addresses; checking the redundant array of independent disks (RAID) stripe state of the stripe groups of the data blocks, and retaining, according to the check result, the data block in a stripe-consistent stripe group. Retaining the data block in a stripe-consistent stripe group according to the check result includes at least one of the following: if a stripe-consistent stripe group exists and a stripe group that is not stripe-consistent exists, retaining the data block in the stripe-consistent stripe group and deleting the data blocks in the remaining stripe groups; if no stripe-consistent stripe group exists and a degraded stripe group exists, repairing the degraded stripe group into a stripe-consistent stripe group by using the RAID algorithm, where the stripe-consistent stripe group stores the data block, and deleting the data block in the remaining stripe groups.
In a second aspect, the present invention provides a data block processing apparatus. The apparatus includes: a fingerprint matching module, configured to compare the fingerprints of data blocks in a storage device; an address finding module, configured to, when a plurality of data blocks in the storage device have the same fingerprint, query the addresses of the plurality of data blocks according to the fingerprint and find the stripe groups occupied by the plurality of data blocks according to their addresses; and a consistency check module, configured to check the RAID stripe state of the stripe groups of the plurality of data blocks and retain, according to the check result, the data block in a stripe-consistent stripe group. Retaining the data block in a stripe-consistent stripe group according to the check result includes at least one of the following: if a stripe-consistent stripe group exists and a stripe group that is not stripe-consistent exists, retaining the data block in the stripe-consistent stripe group and deleting the data blocks in the remaining stripe groups; if no stripe-consistent stripe group exists and a degraded stripe group exists, repairing the degraded stripe group into a stripe-consistent stripe group by using the RAID algorithm, where the stripe-consistent stripe group stores the data block, and deleting the data block in the remaining stripe groups.
In a third aspect, the present invention provides a data block processing method for use in a controller. The method includes: querying the fingerprint library of the data blocks in a storage device and, when the storage device holds a stored data block that has the same fingerprint as a data block to be stored, detecting the RAID stripe state of the stripe group in which the stored data block is located; and storing data blocks according to the detection, including: if the stripe state of the stripe group is stripe-consistent, not storing the data block to be stored; if the stripe state of the stripe group is stripe-degraded, either storing the data block to be stored and deleting the stored data block, or repairing the stored data block from the degraded stripe group according to the RAID algorithm; or, if the stripe state of the stripe group is stripe-inconsistent, either storing the data block to be stored and deleting the stored data block, or, if no data error has occurred in the data stripe units, repairing the stored data block according to the RAID algorithm.
In a fourth aspect, the present invention provides a data block processing apparatus. The apparatus includes: a query module 61, configured to query the fingerprint library of the data blocks in a storage device; and a consistency check module 62, configured to, when the storage device holds a stored data block that has the same fingerprint as a data block to be stored, detect the RAID stripe state of the stripe group in which the stored data block is located, and store data blocks according to the detection, including: if the stripe state of the stripe group is stripe-consistent, not storing the data block to be stored; if the stripe state of the stripe group is stripe-degraded, either storing the data block to be stored and deleting the stored data block, or repairing the stored data block from the degraded stripe group according to the RAID algorithm; or, if the stripe state of the stripe group is stripe-inconsistent, either storing the data block to be stored and deleting the stored data block, or, if no data error has occurred in the data stripe units, repairing the stored data block according to the RAID algorithm.
With the solution of the present invention, when data blocks are deduplicated, the stability of the stripe group in which a data block is located is checked at the same time; when stability is insufficient, the stability of that stripe group is improved by re-storing or repairing, which improves data security.

Brief Description of the Drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and other drawings can be derived from them.
FIG. 1 is an example topology diagram of an application scenario of the present invention;
FIG. 2 is a flowchart of a data block processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data block processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an embodiment of a data block processing apparatus;
FIG. 5 is a schematic diagram of an embodiment of a data block processing apparatus.

Detailed Description

The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained based on the embodiments of the present invention fall within the protection scope of the present invention.
A storage system usually consists of a controller and a storage device. The controller is equivalent to a computer: it includes a processor and memory, manages the storage device, and provides interfaces to the host and the storage device. The storage device provides the physical storage space and can consist of, for example, solid state disks (Solid State Disc, SSD) or Serial Attached SCSI (SAS) disks. When data is written, the host's write request is first sent to the controller; the controller allocates storage space in the storage device for the write request and sends the data to be written, carried in the request, to the storage device for storage. When data is read, it is read from the storage device into the controller and then sent by the controller to the host. The controller and the storage device can be physically independent devices, or the storage device can be integrated in the controller. In the latter case, data exchange between the controller and the storage device becomes data exchange inside the controller, and the controller can also be called a storage server.
In another topology, the controller provides management but does not carry data: data exchanged between the storage device and the host need not pass through the controller. The solution provided by the embodiments of the present invention also applies to this topology.
Deduplication can be performed online (on-line) or offline (off-line). The online approach uses the space of the storage device more efficiently; the offline approach writes data faster.
In the online approach, the controller receives a write request that carries a new data block. Before the new data block is written to the storage device, the controller checks whether the same data block already exists in the storage device. If it does not exist, the new data block is stored in the storage device; if it exists, the new data block is not stored, and an index relationship is established between the LUN that owns the new data block and the data block that already exists. The index relationship can be, for example, a pointer: when the new data block later needs to be read, the existing data block can be read through the pointer.
In the offline approach, after the controller receives a write request, the data block is first stored in the storage device regardless of whether the storage device already stores the same data block. Deduplication is then performed periodically, or when the storage device is idle. During deduplication, only one copy of each duplicated data block is retained and the storage space occupied by the other copies is released. All the LUNs that pointed to those data blocks are changed to point to the one retained data block.
Because only one copy of duplicate data is retained, the security of that single copy is particularly important. A storage system can improve data security with a redundant array of independent disks (Redundant Arrays of Independent Disks, RAID). However, the protection RAID provides is limited: when the reliability of a RAID stripe decreases, the security of the data stored in it decreases.
In RAID technology, a stripe (Stripe) consists of several stripe units (Stripe Unit, SU). The SUs that form one stripe can belong to different physical storage devices. The Chinese terms 分条 and 条带 both denote a stripe. SUs that belong to the same stripe can have storage space of the same size. For ease of management, SUs that belong to the same stripe can also have the same offset, that is, they are at the same position in different storage devices. For RAID5 or RAID6, for example, SUs can be divided into data SUs and check SUs: a data SU stores service data, and a check SU stores the check data of the service data; a check SU can also be called a redundant SU. An integer number of stripes can form a logical unit (Logic Unit, LU), which can serve as the host-facing logical storage unit. By convention, a logical unit is also referred to by its logical unit number (Logic Unit Number, LUN), and the present invention follows this convention.
FIG. 1 shows an example topology of an application scenario of the present invention. The controller 1 and the storage device 2 are connected to form a storage system. The controller 1 consists of a processor 11 and a cache 12; the cache 12 stores computer instructions, and the processor 11 runs the computer instructions and performs the corresponding operations on the storage device to carry out the present invention. The storage device 2 consists of a plurality of storage units 21, each of which provides one stripe unit (SU) to form a stripe 211. When data is stored, the host sends the data to the controller 1, and the controller stores the data in the stripes of the storage device. One data block occupies an integer number of stripes, and the controller 1 can deduplicate data at data block granularity.
An SU fault means that the data in the SU is erroneous or lost. When one SU in a stripe fails, the RAID algorithm can use the data stored in the SUs that have not failed to recover the data of the failed SU; this recovery process is called repairing the stripe. Some RAID algorithms can recover data from a single SU fault, and some can recover from a larger number of SU faults. The number of failed SUs that the RAID algorithm can recover is called the number of SU faults the stripe tolerates; for example, RAID5 tolerates one SU fault and RAID6 tolerates two.
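For a single-parity layout such as RAID5, the repair described above reduces to a byte-wise XOR over the surviving SUs. The sketch below illustrates that arithmetic only; `xor_blocks`, `repair_stripe`, and the use of `None` to mark a failed SU are assumptions made for the example, and real RAID implementations (and RAID6 in particular) are more involved.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def repair_stripe(sus):
    """sus: SU contents (data SUs plus one XOR check SU), with None for a failed SU.

    Returns a new list in which the single failed SU is rebuilt from the rest.
    """
    missing = [i for i, su in enumerate(sus) if su is None]
    if len(missing) != 1:
        raise ValueError("single-parity XOR can rebuild exactly one failed SU")
    survivors = [su for su in sus if su is not None]
    repaired = list(sus)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired

data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
stripe = data + [xor_blocks(data)]                 # last SU holds the check data
damaged = [stripe[0], None, stripe[2], stripe[3]]  # the second data SU has failed
print(repair_stripe(damaged) == stripe)            # True
```

Because XOR is its own inverse, the same function rebuilds either a data SU or the check SU, which is why one tolerated fault is the limit for this layout.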
分条的状态包括: 分条一致; 分条降级; 分条不一致; 分条失效。 处于 这四种状态的分条的可靠性依次降低。 分条一致是正常状态, 分条中的所有 SU中的数据都正常,也就是说每个 SU都数据都可以读出来, 而且通过数据 SU的数据计算得到的校验数据和校 验 SU中的数据相同。 分条不一致, 是指分条的每个 SU中的数据都可以读出 来, 但是根据数据 SU中的数据计算得到的校验数据和校验 SU中存储的数据 不同。 分条不一致产生的原因可能是数据 SU数据出错, 也可能是校验 SU的 数据出错, 或者既有数据 SU数据出错又有校验 SU数据出错。 由于校验 SU 中存储的是冗余数据, 因此当仅校验 SU的数据出错, 数据 SU的数据没有出 错时, 不认为数据发送丟失, 可以通过数据 SU的数据重新计算校验 SU的数 据。 The status of the stripe includes: the stripe is consistent; the stripe is downgraded; the stripe is inconsistent; the stripe is invalid. The reliability of the strips in these four states is sequentially reduced. The stripe consistency is normal, the data in all SUs in the strip is normal, that is to say, the data of each SU can be read out, and the check data calculated by the data of the data SU and the verification SU The data is the same. Inconsistent striping means that the data in each SU of the stripe can be read out, but the check data calculated from the data in the data SU is different from the data stored in the check SU. The reason for the inconsistency of the stripe may be that the data SU data is in error, or the data of the verify SU is in error, or there is an error in the data SU data and the verification SU data. Since the redundant data is stored in the verification SU, when only the data of the SU is erroneous and the data of the data SU is not erroneous, the data transmission is not considered to be lost, and the data of the verification SU can be recalculated by the data of the data SU.
Stripe degraded means that the stripe contains faulty SUs, but their data can be recovered from the remaining SUs of the stripe. When a stripe tolerates more than one SU fault, the degraded state can be subdivided into levels: the more SUs have failed, the lower the reliability. For example, a RAID6 stripe with one faulty SU and a RAID6 stripe with two faulty SUs are both degraded, but the data is less safe with two SU faults than with one. Stripe failed means that SUs in the stripe have failed and their data cannot be recovered from the remaining SUs, that is, part of the data in the stripe is permanently lost. An SU fault means that a logical or physical error in the SU's storage space prevents the data in the SU from being read, or from being read completely.
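The four stripe states can be sketched as a small classifier. This is a simplified model under stated assumptions: an unreadable SU is represented as `None`, the check data is a single XOR parity, and the names `StripeState`, `classify`, and `tolerated_faults` are hypothetical.

```python
from enum import IntEnum
from functools import reduce

class StripeState(IntEnum):
    # A higher value means higher reliability, following the order in the text.
    FAILED = 0
    INCONSISTENT = 1
    DEGRADED = 2
    CONSISTENT = 3

def xor_blocks(blocks):
    """XOR equal-length byte strings together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def classify(data_sus, check_su, tolerated_faults=1):
    """Classify one stripe; an SU that cannot be read is passed as None."""
    faults = sum(1 for su in list(data_sus) + [check_su] if su is None)
    if faults > tolerated_faults:
        return StripeState.FAILED        # more faults than the RAID level tolerates
    if faults > 0:
        return StripeState.DEGRADED      # faulty SUs exist but are recoverable
    if xor_blocks(data_sus) == check_su:
        return StripeState.CONSISTENT    # all SUs readable and check data matches
    return StripeState.INCONSISTENT      # all SUs readable, check data differs
```

Using an `IntEnum` makes the reliability order directly comparable, so the least reliable state among several stripes is simply their minimum.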
A data block (Block) is the basic unit of deduplication. A data block is stored in one or more stripes; the stripes that store the same data block are called a stripe group. A stripe group comprises one stripe or multiple stripes. The data of a LUN consists of the data that the LUN points to.
In the embodiments of the present invention, deduplication also takes into account the reliability level of the stripes that hold a data block. In offline deduplication, when several identical data blocks exist, the reliability level of the stripes holding each block is checked; the block in the most reliable stripes is kept, and the blocks in the other stripes are deleted. When no stripe is in the stripe-consistent state, that is, when every stripe has suffered some loss of reliability, the data block can be written into stripe-consistent stripes and the data in the other stripes deleted.
In online deduplication, if the storage device already holds a data block identical to the data block to be stored, the reliability level of the stripes holding the stored block is checked. If the stripe state is not stripe consistent, the data block is written into stripe-consistent stripes and the stored data block is deleted.
When a data block occupies multiple stripes, the reliability level of the least reliable stripe is taken as the reliability level of the whole stripe group. With the technology provided by the embodiments of the present invention, deduplication further considers the reliability level of the stripes holding a data block and stores the block in more reliable stripes, which improves the safety of the data block.
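The rule that a stripe group is only as reliable as its least reliable stripe can be written as a minimum over ordered states. The enum and the function name `group_state` below are illustrative assumptions.

```python
from enum import IntEnum

class StripeState(IntEnum):
    # A higher value means higher reliability.
    FAILED = 0
    INCONSISTENT = 1
    DEGRADED = 2
    CONSISTENT = 3

def group_state(stripe_states):
    """State of a stripe group: the state of its least reliable stripe."""
    return min(stripe_states)

# A group with two consistent stripes and one degraded stripe counts as degraded.
example = group_state(
    [StripeState.CONSISTENT, StripeState.DEGRADED, StripeState.CONSISTENT]
)
```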
FIG. 2 is a flowchart of a data block processing method that can be used for offline deduplication. The method includes the following steps.
Step 31: The controller compares the fingerprints of the data blocks in the storage system.
Each data block has a fingerprint. For example, an algorithm such as MD5 or SHA 128 can be applied to the content of the data block, and the result used as the fingerprint of the data block. Data blocks with the same fingerprint are identical data blocks. The fingerprints of the data blocks in the storage device are stored in the fingerprint library of the controller.
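A minimal sketch of fingerprint comparison, assuming MD5 from Python's standard `hashlib` as the fingerprint algorithm and an in-memory dictionary standing in for the controller's fingerprint library. The function names and sample addresses are hypothetical.

```python
import hashlib
from collections import defaultdict

def fingerprint(block: bytes) -> str:
    """Fingerprint of a data block: the MD5 digest of its content."""
    return hashlib.md5(block).hexdigest()

def find_duplicates(blocks_by_address):
    """Return {fingerprint: [addresses]} for fingerprints held by several blocks."""
    index = defaultdict(list)
    for address, block in blocks_by_address.items():
        index[fingerprint(block)].append(address)
    return {fp: addrs for fp, addrs in index.items() if len(addrs) > 1}

# Hypothetical blocks keyed by address; the blocks at 0 and 8192 are identical.
blocks = {0: b"aaaa", 4096: b"bbbb", 8192: b"aaaa"}
duplicates = find_duplicates(blocks)
```

The resulting address lists are the candidates for deduplication; which copy to keep is decided later from the stripe state of each block's stripe group.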
Step 32: When multiple data blocks have the same fingerprint, query by that fingerprint the addresses of all data blocks that have it, and use each data block address to find the stripe group occupied by the data block.
When multiple data blocks with the same fingerprint are found, one of them can be kept and the rest deleted, to reduce the storage space occupied on the storage device. The controller stores a mapping table that records each fingerprint and the storage address of the data block the fingerprint represents. From the address of a data block, the controller can find the stripe group that stores it. Data blocks are stored in stripe groups, and data blocks and stripe groups are in one-to-one correspondence. A data block address can be expressed as an offset within a LUN and can be converted into a physical address; the location information of the stripe group holding the data block can be a physical address.
Step 33: Check the redundant array of independent disks (RAID) stripe state of the stripe group of each data block and, according to the check result, keep the data block in a stripe-consistent stripe group.
Keeping the data block in a stripe-consistent stripe group according to the check result includes at least one of the following schemes. If a stripe-consistent stripe group already exists and stripe groups that are not stripe consistent also exist, the data block in the stripe-consistent stripe group is left unchanged and the data blocks in the other stripe groups are deleted. If no stripe-consistent stripe group exists but a degraded stripe group does, the degraded stripe group is repaired with the RAID algorithm to produce a stripe-consistent stripe group, and the data blocks in the other stripe groups are deleted. The stripe state is checked for each stripe group; the state of a stripe group is determined by its least reliable stripe.
Repair, in the embodiments of the present invention, means that when some SUs in a stripe have data errors or SU faults, the RAID check algorithm is applied to the data in the normal SUs to recompute the data of the faulty or erroneous SUs, and the data of the normal SUs together with the recomputed data is then written to a stripe group of the storage device. After repair, the stripe group is in the stripe-consistent state, and its data block is safer than in a degraded or inconsistent stripe group. As for where the repaired data block is written: a new stripe group can be allocated, or, provided normal reads and writes are still possible, the data can be written to the original stripe group, overwriting its contents.
One strategy provided by the embodiments of the present invention is as follows: when any stripe group is in the stripe-consistent state, the data block in that stripe group is reliable, and the other stripe groups can be deleted directly without checking their stripe state.
Existing deduplication techniques do not perform a stripe check. So if both stripe-consistent and non-stripe-consistent stripe groups exist, the data block of an arbitrary stripe group is kept. If the block kept is in a failed stripe group and the other stripe groups are cleared, the content of the data block can no longer be read correctly; in other words, the deduplication operation itself causes data loss. Non-stripe-consistent stripe groups are those in any state other than stripe consistent, such as stripe inconsistent, stripe degraded, or stripe failed. Statistically, over many deduplication operations, the prior art causes data loss or reduced data safety, whereas the embodiments of the present invention improve data reliability compared with the prior art.
The embodiments of the present invention also provide another strategy: if no stripe-consistent stripe group exists but a degraded stripe group does, the degraded stripe group is repaired into a stripe-consistent stripe group with the RAID algorithm, and the data blocks in the other stripe groups are deleted.
Existing deduplication techniques do not check the stripe state of stripe groups and therefore cannot know how reliable a stripe group is; the data block kept after deduplication may end up in a degraded stripe group or in some other stripe group of low stability. For this case, the embodiments of the present invention can repair the degraded stripe group so that the data block finally kept is stored in a stripe group in stripe-consistent form, which improves data safety.
Note that these two strategies are independent: applying either one statistically improves the data safety of the storage device. An apparatus or controller that performs this method may therefore support both strategies or only one of them.
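The two offline strategies can be combined into one selection routine, sketched below. It assumes each candidate stripe group is described by the list of states of its stripes; the function name `choose_survivor` and the returned tuple layout are assumptions for illustration, and the actual RAID repair is left out.

```python
from enum import IntEnum

class StripeState(IntEnum):
    FAILED = 0
    INCONSISTENT = 1
    DEGRADED = 2
    CONSISTENT = 3

def choose_survivor(groups):
    """Pick which duplicate to keep.

    groups maps a stripe-group id to the states of its stripes.
    Returns (kept_group, needs_repair, groups_to_delete).
    """
    # A group is as reliable as its least reliable stripe.
    states = {gid: min(stripes) for gid, stripes in groups.items()}
    consistent = [g for g, s in states.items() if s == StripeState.CONSISTENT]
    if consistent:
        keep, repair = consistent[0], False      # first strategy: keep as is
    else:
        degraded = [g for g, s in states.items() if s == StripeState.DEGRADED]
        if not degraded:
            return None, False, []               # neither strategy applies
        keep, repair = degraded[0], True         # second strategy: repair first
    return keep, repair, [g for g in groups if g != keep]
```

When `needs_repair` is true, the caller would repair the kept group with the RAID algorithm before deleting the other groups, so that a consistent copy exists at every point.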
The embodiments of the present invention also provide an optional strategy: if all stripe groups are failed stripe groups, the data in the stripe-consistent stripes of those failed stripe groups is deduplicated, and the data in the failed stripes is deleted. Like the first two strategies, this one is independent: an apparatus or controller that performs this method may support any one of the strategies, any two of them, or all three.
When all stripe groups are failed stripe groups, every stripe group has suffered permanent data loss, and no single stripe group is sufficient to recover the whole data block. A stripe group is made up of stripes, and the embodiments of the present invention can salvage the data in some of those stripes. The salvaged stripes may be enough to assemble a complete stripe group. Even if they are not, keeping the data in these stripes is still worthwhile: for example, when new data is written later, the newly written data and the stripes already stored may together form a complete stripe group. This strategy therefore avoids or reduces data loss for the data block.
The measures for salvaging stripes include the following. If stripe-consistent stripes exist, deduplicate them. If a degraded stripe exists and no stripe-consistent stripe stores the same data, repair the degraded stripe with the RAID algorithm. If an inconsistent stripe exists whose data SUs are free of errors, and no stripe-consistent or degraded stripe stores the same data, repair the inconsistent stripe with the RAID algorithm.
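The salvage measures amount to a per-stripe decision. The sketch below is one possible reading: `peer_states` holds the states of the other stripes that store the same data, and the `"drop"` result for every remaining case is an assumption, since the text only states that data in failed stripes is deleted.

```python
from enum import IntEnum

class StripeState(IntEnum):
    FAILED = 0
    INCONSISTENT = 1
    DEGRADED = 2
    CONSISTENT = 3

def salvage_action(state, data_sus_ok, peer_states):
    """Decide what to do with one stripe of a failed stripe group."""
    S = StripeState
    if state == S.CONSISTENT:
        return "dedup"                    # keep one consistent copy
    if state == S.DEGRADED and S.CONSISTENT not in peer_states:
        return "repair"                   # no better copy of this data exists
    if (state == S.INCONSISTENT and data_sus_ok
            and S.CONSISTENT not in peer_states
            and S.DEGRADED not in peer_states):
        return "repair"                   # recompute the check SU from intact data SUs
    return "drop"                         # assumption: redundant or unrecoverable
```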
Step 34: Repoint the LUNs that point to the stripe groups holding the data block so that they point to the stripe-consistent stripe group.
LUNs are managed by the controller and provided to hosts. The controller records the stripe groups that each LUN points to; the data blocks in those stripe groups make up the data of the LUN. When a host reads data, the data blocks stored in the LUN are found through the pointing relationship between the LUN and the stripe groups. During deduplication some data blocks are deleted and the data block kept is shared by these LUNs, so the LUNs that point to the stripe groups of the deleted data blocks must be changed to point to the stripe-consistent stripe group.
Corresponding to the cases described in step 33: if a stripe-consistent stripe group already existed when step 33 was performed, these LUNs point to that existing stripe group; if none existed and a stripe-consistent stripe group was produced by repair, these LUNs point to the repaired stripe group.
Optionally, the mapping table in the controller that records fingerprints and the storage addresses of the data blocks they represent is also updated, so that the stored address of the data block points to the stripe-consistent stripe group. At the next deduplication, this mapping can be used to find the stripe group holding the data block and to check the stripe state of that stripe group again.
Step 35: Optionally, the reference count of the data block is also updated; the count increases by the number of deleted data blocks. The controller records how many times each data block is referenced; the reference count describes the number of LUNs that point to the data block. A reference count of 0 means that no LUN needs the data block, so the data block can be deleted. Steps 35 and 34 may be performed in either order or at the same time.
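Steps 34 and 35 can be sketched together as a metadata update. The dictionaries `lun_map` (LUN id to stripe-group id) and `refcount` (stripe-group id to reference count) are hypothetical stand-ins for the controller's records, and the sketch assumes one data block per stripe group, as stated for step 32.

```python
def repoint_luns(lun_map, refcount, kept_group, deleted_groups):
    """Repoint LUNs from deleted stripe groups to the kept one and update counts."""
    for lun, group in lun_map.items():
        if group in deleted_groups:
            lun_map[lun] = kept_group          # step 34: repoint the LUN
    # Step 35: the count grows by the number of deleted data blocks.
    refcount[kept_group] = refcount.get(kept_group, 0) + len(deleted_groups)
    for group in deleted_groups:
        refcount.pop(group, None)              # deleted blocks are no longer tracked

# Three LUNs each pointing to its own copy; the copy in g2 is kept.
lun_map = {"lun0": "g1", "lun1": "g2", "lun2": "g3"}
refcount = {"g1": 1, "g2": 1, "g3": 1}
repoint_luns(lun_map, refcount, "g2", ["g1", "g3"])
```

After the call, all three LUNs share the kept data block and its reference count equals the number of LUNs that point to it.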
FIG. 3 is a flowchart of a data block processing method that can be used for online deduplication. The method includes the following steps. The method can be performed by a controller; specifically, the processor of the controller executes computer instructions held in the cache.
Step 41: Query the fingerprint library of the data blocks in the storage device. When the storage device holds a stored data block with the same fingerprint as the data block to be stored, check the RAID stripe state of the stripe group holding the stored data block.
Before writing a data block to the storage device, the controller first computes the fingerprint of the data block to be stored and then checks whether that fingerprint already exists in the fingerprint library. If it does not, the block has not been stored before and must be written to the storage device. If it does, the block has already been stored, and the controller goes on to decide whether the data block to be stored needs to be stored again.
Step 42: Check the RAID stripe state of the stripe group holding the stored data block. If the state is not stripe consistent, a stripe-consistent stripe group must be produced. Specifically, the stored data block can be replaced with the data block to be stored, so that the stripe group holding the newly stored block is stripe consistent; or, if the stripe group holding the stored data block can be repaired, that stripe group can be repaired instead. If the state is stripe consistent, the data block to be stored need not be stored, and the stored data block need not change.
The strategies are described in detail below. The embodiments of the present invention may include any one or more of them. A controller applying the embodiments of the present invention may support several strategies at once or only one of them.
Strategy A: If the stripe group holding the stored data block is in the stripe-consistent state, the data block to be stored is not stored. Replacing the stored data block with the data block to be stored is, of course, also permissible.
Strategy B: If the stripe group is in the stripe-degraded state, either store the data block to be stored and delete the stored data block, or repair the stored data block from the degraded stripe group with the RAID algorithm and store the result. In a degraded stripe group, the data of the faulty SUs can be recovered with the RAID algorithm.
Strategy C: If the stripe group is in the stripe-inconsistent state, store the data block to be stored and delete the stored data block. Alternatively, if it is determined that no data SU is faulty, that is, all faulty SUs are check SUs, repair the stripe group. The repair recomputes the data of the check SUs with the RAID algorithm and then writes the data of the original data SUs together with the recomputed check data to a stripe group of the storage device. This can be the stripe group holding the original data SUs or a newly allocated stripe group.
Whether any data SU is faulty is determined as follows: compute the fingerprint of the data in the data SUs; if it equals the fingerprint of the data block to be stored, the data SUs are not corrupted. The check data need not be used in computing the fingerprint.
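Strategies A, B, and C can be sketched as a single decision function for online deduplication. Several choices here are assumptions: MD5 as the fingerprint, preferring repair over replacement for a degraded group, replacing the block when the stored group has failed, and the names `online_decision` and `stored_data` (the bytes read from the data SUs only).

```python
import hashlib
from enum import IntEnum

class StripeState(IntEnum):
    FAILED = 0
    INCONSISTENT = 1
    DEGRADED = 2
    CONSISTENT = 3

def fingerprint(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def online_decision(state, stored_data, incoming_block):
    """Return 'skip_write', 'repair', or 'replace' for an incoming duplicate."""
    if state == StripeState.CONSISTENT:
        return "skip_write"       # strategy A: the stored copy is already safe
    if state == StripeState.DEGRADED:
        return "repair"           # strategy B: replacing would be equally valid
    if state == StripeState.INCONSISTENT:
        # Strategy C: repair only if the data SUs still hold the right content,
        # which means that only the check SU is wrong.
        if fingerprint(stored_data) == fingerprint(incoming_block):
            return "repair"
        return "replace"
    return "replace"              # assumption: a failed group is always replaced
```

Comparing the fingerprint of the data SUs with that of the incoming block is what distinguishes a damaged check SU from damaged data, without involving the check data itself.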
Step 43: Point the LUN that contains the data block to be stored to the stripe-consistent stripe group.
After step 42, the storage device holds a stripe-consistent stripe group whose data block is identical to the data block to be stored. Depending on the strategy applied in step 42: if the stored data block was already stripe consistent, the LUN points to the stripe group holding the stored data block; if the stored data block was replaced with the data block to be stored, the LUN points to the stripe group in which the new block was stored; if the stripe group of the stored data block was repaired, the LUN points to the repaired stripe group.
Optionally, the mapping table in the controller that records fingerprints and the storage addresses of the data blocks they represent is also updated, so that the stored address of the data block points to the stripe-consistent stripe group. At the next deduplication, this mapping can be used to find the stripe group holding the data block and to check the stripe state again.
Step 44: Optionally, update the reference count of the data block by incrementing the reference count of the stored data block by 1.
FIG. 4 is a schematic diagram of a data block processing apparatus. The data block processing apparatus 5 includes a fingerprint comparison module 51, an address lookup module 52, a consistency check module 53, and an index module 54. It may further include a counting module 55.
The fingerprint comparison module 51 is configured to compare the fingerprints of the data blocks in the storage system.
Each data block has a fingerprint. For example, an algorithm such as MD5 or SHA 128 can be applied to the content of the data block, and the result used as the fingerprint of the data block. Data blocks with the same fingerprint are identical data blocks. The fingerprints of the data blocks in the storage device can be stored in the fingerprint library of the controller.
The address lookup module 52 is configured to, when multiple data blocks have the same fingerprint, query the addresses of those data blocks by the fingerprint and use each data block address to find the stripe group occupied by the data block.
When multiple data blocks with the same fingerprint are found, one of them can be kept and the rest deleted, to reduce the storage space occupied on the storage device. The controller stores a mapping table that records each fingerprint and the storage address of the data block the fingerprint represents. From the address of a data block, the controller can find the stripe group that stores it. Data blocks are stored in stripe groups, and data blocks and stripe groups are in one-to-one correspondence.
The consistency check module 53 is configured to check the RAID stripe state of the stripe group of each data block and, according to the check result, keep the data block in a stripe-consistent stripe group.
The consistency check module 53 can also be used to repair stripe groups. Keeping the data block in a stripe-consistent stripe group according to the check result includes at least one of the following schemes. If a stripe-consistent stripe group already exists and stripe groups that are not stripe consistent also exist, the data block in the stripe-consistent stripe group is left unchanged and the data blocks in the other stripe groups are deleted. If no stripe-consistent stripe group exists but a degraded stripe group does, the degraded stripe group is repaired with the RAID algorithm to produce a stripe-consistent stripe group, and the data blocks in the other stripe groups are deleted. The stripe state is checked for each stripe group; the state of a stripe group is determined by its least reliable stripe.
Repair, in the embodiments of the present invention, means that when some SUs in a stripe have data errors or SU faults, the RAID check algorithm is applied to the data in the normal SUs to recompute the data of the faulty or erroneous SUs, and the data of the normal SUs together with the recomputed data is then written to a stripe group of the storage device. After repair, the stripe group is in the stripe-consistent state, and its data block is safer than in a degraded or inconsistent stripe group. As for where the repaired data block is written: a new stripe group can be allocated, or, provided normal reads and writes are still possible, the data can be written to the original stripe group, overwriting its contents.
One strategy provided by the embodiments of the present invention is as follows: when any stripe group is in the stripe-consistent state, the data block in that stripe group is reliable, and the other stripe groups can be deleted directly without checking their stripe state.
Existing deduplication techniques do not perform a stripe check. So if both stripe-consistent and non-stripe-consistent stripe groups exist, the data block of an arbitrary stripe group is kept. If the block kept is in a failed stripe group and the other stripe groups are cleared, the content of the data block can no longer be read correctly; in other words, the deduplication operation itself causes data loss. Non-stripe-consistent stripe groups are those in any state other than stripe consistent, such as stripe inconsistent, stripe degraded, or stripe failed. Statistically, over many deduplication operations, the prior art causes data loss or reduced data safety, whereas the embodiments of the present invention improve data reliability compared with the prior art.
The embodiments of the present invention also provide another strategy: if no stripe-consistent stripe group exists but a degraded stripe group does, the degraded stripe group is repaired into a stripe-consistent stripe group with the RAID algorithm, and the data blocks in the other stripe groups are deleted.
Existing deduplication techniques do not check the stripe state of stripe groups and therefore cannot know how reliable a stripe group is; the data block kept after deduplication may end up in a degraded stripe group or in some other stripe group of low stability. For this case, the embodiments of the present invention can repair the degraded stripe group so that the data block finally kept is stored in a stripe group in stripe-consistent form, which improves data safety.
Note that these two strategies are independent: applying either one statistically improves the data safety of the storage device. An apparatus or controller that performs this method may therefore support both strategies or only one of them.
The embodiments of the present invention also provide an optional strategy: if all stripe groups are failed stripe groups, the data in the stripe-consistent stripes of those failed stripe groups is deduplicated, and the data in the failed stripes is deleted. Like the first two strategies, this one is independent: an apparatus or controller that performs this method may support any one of the strategies, any two of them, or all three.
When all stripe groups are failed stripe groups, every stripe group has suffered permanent data loss, and no single stripe group is sufficient to recover the whole data block. A stripe group is made up of stripes, and the embodiments of the present invention can salvage the data in some of those stripes. The salvaged stripes may be enough to assemble a complete stripe group. Even if they are not, keeping the data in these stripes is still worthwhile: for example, when new data is written later, the newly written data and the stripes already stored may together form a complete stripe group. This strategy therefore avoids or reduces data loss for the data block.
The measures for salvaging stripes include the following. If stripe-consistent stripes exist, deduplicate them. If a degraded stripe exists and no stripe-consistent stripe stores the same data, repair the degraded stripe with the RAID algorithm. If an inconsistent stripe exists whose data SUs are free of errors, and no stripe-consistent or degraded stripe stores the same data, repair the inconsistent stripe with the RAID algorithm.
The index module 54 is connected to the consistency check module 53 and is configured to point the LUNs that point to the stripe groups containing the data blocks to the stripe-consistent stripe group.
LUNs are managed by the controller and provided to hosts. The controller records the stripe groups that each LUN points to, and the data blocks in those stripe groups make up the LUN's data. When a host reads data, the data blocks stored in the LUN can be found through the mapping between the LUN and its stripe groups. During deduplication, some data blocks are deleted and the retained data block is shared by these LUNs, so the LUNs that point to the stripe groups of the deleted data blocks must be changed to point to the stripe-consistent stripe group.
If a stripe-consistent stripe group already exists in the storage device, the index module 54 points these LUNs to that existing stripe group. If no stripe-consistent stripe group existed originally and one is generated by repair, the index module 54 points these LUNs to the repaired stripe-consistent stripe group.
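The re-pointing performed by the index module can be illustrated as follows. This is a minimal sketch under stated assumptions: the LUN-to-stripe-group mapping is modelled as a plain dictionary, and the function name and parameters are hypothetical rather than taken from the patent.

```python
def repoint_luns(lun_map, deleted_groups, consistent_group):
    """Return a new LUN mapping in which every reference to a deleted
    stripe group is replaced by the stripe-consistent stripe group.

    lun_map: {lun_id: [stripe_group_id, ...]} (hypothetical layout).
    """
    deleted = set(deleted_groups)
    return {
        lun: [consistent_group if group in deleted else group
              for group in groups]
        for lun, groups in lun_map.items()
    }
```

The same function covers both cases in the text, since `consistent_group` can be either a stripe group that already existed or one produced by repair.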
Optionally, the index module 54 further updates the mapping table, recorded in the controller, between fingerprints and the storage addresses of the data blocks the fingerprints represent, changing the stored address of the data block to point to the stripe-consistent stripe group. At the next deduplication, this mapping can be used to find the stripe group containing the data block and to re-check the stripe state of that stripe group.
The counting module 55 is connected to the consistency check module 53 and can update the reference count of the data block; the reference count increases by the number of deleted data blocks. The controller records how many times each data block is referenced, and the reference count describes the number of LUNs that point to the data block. A reference count of 0 means that no LUN needs the data block, and the data block can be deleted.
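The reference-count bookkeeping can be sketched as below. The dictionary of counts and both function names are assumptions made for illustration; only the rule itself (add the number of deleted blocks, and treat a count of 0 as "may be deleted") comes from the text.

```python
def merge_references(ref_counts, kept, deleted):
    """Add the number of deleted duplicate blocks to the kept block's
    reference count and forget the deleted blocks."""
    ref_counts[kept] = ref_counts.get(kept, 0) + len(deleted)
    for block in deleted:
        ref_counts.pop(block, None)
    return ref_counts[kept]


def release_reference(ref_counts, block):
    """Drop one reference; return True when the count reaches 0,
    meaning no LUN needs the block and it can be deleted."""
    ref_counts[block] -= 1
    if ref_counts[block] == 0:
        del ref_counts[block]
        return True
    return False
```

For example, a kept block with count 1 that absorbs two deleted duplicates ends with count 3, and it becomes deletable only after three releases.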
FIG. 5 is a schematic diagram of a data block processing apparatus that can perform online (inline) deduplication. The data block processing apparatus 6 includes a query module 61, a consistency check module 62, and an index module 63. It may further include a counting module 64.
The query module 61 is configured to query the fingerprint database of the data blocks in the storage device and determine whether the storage device contains a stored data block with the same fingerprint as the data block to be stored. Before writing the data block to be stored to the storage device, the controller computes its fingerprint and checks whether that fingerprint already exists in the fingerprint database. If it does not, the data block has not been stored before and must be written to the storage device. If it does, the data block has already been stored, and the controller goes on to determine whether the data block to be stored needs to be stored again.
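The lookup performed by the query module can be sketched like this. The text does not name a fingerprint algorithm, so SHA-256 is used here purely as an assumption, and the in-memory dictionary standing in for the fingerprint database is likewise hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Compute a block fingerprint (SHA-256 is an assumed choice)."""
    return hashlib.sha256(block).hexdigest()


def find_stored(fingerprint_db, block: bytes):
    """Return the address of a stored block with the same fingerprint,
    or None if the block has not been stored before.

    fingerprint_db: {fingerprint: storage_address} (hypothetical layout).
    """
    return fingerprint_db.get(fingerprint(block))
```

A `None` result corresponds to the "not stored before, write it" branch; any other result leads on to the stripe-state check described next.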
The consistency check module 62 is configured to check the RAID stripe state of the stripe group containing the stored data block. If the stripe group is not stripe-consistent, a stripe-consistent stripe group must be generated. Specifically, the stored data block can be replaced with the data block to be stored, in which case the stripe group holding the newly stored data block is stripe-consistent; alternatively, if the stripe group containing the stored data block can be repaired, it may be repaired instead. If the stripe state is stripe-consistent, the data block to be stored does not need to be stored, and the stored data block does not need to change.
The consistency check module 62 can also repair stripe groups or stripes. Its processing policies are described below; an embodiment of the present invention may include any one or more of them. A controller that applies an embodiment may support several policies at once or only one.
Policy A: if the stripe state of the stripe group containing the stored data block is stripe-consistent, the data block to be stored is not stored. Replacing the stored data block with the data block to be stored is, of course, also possible.
Policy B: if the stripe state of the stripe group is degraded, either the data block to be stored is stored and the stored data block is deleted, or the stored data block is repaired from the degraded stripe group according to the RAID algorithm. In a degraded stripe group, the data in the faulty SUs can be repaired by the RAID algorithm.
Policy C: if the stripe state of the stripe group is inconsistent, the data block to be stored is stored and the stored data block is deleted. Alternatively, when it is determined that no data SU has failed, that is, all faulty SUs are parity SUs, the stripe group is repaired. The repair recomputes the data of the parity SUs according to the RAID algorithm and then writes the data of the original data SUs together with the recomputed parity SU data to a stripe group of the storage device. This may be the stripe group in which the original data SUs reside or a newly allocated stripe group.
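Policies A, B, and C amount to a dispatch on the stripe state. The sketch below is one possible reading, not the definitive implementation: the state strings, the returned action names, and the `prefer_repair` switch (used to choose between the two alternatives that policies B and C allow) are hypothetical.

```python
def inline_dedup_action(state, data_sus_intact=False, prefer_repair=False):
    """Pick an action for a block whose fingerprint is already stored.

    state: "consistent", "degraded", or "inconsistent" (assumed labels).
    data_sus_intact: True when no data SU is faulty (only parity SUs are).
    prefer_repair: choose repair where the policy offers it as an option.
    """
    if state == "consistent":
        # Policy A: the stored copy is good, so skip the write.
        return "skip_write"
    if state == "degraded":
        # Policy B: rewrite, or rebuild the faulty SUs with RAID.
        return "repair_stored" if prefer_repair else "store_new_delete_old"
    if state == "inconsistent":
        # Policy C: repair is only possible when the data SUs are intact.
        if prefer_repair and data_sus_intact:
            return "recompute_parity"
        return "store_new_delete_old"
    raise ValueError(f"unknown stripe state: {state!r}")
```

Keeping the choice in one function makes it easy for a controller to support a single policy or all of them, as the text allows.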
Whether a data SU has failed is determined as follows: compute the fingerprint of the data in the data SUs; if it matches the fingerprint of the data block to be stored, the data SUs have not been corrupted. The parity data need not be used when computing the fingerprint.
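The fingerprint check and the parity recomputation used in the policy C repair can be sketched together. Several assumptions are made here and are not stated in the text: the block is the in-order concatenation of the data SUs, the fingerprint is SHA-256, the SUs have equal length, and the RAID level uses a single XOR parity SU (as in RAID 5).

```python
import hashlib
from functools import reduce


def data_sus_intact(data_sus, expected_fingerprint):
    """Fingerprint the data SUs only (no parity) and compare with the
    fingerprint of the block to be stored."""
    digest = hashlib.sha256(b"".join(data_sus)).hexdigest()
    return digest == expected_fingerprint


def recompute_parity(data_sus):
    """Rebuild a single XOR parity SU from equal-length data SUs
    (XOR parity is an assumed RAID scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*data_sus))
```

If `data_sus_intact` returns True, the recomputed parity can be written alongside the original data SUs, either to the same stripe group or to a newly allocated one.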
The index module 63 is configured to point the LUN of the data block to be stored to the stripe-consistent stripe group.
After processing by the consistency check module 62, the storage device holds a stripe-consistent stripe group whose stored data block is identical to the data block to be stored. Depending on the policy applied in step 42: if the stripe state of the stored data block is stripe-consistent, the LUN points to the stripe group containing the stored data block; if the stored data block is replaced with the data block to be stored, the LUN points to the stripe group in which the data block to be stored resides after being written to the storage device; if the stripe group of the stored data block is repaired, the LUN points to the repaired stripe group.
Optionally, the index module 63 is further configured to update the mapping table, recorded in the controller, between fingerprints and the storage addresses of the data blocks the fingerprints represent, changing the stored address of the data block to point to the stripe-consistent stripe group. At the next deduplication, this mapping can be used to find the stripe group containing the data block and to re-check the stripe state.
The counting module 64 is configured to update the reference count of the data block by adding 1 to the reference count of the stored data block. The counting module 64 is optional.
An embodiment of the present invention may further provide a data block processing system that includes the data block processing apparatus 5 and a storage device, the storage device being configured to store data blocks. The data block processing apparatus may be a controller, or software or hardware integrated in a controller.
An embodiment of the present invention may further provide a data block processing system that includes the data block processing apparatus 6 and a storage device, the storage device being configured to store data blocks. The data block processing apparatus may be a controller, or software or hardware integrated in a controller.
A person of ordinary skill in the art will understand that the aspects of the present invention, or possible implementations of those aspects, may be embodied as a system, a method, or a computer program product. Accordingly, they may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and so on), or an embodiment combining software and hardware, all of which are referred to collectively herein as a "circuit", "module", or "system". In addition, the aspects of the present invention, or possible implementations of those aspects, may take the form of a computer program product, which refers to computer-readable program code stored in a computer-readable medium.
The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or apparatus, or any suitable combination of the foregoing, such as random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, and portable compact disc read-only memory (CD-ROM).
A processor in a computer reads the computer-readable program code stored in the computer-readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, of the flowcharts, and produce an apparatus that implements the functional actions specified in each block, or combination of blocks, of the block diagrams.

Claims

1. A data processing method for use in a controller, wherein the method comprises:
when a plurality of data blocks have the same fingerprint, querying the addresses of the data blocks according to the fingerprint, and finding the stripe groups occupied by the data blocks according to the data block addresses;
checking the redundant array of independent disks (RAID) stripe state of the stripe groups of the data blocks, and retaining the data block in a stripe-consistent stripe group according to the check result;
wherein retaining the data block in a stripe-consistent stripe group according to the check result comprises at least one of the following:
if a stripe-consistent stripe group exists and a stripe group that is not stripe-consistent exists, retaining the data block in the stripe-consistent stripe group and deleting the data blocks in the remaining stripe groups;
if no stripe-consistent stripe group exists and a degraded stripe group exists, repairing the degraded stripe group into a stripe-consistent stripe group by means of the RAID algorithm, the stripe-consistent stripe group storing the data block, and deleting the data blocks in the remaining stripe groups.
2. The data processing method according to claim 1, wherein retaining the data block in a stripe-consistent stripe group according to the check result further comprises:
if all stripe groups are failed stripe groups, deduplicating the data in the stripe-consistent stripes of the failed stripe groups, and deleting the data in the failed stripes of the failed stripe groups.
3. The data processing method according to claim 1, wherein after the method, the method further comprises:
pointing the LUNs that point to the stripe groups containing the data blocks to the stripe-consistent stripe group.
4. The data processing method according to claim 1, wherein after the method, the method further comprises:
updating the reference count of the data block, the updated reference count being equal to the original count plus the number of deleted data blocks.
5. A data block processing apparatus, wherein the apparatus comprises:
a fingerprint comparison module 51, configured to compare the fingerprints of data blocks in a storage device;
an address lookup module 52, configured to, when a plurality of data blocks in the storage device have the same fingerprint, query the addresses of the plurality of data blocks according to the fingerprint, and find the stripe groups occupied by the plurality of data blocks according to the addresses of the plurality of data blocks;
a consistency check module 53, configured to check the redundant array of independent disks (RAID) stripe state of the stripe groups of the plurality of data blocks, and retain the data block in a stripe-consistent stripe group according to the check result;
wherein retaining the data block in a stripe-consistent stripe group according to the check result comprises at least one of the following:
if a stripe-consistent stripe group exists and a stripe group that is not stripe-consistent exists, retaining the data block in the stripe-consistent stripe group and deleting the data blocks in the remaining stripe groups;
if no stripe-consistent stripe group exists and a degraded stripe group exists, repairing the degraded stripe group into a stripe-consistent stripe group by means of the RAID algorithm, the stripe-consistent stripe group storing the data block, and deleting the data blocks in the remaining stripe groups.
6. The data processing apparatus according to claim 5, wherein retaining the data block in a stripe-consistent stripe group according to the check result further comprises:
if all stripe groups are failed stripe groups, deduplicating the data in the stripe-consistent stripes of the failed stripe groups, and deleting the data in the failed stripes of the failed stripe groups.
7. The data processing apparatus according to claim 5, wherein the apparatus further comprises:
an index module 54, configured to point the LUNs that point to the stripe groups containing the data blocks to the stripe-consistent stripe group.
8. The data processing apparatus according to claim 5, wherein the apparatus further comprises:
a counting module 55, configured to update the count of the data block, the updated count being equal to the original count plus the number of deleted data blocks.
9. A storage system, comprising the storage apparatus according to any one of claims 6 to 8, and a storage device.
10. A data block processing method for use in a controller, wherein the method comprises:
querying a fingerprint database of data blocks in a storage device, and when the storage device contains a stored data block having the same fingerprint as the data block to be stored, detecting the redundant array of independent disks (RAID) stripe state of the stripe group in which the stored data block is located;
storing data blocks according to the detection, comprising:
if the stripe state of the stripe group is stripe-consistent, not storing the data block to be stored;
if the stripe state of the stripe group is degraded, either storing the data block to be stored and deleting the stored data block, or repairing the stored data block from the degraded stripe group according to the RAID algorithm; or
if the stripe state of the stripe group is inconsistent, either storing the data block to be stored and deleting the stored data block, or, if no data error has occurred in the data stripe units, repairing the stored data block according to the RAID algorithm.
11. The data block processing method according to claim 10, wherein after the method, the method further comprises:
pointing the LUN to which the data block to be stored belongs to the data block stored according to the detection.
12. The data block processing method according to claim 10, wherein after the method, the method further comprises:
adding 1 to the reference count of the stored data block.
13. A data block processing apparatus, wherein the apparatus comprises:
a query module 61, configured to query a fingerprint database of data blocks in a storage device;
a consistency check module 62, configured to, when the storage device contains a stored data block having the same fingerprint as the data block to be stored, detect the redundant array of independent disks (RAID) stripe state of the stripe group in which the stored data block is located;
and to store data blocks according to the detection, comprising:
if the stripe state of the stripe group is stripe-consistent, not storing the data block to be stored;
if the stripe state of the stripe group is degraded, either storing the data block to be stored and deleting the stored data block, or repairing the stored data block from the degraded stripe group according to the RAID algorithm; or
if the stripe state of the stripe group is inconsistent, either storing the data block to be stored and deleting the stored data block, or, if no data error has occurred in the data stripe units, repairing the stored data block according to the RAID algorithm.
14. The data block processing apparatus according to claim 13, wherein the apparatus further comprises:
an index module 63, configured to point the LUN to which the data block to be stored belongs to the data block stored according to the detection.
15. The data block processing method according to claim 13, wherein the apparatus further comprises:
a counting module 64, configured to add 1 to the reference count of the stored data block.
16. A storage system, comprising the storage apparatus according to any one of claims 13 to 15, and a storage device.
17. A controller connected to a storage device, the controller comprising a processor and a cache, the cache being configured to store computer instructions, wherein the controller performs the following steps by running the computer instructions:
when a plurality of data blocks have the same fingerprint, querying the addresses of the data blocks according to the fingerprint, and finding the stripe groups occupied by the data blocks according to the data block addresses;
checking the redundant array of independent disks (RAID) stripe state of the stripe groups of the data blocks, and retaining the data block in a stripe-consistent stripe group according to the check result;
wherein retaining the data block in a stripe-consistent stripe group according to the check result comprises at least one of the following:
if a stripe-consistent stripe group exists and a stripe group that is not stripe-consistent exists, retaining the data block in the stripe-consistent stripe group and deleting the data blocks in the remaining stripe groups;
if no stripe-consistent stripe group exists and a degraded stripe group exists, repairing the degraded stripe group into a stripe-consistent stripe group by means of the RAID algorithm, the stripe-consistent stripe group storing the data block, and deleting the data blocks in the remaining stripe groups.
18. The controller according to claim 17, wherein retaining the data block in a stripe-consistent stripe group according to the check result further comprises:
if all stripe groups are failed stripe groups, deduplicating the data in the stripe-consistent stripes of the failed stripe groups, and deleting the data in the failed stripes of the failed stripe groups.
19. The controller according to claim 17, wherein the processor is further configured to perform:
pointing the LUNs that point to the stripe groups containing the data blocks to the stripe-consistent stripe group.
20. The controller according to claim 17, wherein the processor is further configured to perform:
updating the reference count of the data block, the updated reference count being equal to the original count plus the number of deleted data blocks.
21. A controller connected to a storage device, the controller comprising a processor and a cache, the cache being configured to store computer instructions, wherein the controller performs the following steps by running the computer instructions:
querying a fingerprint database of data blocks in the storage device, and when the storage device contains a stored data block having the same fingerprint as the data block to be stored, detecting the redundant array of independent disks (RAID) stripe state of the stripe group in which the stored data block is located;
storing data blocks according to the detection, comprising:
if the stripe state of the stripe group is stripe-consistent, not storing the data block to be stored;
if the stripe state of the stripe group is degraded, either storing the data block to be stored and deleting the stored data block, or repairing the stored data block from the degraded stripe group according to the RAID algorithm; or
if the stripe state of the stripe group is inconsistent, either storing the data block to be stored and deleting the stored data block, or, if no data error has occurred in the data stripe units, repairing the stored data block according to the RAID algorithm.
22. The controller according to claim 21, wherein the processor is further configured to:
point the LUN to which the data block to be stored belongs to the data block stored according to the detection.
23. The controller according to claim 21, wherein the processor is further configured to:
add 1 to the reference count of the stored data block.
PCT/CN2013/091170 2013-12-31 2013-12-31 De-duplication method, apparatus and system WO2015100639A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201380002564.2A CN104205097B (en) 2013-12-31 2013-12-31 De-duplication method, apparatus and system
PCT/CN2013/091170 WO2015100639A1 (en) 2013-12-31 2013-12-31 De-duplication method, apparatus and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/091170 WO2015100639A1 (en) 2013-12-31 2013-12-31 De-duplication method, apparatus and system

Publications (1)

Publication Number Publication Date
WO2015100639A1 true WO2015100639A1 (en) 2015-07-09

Family

ID=52088178

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/091170 WO2015100639A1 (en) 2013-12-31 2013-12-31 De-duplication method, apparatus and system

Country Status (2)

Country Link
CN (1) CN104205097B (en)
WO (1) WO2015100639A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760108B (en) * 2014-12-16 2018-12-07 华为数字技术(苏州)有限公司 A kind of method and apparatus of data storage
CN104822137B (en) * 2015-04-14 2019-06-11 宇龙计算机通信科技(深圳)有限公司 A kind of received method of information and terminal
CN110557657A (en) * 2018-05-30 2019-12-10 视联动力信息技术股份有限公司 data processing method and system based on video network
CN112650628A (en) * 2020-12-30 2021-04-13 浪潮云信息技术股份公司 High-availability and expandable data deduplication method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090271454A1 (en) * 2008-04-29 2009-10-29 International Business Machines Corporation Enhanced method and system for assuring integrity of deduplicated data
CN101789977A (en) * 2010-02-08 2010-07-28 北京同有飞骥科技有限公司 Teledata copying and de-emphasis method based on Hash coding
CN102460398A (en) * 2009-06-08 2012-05-16 赛门铁克公司 Source classification for performing deduplication in a backup operation
CN103150260A (en) * 2011-11-25 2013-06-12 华为数字技术(成都)有限公司 Method and device for deleting repeating data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8412682B2 (en) * 2006-06-29 2013-04-02 Netapp, Inc. System and method for retrieving and using block fingerprints for data deduplication
US8082228B2 (en) * 2008-10-31 2011-12-20 Netapp, Inc. Remote office duplication
US8510643B2 (en) * 2009-12-23 2013-08-13 Nvidia Corporation Optimizing raid migration performance
CN102221982B (en) * 2011-06-13 2013-09-11 北京卓微天成科技咨询有限公司 Method and system for implementing deletion of repeated data on block-level virtual storage equipment

Also Published As

Publication number Publication date
CN104205097A (en) 2014-12-10
CN104205097B (en) 2017-08-25

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 13900807; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 13900807; Country of ref document: EP; Kind code of ref document: A1