CN113050892A - Method and device for protecting deduplication data - Google Patents

Publication number
CN113050892A
Authority
CN
China
Prior art keywords
deduplication
block
data
fingerprint
physical address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110328625.0A
Other languages
Chinese (zh)
Other versions
CN113050892B (en)
Inventor
上官应兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Macrosan Technologies Co Ltd
Original Assignee
Macrosan Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Macrosan Technologies Co Ltd
Priority to CN202110328625.0A
Publication of CN113050892A
Application granted
Publication of CN113050892B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1456 Hardware arrangements for backup
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method and a device for protecting deduplication data, which are applied to a storage device. According to the embodiments of the application, data writing and data backup are executed asynchronously within the same deduplication processing flow, so that both the data protection efficiency and the space utilization of the storage device are taken into account, while the impact of data protection on front-end service performance is effectively reduced.

Description

Method and device for protecting deduplication data
Technical Field
The present application relates to the field of storage technologies, and in particular, to a method and an apparatus for protecting deduplication data.
Background
To improve the resource utilization of a storage device, a deduplication mechanism is usually adopted to eliminate duplicate data in the storage device, so that only one copy of identical data is kept.
A Logical Unit Number (LUN) supporting a deduplication function in a storage device is called a deduplication LUN. Data protection requirements also exist for data written into the deduplication LUN, for example, data of the deduplication LUN (denoted as a first deduplication LUN) is copied to a target LUN to obtain backup data of the deduplication LUN.
If the target LUN is also a LUN supporting a deduplication function (denoted as a second deduplication LUN), a deduplication process needs to be executed when data of the first deduplication LUN is copied (written) to the second deduplication LUN. The deduplication process involves operations such as fingerprint calculation, fingerprint library maintenance, and duplicate data comparison, so the computation is heavy, the access path is long, and the data protection efficiency is low.
If the target LUN is not a deduplication LUN, and the deduplication process is not executed, all data (including duplicate data) is written into the storage space of the storage device, resulting in a lower storage space utilization rate.
It can be seen that, for data protection of a deduplication LUN, there is no technical scheme that can simultaneously consider both protection efficiency and space utilization.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for protecting deduplication data, so as to consider both data protection efficiency and space utilization of a storage device, and at the same time, effectively reduce the influence of data protection on front-end service performance.
In order to achieve the purpose of the application, the application provides the following technical scheme:
In a first aspect, the present application provides a deduplication data protection method, which is applied to a storage device, where the storage device includes at least one deduplication LUN and at least one disk array (Redundant Array of Independent Disks, abbreviated as RAID), a target deduplication LUN in the at least one deduplication LUN corresponds to a deduplication pool created based on RAID, the deduplication pool is divided into a plurality of deduplication blocks according to a preset deduplication block size, the target deduplication LUN is divided into a plurality of logical blocks according to the preset deduplication block size, the target deduplication LUN further corresponds to a fingerprint library created based on RAID, the fingerprint library is used to store a plurality of fingerprint records, and each fingerprint record is used to record the correspondence among fingerprint information of data, a physical address of a primary deduplication block storing the data, and a physical address of a backup deduplication block storing the data, and the method includes:
determining first data to be written into a first logical block, wherein the first logical block is any logical block of the target deduplication LUN into which the data is to be written;
if the fingerprint information of the first data does not exist in the fingerprint library, allocating a first primary deduplication block to the first logical block from the deduplication pool;
writing the first data to the first primary deduplication block;
adding a first fingerprint record in the fingerprint library, wherein the fingerprint information of the first fingerprint record is the fingerprint information of the first data, the physical address of a main deduplication block of the first fingerprint record is the physical address of the first main deduplication block, and the physical address of a backup deduplication block of the first fingerprint record is null;
traversing the fingerprint library for second fingerprint records in which the physical address of the backup deduplication block is empty, wherein the physical address of the primary deduplication block of each second fingerprint record is the physical address of a second primary deduplication block;
the following processing is executed for each second fingerprint record:
allocating a corresponding second backup deduplication block to the second primary deduplication block from the deduplication pool;
reading second data in the second main deduplication block;
writing the second data to the second backup deduplication block;
and updating the physical address of the backup deduplication block in the second fingerprint record to be the physical address of the second backup deduplication block.
Optionally, the primary deduplication block is located in a first RAID, the backup deduplication block is located in a second RAID, and the read-write performance of the first RAID is equal to or better than the read-write performance of the second RAID.
Optionally, when the primary deduplication block and the backup deduplication block are located in the same RAID, a physical address interval between the primary deduplication block and the backup deduplication block is greater than a preset address interval threshold.
Optionally, the target deduplication LUN further corresponds to a logical space mapping table, where the logical space mapping table is used to record a mapping relationship between a logical address of a mapped logical block and a physical address of a deduplication block, and the method further includes:
when third data in a second logical block needs to be read, determining a first deduplication block corresponding to the second logical block from the logical space mapping table;
reading the third data from the first deduplication block;
reading the third data from the second deduplication block if reading from the first deduplication block fails and a third fingerprint record exists in the fingerprint library, the third fingerprint record comprising a physical address of the first deduplication block and a physical address of a second deduplication block;
if the reading from the second deduplication block is successful, modifying the mapping relationship between the logical address of the second logical block and the physical address of the first deduplication block in the logical space mapping table into the mapping relationship between the logical address of the second logical block and the physical address of the second deduplication block.
Optionally, after the reading from the second deduplication block is successful, the method further includes:
updating the physical address of the first deduplication block in the third fingerprint record to null;
the method further comprises the following steps:
traversing the fingerprint library for fourth fingerprint records in which the physical address of the primary deduplication block is empty, wherein the physical address of the backup deduplication block of each fourth fingerprint record is the physical address of a third backup deduplication block;
for each fourth fingerprint record the following is performed:
allocating a corresponding third primary deduplication block to the third backup deduplication block from the deduplication pool;
reading fourth data in the third backup deduplication block;
writing the fourth data to the third primary deduplication block;
and updating the physical address of the main deduplication block in the fourth fingerprint record to be the physical address of the third main deduplication block.
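The optional read-fallback and repair steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `FingerprintRecord`, `read_block`, and `repair_pass` are hypothetical, the pool is modeled as a dictionary, and a missing dictionary key stands in for a failed read.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FingerprintRecord:
    # None models an "empty" (null) physical address.
    primary_addr: Optional[int]
    backup_addr: Optional[int]


def read_block(logical_addr: int, mapping: Dict[int, int],
               library: List[FingerprintRecord], pool: Dict[int, bytes]) -> bytes:
    """Read via the mapping table; on failure, fall back to the paired block."""
    first = mapping[logical_addr]
    data = pool.get(first)  # assumption: a missing key models a failed read
    if data is not None:
        return data
    for rec in library:
        # Find the record that pairs the failed block with another block.
        if rec.primary_addr == first and rec.backup_addr is not None:
            second = rec.backup_addr
        elif rec.backup_addr == first and rec.primary_addr is not None:
            second = rec.primary_addr
        else:
            continue
        data = pool.get(second)
        if data is None:
            break
        mapping[logical_addr] = second      # remap to the readable block
        if rec.primary_addr == first:       # null out the failed address
            rec.primary_addr = None
        else:
            rec.backup_addr = None
        return data
    raise OSError("block unreadable and no usable copy found")


def repair_pass(library: List[FingerprintRecord], pool: Dict[int, bytes],
                allocate: Callable[[], int]) -> None:
    """Rebuild a primary block for every record whose primary address is empty."""
    for rec in library:
        if rec.primary_addr is None and rec.backup_addr is not None:
            new_primary = allocate()
            pool[new_primary] = pool[rec.backup_addr]
            rec.primary_addr = new_primary


pool = {20: b"data"}            # address 10 is absent, so reading it "fails"
mapping = {0: 10}
library = [FingerprintRecord(primary_addr=10, backup_addr=20)]
got = read_block(0, mapping, library, pool)
repair_pass(library, pool, lambda: 30)
```

In this sketch the mapping table keeps pointing at the surviving copy after repair; the newly allocated primary block only restores redundancy in the fingerprint record.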
In a second aspect, the present application provides a deduplication data protection apparatus, which is applied to a storage device, where the storage device includes at least one deduplication LUN and at least one RAID, where a target deduplication LUN in the at least one deduplication LUN corresponds to a deduplication pool created based on the RAID, the deduplication pool is divided into a plurality of deduplication blocks according to a preset deduplication block size, the target deduplication LUN is divided into a plurality of logical blocks according to the preset deduplication block size, the target deduplication LUN further corresponds to a fingerprint library created based on the RAID, the fingerprint library is used to store a plurality of fingerprint records, and the fingerprint records are used to record correspondence between fingerprint information of data, a physical address of a main deduplication block storing the data, and a physical address of a backup deduplication block storing the data, where the apparatus includes:
a data determining unit, configured to determine first data to be written into a first logical block, where the first logical block is any logical block of the target deduplication LUN into which the data is to be written;
an allocating unit, configured to allocate a first primary deduplication block to the first logical block from the deduplication pool if fingerprint information of the first data does not exist in the fingerprint library;
a write unit configured to write the first data into the first primary deduplication block;
a record adding unit, configured to add a first fingerprint record to the fingerprint library, where fingerprint information of the first fingerprint record is fingerprint information of the first data, a physical address of a main deduplication block of the first fingerprint record is a physical address of the first main deduplication block, and a physical address of a backup deduplication block of the first fingerprint record is null;
a traversal unit, configured to traverse the fingerprint library for second fingerprint records in which the physical address of the backup deduplication block is empty, where the physical address of the primary deduplication block of each second fingerprint record is the physical address of a second primary deduplication block;
the allocating unit is further configured to allocate a corresponding second backup deduplication block to the second primary deduplication block from the deduplication pool;
a reading unit, configured to read second data in the second main deduplication block;
the writing unit is further configured to write the second data into the second backup deduplication block;
and a record updating unit, configured to update the physical address of the backup deduplication block in the second fingerprint record to be the physical address of the second backup deduplication block.
Optionally, the primary deduplication block is located in a first RAID, the backup deduplication block is located in a second RAID, and the read-write performance of the first RAID is equal to or better than the read-write performance of the second RAID.
Optionally, when the primary deduplication block and the backup deduplication block are located in the same RAID, a physical address interval between the primary deduplication block and the backup deduplication block is greater than a preset address interval threshold.
Optionally, the target deduplication LUN further corresponds to a logical space mapping table, where the logical space mapping table is used to record a mapping relationship between a logical address of a mapped logical block and a physical address of a deduplication block, and the apparatus further includes:
a deduplication block determining unit, configured to determine, when third data in a second logical block needs to be read, a first deduplication block corresponding to the second logical block from the logical space mapping table;
the reading unit is further configured to read the third data from the first deduplication block;
the reading unit is further configured to read the third data from a second deduplication block if reading from the first deduplication block fails and a third fingerprint record exists in the fingerprint library, the third fingerprint record including a physical address of the first deduplication block and a physical address of the second deduplication block;
a modifying unit, configured to modify, if the reading from the second deduplication block is successful, a mapping relationship between a logical address of the second logical block and a physical address of the first deduplication block in the logical space mapping table to a mapping relationship between a logical address of the second logical block and a physical address of the second deduplication block.
Optionally, the apparatus further comprises:
the record updating unit is further configured to update the physical address of the first deduplication block in the third fingerprint record to null;
the traversal unit is further configured to traverse the fingerprint library for fourth fingerprint records in which the physical address of the primary deduplication block is empty, where the physical address of the backup deduplication block of each fourth fingerprint record is the physical address of a third backup deduplication block;
the allocating unit is further configured to allocate a corresponding third primary deduplication block to the third backup deduplication block from the deduplication pool;
the reading unit is further configured to read fourth data in the third backup deduplication block;
the writing unit is further configured to write the fourth data into the third main deduplication block;
the record updating unit is further configured to update a physical address of a main deduplication block in the fourth fingerprint record to be a physical address of the third main deduplication block.
As can be seen from the above description, in the embodiment of the present application, data writing and data backup are performed in the same deduplication processing flow, so that the data protection efficiency can be effectively improved, and the space utilization rate of the storage device is considered at the same time. In addition, because the data writing and the data backup are executed asynchronously, the influence of the data backup (protection) on the front-end service performance can be effectively reduced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart illustrating a method for protecting deduplication data according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a process of reading deduplication data according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a process of repairing deduplication data according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a deduplication data protection apparatus according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the embodiments of the present application, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, the negotiation information may also be referred to as second information, and similarly, the second information may also be referred to as negotiation information without departing from the scope of the embodiments of the present application. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application are described in detail below with reference to the accompanying drawings and specific embodiments:
Referring to FIG. 1, a flowchart of a deduplication data protection method according to an embodiment of the present application is shown. The flow is applied to a storage device.
The storage device includes at least one deduplication LUN and at least one RAID. A target deduplication LUN of the at least one deduplication LUN corresponds to a deduplication pool created based on RAID.
Here, the target deduplication LUN may be any one of the at least one deduplication LUN. The name "target deduplication LUN" is used only for convenience of distinction and is not intended to be limiting.
The deduplication pool is divided into a plurality of deduplication blocks according to a preset deduplication block size (e.g., 8 KB). The target deduplication LUN is divided into a plurality of logical blocks according to a preset deduplication block size.
The target deduplication LUN also corresponds to a fingerprint library created based on RAID. The fingerprint library comprises a plurality of fingerprint records, where each fingerprint record records the correspondence among the fingerprint information of data, the physical address of the primary deduplication block storing the data, and the physical address of the backup deduplication block storing the data.
It can be seen that, in the embodiment of the present application, the same data is stored in at least two copies in the deduplication pool, and the two copies are located in the primary deduplication block and the backup deduplication block respectively. The main deduplication block is used for storing data written by front-end services; the backup deduplication block is used for storing the same data as in the primary deduplication block to play a role of data backup (protection).
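As an illustration only, a fingerprint record as described above could be modeled by the following Python sketch. The class name `FingerprintRecord`, its field names, and the use of `None` for an empty physical address are assumptions made for this example; the source does not specify a concrete layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FingerprintRecord:
    """One fingerprint-library entry: fingerprint -> (primary, backup) addresses."""
    fingerprint: bytes
    primary_addr: Optional[int]  # physical address of the primary deduplication block
    backup_addr: Optional[int]   # physical address of the backup block; None = empty

    def needs_backup(self) -> bool:
        # Data was written by the front-end service but not yet backed up.
        return self.primary_addr is not None and self.backup_addr is None

    def needs_repair(self) -> bool:
        # The primary copy was lost; only the backup copy remains.
        return self.primary_addr is None and self.backup_addr is not None


rec = FingerprintRecord(fingerprint=b"\x01", primary_addr=4096, backup_addr=None)
```

The two predicates correspond to the two states in which exactly one of the physical addresses is empty.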
As shown in fig. 1, the process of protecting the deduplication data may include the following steps:
Step 101, determining first data to be written into a first logical block, where the first logical block is any logical block of the target deduplication LUN into which the data is to be written.
A user issues a write request for the target deduplication LUN to the storage device through an upper-layer application. The write request includes the data to be written and the address range in the target deduplication LUN to which that data is to be written. For example, the size of the data to be written is 64 KB, the start address to be written is 256 MB, and the address range to be written is [256 MB, 256 MB + 64 KB].
From the address range to be written, the corresponding logical blocks to be written in the target deduplication LUN can be determined. Taking a logical block size of 8 KB and the address range [256 MB, 256 MB + 64 KB] as an example, the address range corresponds to the 8 logical blocks from 256 MB to 256 MB + 64 KB in the target deduplication LUN.
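The arithmetic in this example can be checked with a short Python sketch. The helper name `logical_blocks` and the treatment of the range as half-open (the end address itself is excluded) are assumptions of this sketch.

```python
KB = 1024
MB = 1024 * KB
BLOCK_SIZE = 8 * KB  # preset deduplication block size from the example


def logical_blocks(start: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Indices of the logical blocks touched by the byte range [start, start + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = start // block_size
    last = (start + length - 1) // block_size
    return range(first, last + 1)


blocks = logical_blocks(256 * MB, 64 * KB)
print(len(blocks), blocks[0], blocks[-1])  # prints: 8 32768 32775
```

A 64 KB write starting at 256 MB therefore covers logical blocks 32768 through 32775, eight blocks in total.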
Then, the data to be written carried by the write request is divided according to the logical block size, the data to be written into each logical block is determined, and fingerprint information is calculated for the data of each logical block. The fingerprint may be calculated using any general deduplication implementation, which is not described further in the present application.
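A minimal sketch of this splitting and fingerprinting step follows. The source leaves the fingerprint algorithm open; SHA-256 from Python's `hashlib` is used here purely as an assumed example, and the function name `split_and_fingerprint` is hypothetical.

```python
import hashlib
from typing import List, Tuple

BLOCK_SIZE = 8 * 1024  # example deduplication block size (8 KB)


def split_and_fingerprint(data: bytes,
                          block_size: int = BLOCK_SIZE) -> List[Tuple[bytes, bytes]]:
    """Split block-aligned data into chunks and pair each chunk with its fingerprint."""
    if len(data) % block_size != 0:
        raise ValueError("this sketch assumes block-aligned writes")
    pairs = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        # Assumption: SHA-256 stands in for the unspecified fingerprint algorithm.
        pairs.append((chunk, hashlib.sha256(chunk).digest()))
    return pairs


# Two identical zero-filled blocks followed by one different block.
payload = bytes(BLOCK_SIZE) * 2 + b"\x01" * BLOCK_SIZE
pairs = split_and_fingerprint(payload)
```

Identical chunks yield identical fingerprints, which is what lets the later lookup in the fingerprint library detect duplicate data.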
Here, the logical block to be written is referred to as a first logical block, and the data to be written to the first logical block is referred to as first data. It is to be understood that the first logical block and the first data are named for convenience of description only and are not intended to be limiting.
Steps 102 to 104 are executed for each first logical block:
Step 102, if the fingerprint information of the first data does not exist in the fingerprint library, allocating a first primary deduplication block to the first logical block from the deduplication pool.
The fingerprint library records the fingerprint information of the data already stored in the deduplication pool. When it is determined that the fingerprint information of the first data does not exist in the fingerprint library, the first data is new data that has not been stored, and it needs to be written into the deduplication pool.
Therefore, in the embodiment of the present application, a corresponding primary deduplication block is allocated to a first logical block to which the first data belongs from a deduplication pool corresponding to a target deduplication LUN. Here, the primary deduplication block allocated for the first logical block is referred to as a first primary deduplication block. It is to be understood that the first main deduplication block is named for convenience of distinction and is not meant to be limiting.
Step 103, writing the first data into the first main deduplication block.
That is, the first data is written to the deduplication pool.
Then, in the logical space mapping table corresponding to the target deduplication LUN (which records the mapping relationship between the logical addresses of mapped logical blocks and the physical addresses of deduplication blocks), a mapping relationship between the logical address of the first logical block and the physical address of the first primary deduplication block is added, so that the data corresponding to the first logical block can subsequently be read from the first primary deduplication block according to this mapping relationship.
Step 104, adding a first fingerprint record in the fingerprint library, where the fingerprint information of the first fingerprint record is the fingerprint information of the first data, the physical address of the primary deduplication block of the first fingerprint record is the physical address of the first primary deduplication block, and the physical address of the backup deduplication block of the first fingerprint record is empty.
After writing the first data through step 103, it is necessary to record the correspondence between the fingerprint information of the first data and the physical address of the first primary deduplication block storing the first data.
Therefore, in the embodiment of the application, a new fingerprint record is added in the fingerprint database and is recorded as the first fingerprint record. Here, the first fingerprint record is named for convenience of distinction and is not intended to be limiting.
The fingerprint information of the first fingerprint record is the fingerprint information of the first data, the physical address of the main deduplication block of the first fingerprint record is the physical address of the first main deduplication block, and the physical address of the backup deduplication block of the first fingerprint record is null.
Here, it can be appreciated that the physical address of the backup deduplication block of the first fingerprint record is empty because no backup operation has yet been performed on the first data.
At this point, the data writing process for the front-end service is complete.
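Steps 101 to 104 can be sketched in Python as follows. This is a simplified in-memory model, not the patented implementation: the class `DedupLun`, the sequential allocator, and the SHA-256 fingerprint are assumptions, and the handling of duplicate data (reusing the existing block) follows common deduplication practice rather than anything stated in this section.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FingerprintRecord:
    primary_addr: Optional[int]
    backup_addr: Optional[int] = None  # empty until the asynchronous backup runs


class DedupLun:
    def __init__(self) -> None:
        self.pool: Dict[int, bytes] = {}                    # physical address -> data
        self.library: Dict[bytes, FingerprintRecord] = {}   # fingerprint -> record
        self.mapping: Dict[int, int] = {}                   # logical -> physical address
        self._next_addr = 0

    def allocate(self) -> int:
        # Assumption: a trivial sequential allocator stands in for the dedup pool.
        addr = self._next_addr
        self._next_addr += 1
        return addr

    def write_block(self, logical_addr: int, data: bytes) -> None:
        fingerprint = hashlib.sha256(data).digest()
        record = self.library.get(fingerprint)
        if record is None:
            addr = self.allocate()                  # step 102: allocate primary block
            self.pool[addr] = data                  # step 103: write the first data
            self.mapping[logical_addr] = addr       # add logical-to-physical mapping
            self.library[fingerprint] = FingerprintRecord(primary_addr=addr)  # step 104
        else:
            # Assumption: duplicate data reuses an existing copy instead of being rewritten.
            existing = (record.primary_addr if record.primary_addr is not None
                        else record.backup_addr)
            self.mapping[logical_addr] = existing


lun = DedupLun()
lun.write_block(0, b"a" * 8192)
lun.write_block(1, b"a" * 8192)  # duplicate of block 0
lun.write_block(2, b"b" * 8192)
```

After these three writes only two physical blocks are occupied, and every fingerprint record still has an empty backup address, which is the state the asynchronous backup traversal looks for.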
Step 105, traversing the fingerprint library for second fingerprint records in which the physical address of the backup deduplication block is empty.
Here, the fingerprint record in which the physical address of the backup deduplication block found while traversing the fingerprint library is empty is referred to as a second fingerprint record. It is to be understood that the reference to the second fingerprint record is merely a name for convenience of distinguishing and is not intended to be limiting.
And the physical address of the main deduplication block in the second fingerprint record is the physical address of the second main deduplication block. Here, the second main deduplication block is named for convenience of distinction and is not intended to be limiting.
Steps 106 to 109 are executed for each second fingerprint record:
Step 106, allocating a corresponding second backup deduplication block to the second primary deduplication block from the deduplication pool.
An empty physical address of the backup deduplication block in the second fingerprint record indicates either that the data in the second primary deduplication block has not been backed up or that the original backup deduplication block has been damaged. This step therefore allocates a second backup deduplication block to the second primary deduplication block for backup. Here, the second backup deduplication block is named for convenience of distinction and is not intended to be limiting.
Step 107, reading second data in the second primary deduplication block.
Here, the data stored in the second main deduplication block is referred to as second data. It is to be understood that the reference to the second data is merely a name for convenience of distinction and is not intended to be limiting.
Step 108, writing the second data into the second backup deduplication block.
That is, the backup of the second data is completed.
Step 109, updating the physical address of the backup deduplication block in the second fingerprint record to be the physical address of the second backup deduplication block.
Namely, a backup relationship between the second primary deduplication block and the second backup deduplication block is established.
Thus, the flow shown in fig. 1 is completed.
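The backup traversal of steps 105 to 109 can be sketched as a single pass over the fingerprint library. The function name `backup_pass`, the list-based library, and the `allocate` callback are hypothetical; how and when the pass is scheduled asynchronously is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FingerprintRecord:
    fingerprint: bytes
    primary_addr: Optional[int]
    backup_addr: Optional[int] = None  # None models an empty physical address


def backup_pass(library: List[FingerprintRecord], pool: Dict[int, bytes],
                allocate: Callable[[], int]) -> int:
    """Back up every record whose backup address is empty; return the number copied."""
    copied = 0
    for rec in library:                       # step 105: traverse the library
        if rec.backup_addr is not None or rec.primary_addr is None:
            continue
        backup_addr = allocate()              # step 106: allocate a backup block
        data = pool[rec.primary_addr]         # step 107: read from the primary block
        pool[backup_addr] = data              # step 108: write to the backup block
        rec.backup_addr = backup_addr         # step 109: update the fingerprint record
        copied += 1
    return copied


pool = {0: b"alpha", 1: b"beta", 7: b"beta"}
library = [
    FingerprintRecord(b"fa", primary_addr=0),                 # not yet backed up
    FingerprintRecord(b"fb", primary_addr=1, backup_addr=7),  # already backed up
]
free = iter(range(100, 200))  # assumption: free physical addresses handed out in order
copied = backup_pass(library, pool, lambda: next(free))
```

Because the pass touches only records with an empty backup address, running it again right after it finishes copies nothing.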
As can be seen from the flow shown in fig. 1, in the embodiment of the present application, data writing and data backup are performed in the same deduplication processing flow, so that the data protection efficiency can be effectively improved, and the space utilization rate of the storage device is considered at the same time. In addition, because the data writing and the data backup are executed asynchronously, the influence of the data backup (protection) on the front-end service performance can be effectively reduced.
In addition, it should be added that, in order to further ensure the security of the data, the primary deduplication block and the backup deduplication block may be isolated.
As one embodiment, the primary and backup deduplication blocks may be located in different RAIDs.
Here, the RAID to which the primary deduplication block belongs is referred to as the first RAID, and the RAID to which the backup deduplication block belongs is referred to as the second RAID. It is understood that the first RAID and the second RAID are named for convenience of distinction and are not intended to be limiting.
The read-write performance of the first RAID may be better than that of the second RAID. That is, the RAID serving the front-end service may outperform the RAID used for backup.
For example, the first RAID is composed of Solid State Disks (SSD), and the second RAID is composed of Hard Disk Drives (HDD).
Thus, the equipment cost can be reduced while ensuring the safety.
Of course, the read-write performance of the first RAID may also be equal to the read-write performance of the second RAID.
As another embodiment, if the primary and backup deduplication blocks are located in the same RAID, the physical address interval between the primary and backup deduplication blocks is controlled to be greater than a preset address interval threshold.
That is, even when the primary and backup deduplication blocks have to be located in the same RAID, they are kept physically far apart, so that a failure of consecutive disk sectors does not make both the primary data and the backup data inaccessible.
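The two isolation strategies can be sketched as a placement check. This is an illustrative sketch only: the `(RAID identifier, block offset)` address layout, the function names, and the policy of preferring a block in a different RAID over a distant block in the same RAID are assumptions, not details stated in the application.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical address layout: (RAID identifier, block offset inside that RAID).
Address = Tuple[str, int]


def is_isolated(primary: Address, candidate: Address, min_gap: int) -> bool:
    """Return True if candidate is acceptable as a backup location for primary."""
    if primary[0] != candidate[0]:
        # Different RAIDs: isolated regardless of offsets.
        return True
    # Same RAID: the address interval must exceed the preset threshold.
    return abs(primary[1] - candidate[1]) > min_gap


def pick_backup_block(
    primary: Address, free_blocks: Iterable[Address], min_gap: int
) -> Optional[Address]:
    """Prefer a free block in another RAID, else a distant one in the same RAID."""
    same_raid_choice: Optional[Address] = None
    for candidate in free_blocks:
        if candidate[0] != primary[0]:
            return candidate
        if same_raid_choice is None and is_isolated(primary, candidate, min_gap):
            same_raid_choice = candidate
    return same_raid_choice
```

For example, with a threshold of 100 blocks, a primary block at `("r1", 0)` rejects `("r1", 5)` but accepts `("r1", 500)` or any free block in `"r2"`.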
Referring to fig. 2, a deduplication data reading process is shown for the embodiment of the present application. As shown in fig. 2, the process may include the following steps:
step 201, when the third data in the second logical block needs to be read, determining the first deduplication block corresponding to the second logical block from the logical space mapping table.
As described above, the logical space mapping table is used to record the mapping relationship between the logical address of the mapped logical block and the physical address of the deduplication block, so that, by querying the logical space mapping table, the storage location (first deduplication block) of the data to be read in the deduplication pool can be determined.
Here, the second logical block, the third data, and the first deduplication block are all named for convenience of distinction, and are not intended to be limiting.
Step 202, reading third data from the first deduplication block.
If the reading is successful, the reading process is ended. If the read fails, go to step 203.
Step 203, if reading from the first deduplication block fails and a third fingerprint record exists in the fingerprint library, the third fingerprint record including the physical address of the first deduplication block and the physical address of a second deduplication block, reading the third data from the second deduplication block.
Since the physical addresses of all the deduplication blocks (the primary deduplication block and the backup deduplication block) storing the same data are recorded in the fingerprint record, when the third data cannot be read from the first deduplication block, the physical address of the second deduplication block storing the same data as the first deduplication block can be found by querying the fingerprint records in the fingerprint library, and the third data can be read from the second deduplication block. Here, the second deduplication block is named for convenience of distinction and is not intended to be limiting.
In addition, the embodiment of the present application refers to the fingerprint record that includes the physical address of the first deduplication block and the physical address of the second deduplication block as the third fingerprint record. It is to be understood that the third fingerprint record is named for convenience of distinction and is not intended to be limiting.
Step 204, if the reading from the second deduplication block is successful, the mapping relationship between the logical address of the second logical block and the physical address of the first deduplication block in the logical space mapping table is modified into the mapping relationship between the logical address of the second logical block and the physical address of the second deduplication block.
Since the first deduplication block is already abnormal, after the first deduplication block is found through the logical space mapping table, data cannot be directly read from the first deduplication block, which affects data reading efficiency to a certain extent.
Therefore, in the embodiment of the present application, after the data in the second deduplication block is successfully read, the mapping relationship between the logical address of the second logical block and the physical address of the first deduplication block in the logical space mapping table is modified into the mapping relationship between the logical address of the second logical block and the physical address of the second deduplication block.
In this way, when the third data in the second logical block is read again, the second deduplication block which stores the third data and is not abnormal can be found directly from the logical space mapping table (without querying the fingerprint database), and the third data is read from the second deduplication block, so that the data reading efficiency is improved.
Thus, the flow shown in fig. 2 is completed.
Reading of the deduplicated data is achieved by the flow shown in fig. 2.
In addition, it should be noted that after it is determined that reading from the first deduplication block fails and reading from the second deduplication block succeeds, the physical address of the first deduplication block in the third fingerprint record may be updated to null, so as to facilitate subsequent asynchronous repair of the data in the damaged deduplication block.
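The read flow of steps 201 to 204, together with the optional clearing of the failed address, can be sketched as follows. This is a simplified illustration under stated assumptions: the `DedupPool` class with its `bad` set for simulating damaged blocks, the `BlockReadError` exception, and the linear scan over the fingerprint records are all hypothetical; a real fingerprint library would presumably be indexed rather than scanned.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class FingerprintRecord:
    fingerprint: str
    primary_addr: Optional[int]
    backup_addr: Optional[int]


class BlockReadError(Exception):
    """Raised by the toy pool when a block cannot be read."""


class DedupPool:
    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks = blocks
        self.bad: Set[int] = set()  # addresses that simulate damaged blocks

    def read(self, addr: int) -> bytes:
        if addr in self.bad or addr not in self.blocks:
            raise BlockReadError(addr)
        return self.blocks[addr]


def read_logical_block(
    lba: int,
    mapping: Dict[int, int],
    fingerprint_db: List[FingerprintRecord],
    pool: DedupPool,
) -> bytes:
    first_addr = mapping[lba]                # step 201: look up the mapping table
    try:
        return pool.read(first_addr)         # step 202: normal read path
    except BlockReadError:
        pass
    for record in fingerprint_db:            # step 203: find the sibling block
        if first_addr == record.primary_addr:
            second_addr = record.backup_addr
        elif first_addr == record.backup_addr:
            second_addr = record.primary_addr
        else:
            continue
        if second_addr is None:
            break                            # no second copy exists yet
        data = pool.read(second_addr)        # may raise if the copy is damaged too
        mapping[lba] = second_addr           # step 204: repoint the logical block
        # Optional follow-up: clear the failed address for asynchronous repair.
        if record.primary_addr == first_addr:
            record.primary_addr = None
        else:
            record.backup_addr = None
        return data
    raise BlockReadError(first_addr)
```

After one failed read, later reads of the same logical block go straight to the healthy copy through the mapping table and never touch the fingerprint records.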
Referring to fig. 3, a flow of repairing deduplicated data is shown for the embodiment of the present application. As shown in fig. 3, the process may include the following steps:
step 301, traversing the fourth fingerprint records in the fingerprint library, in which the physical address of the primary deduplication block is empty.
Here, a fingerprint record in the fingerprint library in which the physical address of the primary deduplication block is empty is referred to as a fourth fingerprint record. It is to be understood that the fourth fingerprint record is named for convenience of distinction and is not intended to be limiting.
The physical address of the backup deduplication block in the fourth fingerprint record is the physical address of the third backup deduplication block. Here, the third backup deduplication block is named for convenience of distinction and is not intended to be limiting.
For each fourth fingerprint record the following is performed:
step 302, allocating a corresponding third primary deduplication block for the third backup deduplication block from the deduplication pool.
Since the physical address of the primary deduplication block in the fourth fingerprint record is empty, which indicates that the original primary deduplication block may be damaged, a new primary deduplication block, called the third primary deduplication block, is allocated for the third backup deduplication block. Here, the third primary deduplication block is named only for convenience of distinction and is not intended to be limiting.
Step 303, reading fourth data in the third backup deduplication block.
Here, the data stored in the third backup deduplication block is referred to as fourth data. It is to be understood that the fourth data is named only for convenience of distinction and is not intended to be limiting.
Step 304, writing the fourth data into the third main deduplication block.
I.e., the data of the primary deduplication block is recovered.
Step 305, the physical address of the main deduplication block in the fourth fingerprint record is updated to the physical address of the third main deduplication block.
Namely, the backup relationship between the third primary deduplication block and the third backup deduplication block is established, so that the fourth data is again held in two deduplication blocks.
The flow shown in fig. 3 is completed.
As can be seen from the flow shown in fig. 3, in the embodiment of the present application, the data in the main deduplication block is repaired in an asynchronous repair manner, and the front-end service processing time is not occupied, so that the influence on the performance of the front-end service can be effectively reduced.
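The repair pass of steps 301 to 305 mirrors the backup pass and can be sketched the same way. As before, this is a hedged illustration: the `FingerprintRecord` and `DedupPool` classes and the use of `None` for an empty address are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FingerprintRecord:
    fingerprint: str
    primary_addr: Optional[int]  # None models an "empty" primary address
    backup_addr: Optional[int]


class DedupPool:
    """Toy deduplication pool with a simple free list."""

    def __init__(self, blocks: Dict[int, bytes], free: List[int]) -> None:
        self.blocks = blocks
        self.free = free

    def allocate(self) -> int:
        return self.free.pop(0)

    def read(self, addr: int) -> bytes:
        return self.blocks[addr]

    def write(self, addr: int, data: bytes) -> None:
        self.blocks[addr] = data


def repair_pass(pool: DedupPool, fingerprint_db: List[FingerprintRecord]) -> int:
    """Rebuild the primary block of every record whose primary address is empty."""
    repaired = 0
    for record in fingerprint_db:             # step 301: traverse candidate records
        if record.primary_addr is not None or record.backup_addr is None:
            continue
        new_primary = pool.allocate()         # step 302: allocate a new primary block
        data = pool.read(record.backup_addr)  # step 303: read the surviving backup
        pool.write(new_primary, data)         # step 304: restore the primary copy
        record.primary_addr = new_primary     # step 305: record the new address
        repaired += 1
    return repaired
```

A record that has lost both copies (both addresses empty) is skipped here, since there is nothing left to copy from; how such a record should be handled is outside what this sketch covers.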
In addition, it should be added that, if the read-write performance of the primary deduplication block is better than that of the backup deduplication block, then after the primary deduplication block is repaired, the mapping relationship between the logical block and the backup deduplication block in the logical space mapping table may be modified into the mapping relationship between the logical block and the repaired primary deduplication block, so that data can subsequently be read from the primary deduplication block and the data access performance is improved.
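Switching logical blocks back to a repaired primary deduplication block can be sketched as a small update over the logical space mapping table. The function name, the dictionary representation of the mapping table, and the `primary_is_faster` flag are hypothetical; how the storage device actually compares RAID performance is not described in the text.

```python
from typing import Dict


def repoint_to_primary(
    mapping: Dict[int, int],
    backup_addr: int,
    primary_addr: int,
    primary_is_faster: bool,
) -> int:
    """Repoint logical blocks from the backup block to the repaired primary block.

    Returns the number of logical blocks whose mapping was changed.
    """
    if not primary_is_faster:
        # Equal performance: leaving the mapping on the backup block costs nothing.
        return 0
    moved = 0
    # Iterate over a snapshot so the dictionary can be updated safely.
    for lba, addr in list(mapping.items()):
        if addr == backup_addr:
            mapping[lba] = primary_addr
            moved += 1
    return moved
```

Several logical blocks can share one deduplication block, so the update touches every logical address that currently points at the backup block, not just one.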
The method provided by the embodiment of the present application is described above, and the apparatus provided by the embodiment of the present application is described below:
Referring to fig. 4, a schematic structural diagram of an apparatus provided in an embodiment of the present application is shown. The apparatus is applied to a storage device that includes at least one deduplication LUN and at least one RAID. A target deduplication LUN in the at least one deduplication LUN corresponds to a deduplication pool created based on the RAID; the deduplication pool is divided into a plurality of deduplication blocks according to a preset deduplication block size, and the target deduplication LUN is divided into a plurality of logical blocks according to the same preset deduplication block size. The target deduplication LUN also corresponds to a fingerprint library created based on the RAID. The fingerprint library is used for storing a plurality of fingerprint records, and each fingerprint record records the correspondence between fingerprint information of data, the physical address of the primary deduplication block storing the data, and the physical address of the backup deduplication block storing the data. The apparatus includes a data determining unit 401, an allocating unit 402, a writing unit 403, a record adding unit 404, a traversing unit 405, a reading unit 406, and a record updating unit 407, where:
a data determining unit 401, configured to determine first data to be written into a first logical block, where the first logical block is any logical block of the target deduplication LUN into which the data is to be written;
an allocating unit 402, configured to allocate a first primary deduplication block to the first logical block from the deduplication pool if fingerprint information of the first data does not exist in the fingerprint library;
a writing unit 403, configured to write the first data into the first primary deduplication block;
a record adding unit 404, configured to add a first fingerprint record to the fingerprint library, where fingerprint information of the first fingerprint record is fingerprint information of the first data, a physical address of a main deduplication block of the first fingerprint record is a physical address of the first main deduplication block, and a physical address of a backup deduplication block of the first fingerprint record is null;
a traversing unit 405, configured to traverse a second fingerprint record in the fingerprint library, where a physical address of a backup deduplication block is empty, and a physical address of a primary deduplication block of the second fingerprint record is a physical address of a second primary deduplication block;
the allocating unit 402 is further configured to allocate a corresponding second backup deduplication block to the second primary deduplication block from the deduplication pool;
a reading unit 406, configured to read second data in the second main deduplication block;
the writing unit 403 is further configured to write the second data into the second backup deduplication block;
a record updating unit 407, configured to update the physical address of the backup deduplication block in the second fingerprint record to be the physical address of the second backup deduplication block.
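The front-end write path handled by units 401 to 404 can be sketched as a single class. This is an illustration under several assumptions: SHA-256 from Python's `hashlib` stands in for the unspecified fingerprint algorithm, the `DedupLun` class and its dictionaries are hypothetical, and the handling of a fingerprint hit (mapping the logical block to the existing block without writing) is inferred from how deduplication generally works rather than taken from the unit descriptions above.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FingerprintRecord:
    fingerprint: str
    primary_addr: Optional[int]
    backup_addr: Optional[int] = None  # left empty until the asynchronous backup runs


class DedupLun:
    """Toy deduplication LUN: pool, mapping table and fingerprint library in memory."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}                   # deduplication pool
        self.mapping: Dict[int, Optional[int]] = {}          # logical space mapping table
        self.fingerprints: Dict[str, FingerprintRecord] = {} # fingerprint library
        self._next_free = 0

    def _allocate(self) -> int:
        addr = self._next_free
        self._next_free += 1
        return addr

    def write(self, lba: int, data: bytes) -> FingerprintRecord:
        # Assumption: SHA-256 as the fingerprint of the data to be written.
        fp = hashlib.sha256(data).hexdigest()
        record = self.fingerprints.get(fp)
        if record is None:
            addr = self._allocate()               # allocate a primary block
            self.blocks[addr] = data              # write the data into it
            record = FingerprintRecord(fp, addr)  # new record with empty backup address
            self.fingerprints[fp] = record
        # Assumption: on a fingerprint hit the logical block is simply mapped to
        # the existing copy, preferring the primary block when it is present.
        target = record.primary_addr
        if target is None:
            target = record.backup_addr
        self.mapping[lba] = target
        return record
```

Note that the write path never touches the backup block; the empty backup address it leaves behind is what the traversing unit later picks up.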
In one embodiment, the primary deduplication block is located in a first RAID, and the backup deduplication block is located in a second RAID, and the read-write performance of the first RAID is equal to or better than that of the second RAID.
For one embodiment, when the primary and backup deduplication blocks are located in the same RAID, a physical address interval between the primary and backup deduplication blocks is greater than a preset address interval threshold.
As an embodiment, the target deduplication LUN further corresponds to a logical space mapping table, where the logical space mapping table is used to record a mapping relationship between a logical address of a mapped logical block and a physical address of a deduplication block, and the apparatus further includes:
a deduplication block determining unit, configured to determine, when third data in a second logical block needs to be read, a first deduplication block corresponding to the second logical block from the logical space mapping table;
the reading unit 406 is further configured to read the third data from the first deduplication block;
the reading unit 406 is further configured to, if reading from the first deduplication block fails and a third fingerprint record exists in the fingerprint library, where the third fingerprint record includes a physical address of the first deduplication block and a physical address of a second deduplication block, read the third data from the second deduplication block;
a modifying unit, configured to modify, if the reading from the second deduplication block is successful, a mapping relationship between a logical address of the second logical block and a physical address of the first deduplication block in the logical space mapping table to a mapping relationship between a logical address of the second logical block and a physical address of the second deduplication block.
As an embodiment:
the record updating unit 407 is further configured to update the physical address of the first deduplication block in the third fingerprint record to be null;
the traversal unit 405 is further configured to traverse a fourth fingerprint record in the fingerprint library, where a physical address of a main deduplication block is empty, and a physical address of a backup deduplication block of the fourth fingerprint record is a physical address of a third backup deduplication block;
the allocating unit 402 is further configured to allocate a corresponding third primary deduplication block to the third backup deduplication block from the deduplication pool;
the reading unit 406 is further configured to read fourth data in the third backup deduplication block;
the writing unit 403 is further configured to write the fourth data into the third main deduplication block;
the record updating unit 407 is further configured to update the physical address of the main deduplication block in the fourth fingerprint record to be the physical address of the third main deduplication block.
As can be seen from the above description, in the embodiment of the present application, data writing and data backup are performed in the same deduplication processing flow, so that the data protection efficiency can be effectively improved, and the space utilization rate of the storage device is considered at the same time. In addition, because the data writing and the data backup are executed asynchronously, the influence of the data backup (protection) on the front-end service performance can be effectively reduced.
The above description is only a preferred embodiment of the present application, and should not be taken as limiting the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present application shall be included in the scope of the present application.

Claims (10)

1. A method for protecting deduplication data is applied to a storage device, the storage device includes at least one deduplication LUN and at least one RAID, a target deduplication LUN in the at least one deduplication LUN corresponds to a deduplication pool created based on the RAID, the deduplication pool is divided into a plurality of deduplication blocks according to a preset deduplication block size, the target deduplication LUN is divided into a plurality of logical blocks according to the preset deduplication block size, the target deduplication LUN further corresponds to a fingerprint library created based on the RAID, the fingerprint library is used for storing a plurality of fingerprint records, the fingerprint records are used for recording correspondence between fingerprint information of data, physical addresses of main deduplication blocks storing the data, and physical addresses of backup deduplication blocks storing the data, and the method includes:
determining first data to be written into a first logical block, wherein the first logical block is any logical block of the target deduplication LUN into which the data is to be written;
if the fingerprint information of the first data does not exist in the fingerprint library, allocating a first primary deduplication block to the first logical block from the deduplication pool;
writing the first data to the first primary deduplication block;
adding a first fingerprint record in the fingerprint library, wherein the fingerprint information of the first fingerprint record is the fingerprint information of the first data, the physical address of a main deduplication block of the first fingerprint record is the physical address of the first main deduplication block, and the physical address of a backup deduplication block of the first fingerprint record is null;
traversing a second fingerprint record with an empty physical address of a backup deduplication block in the fingerprint library, wherein the physical address of a main deduplication block of the second fingerprint record is the physical address of a second main deduplication block;
the following processing is executed for each second fingerprint record:
allocating a corresponding second backup deduplication block to the second primary deduplication block from the deduplication pool;
reading second data in the second main deduplication block;
writing the second data to the second backup deduplication block;
and updating the physical address of the backup deduplication block in the second fingerprint record to be the physical address of the second backup deduplication block.
2. The method of claim 1, wherein a primary deduplication block is located in a first RAID and a backup deduplication block is located in a second RAID, the read-write performance of the first RAID being equal to or better than the read-write performance of the second RAID.
3. The method of claim 1, wherein a physical address spacing between a primary and a backup deduplication block is greater than a preset address spacing threshold when the primary and backup deduplication blocks are located in the same RAID.
4. The method of claim 1, wherein the target deduplication LUN further corresponds to a logical space mapping table, and the logical space mapping table is used for recording mapping relationships between logical addresses of mapped logical blocks and physical addresses of the deduplication blocks, and the method further includes:
when third data in a second logical block needs to be read, determining a first deduplication block corresponding to the second logical block from the logical space mapping table;
reading the third data from the first deduplication block;
reading the third data from the second deduplication block if reading from the first deduplication block fails and a third fingerprint record exists in the fingerprint library, the third fingerprint record comprising a physical address of the first deduplication block and a physical address of a second deduplication block;
if the reading from the second deduplication block is successful, modifying the mapping relationship between the logical address of the second logical block and the physical address of the first deduplication block in the logical space mapping table into the mapping relationship between the logical address of the second logical block and the physical address of the second deduplication block.
5. The method of claim 4, wherein after a successful read from the second deduplication block, the method further comprises:
updating the physical address of the first deduplication block in the third fingerprint record to null;
the method further comprises the following steps:
traversing a fourth fingerprint record in the fingerprint library, wherein the physical address of the main deduplication block is empty, and the physical address of a backup deduplication block of the fourth fingerprint record is the physical address of a third backup deduplication block;
for each fourth fingerprint record the following is performed:
allocating a corresponding third primary deduplication block to the third backup deduplication block from the deduplication pool;
reading fourth data in the third backup deduplication block;
writing the fourth data to the third primary deduplication block;
and updating the physical address of the main deduplication block in the fourth fingerprint record to be the physical address of the third main deduplication block.
6. A deduplication data protection apparatus is applied to a storage device, where the storage device includes at least one deduplication LUN and at least one RAID, where a target deduplication LUN in the at least one deduplication LUN corresponds to a deduplication pool created based on the RAID, the deduplication pool is divided into a plurality of deduplication blocks according to a preset deduplication block size, the target deduplication LUN is divided into a plurality of logical blocks according to the preset deduplication block size, the target deduplication LUN further corresponds to a fingerprint library created based on the RAID, the fingerprint library is used to store a plurality of fingerprint records, and the fingerprint records are used to record correspondence between fingerprint information of data, a physical address of a main deduplication block storing the data, and a physical address of a backup deduplication block storing the data, and the apparatus includes:
a data determining unit, configured to determine first data to be written into a first logical block, where the first logical block is any logical block of the target deduplication LUN into which the data is to be written;
an allocating unit, configured to allocate a first primary deduplication block to the first logical block from the deduplication pool if fingerprint information of the first data does not exist in the fingerprint library;
a write unit configured to write the first data into the first primary deduplication block;
a record adding unit, configured to add a first fingerprint record to the fingerprint library, where fingerprint information of the first fingerprint record is fingerprint information of the first data, a physical address of a main deduplication block of the first fingerprint record is a physical address of the first main deduplication block, and a physical address of a backup deduplication block of the first fingerprint record is null;
a traversal unit, configured to traverse a second fingerprint record in the fingerprint library, where a physical address of a backup deduplication block is empty, and a physical address of a primary deduplication block of the second fingerprint record is a physical address of a second primary deduplication block;
the allocating unit is further configured to allocate a corresponding second backup deduplication block to the second primary deduplication block from the deduplication pool;
a reading unit, configured to read second data in the second main deduplication block;
the writing unit is further configured to write the second data into the second backup deduplication block;
and a record updating unit, configured to update the physical address of the backup deduplication block in the second fingerprint record to be the physical address of the second backup deduplication block.
7. The apparatus of claim 6, wherein a primary deduplication block is located in a first RAID and a backup deduplication block is located in a second RAID, the first RAID having read-write performance equal to or better than the read-write performance of the second RAID.
8. The apparatus of claim 6, wherein a physical address spacing between a primary and a backup deduplication block is greater than a preset address spacing threshold when the primary and backup deduplication blocks are located in the same RAID.
9. The apparatus of claim 6, wherein the target deduplication LUN further corresponds to a logical space mapping table, and the logical space mapping table is used for recording mapping relationships between logical addresses of mapped logical blocks and physical addresses of the deduplication blocks, and the apparatus further comprises:
a deduplication block determining unit, configured to determine, when third data in a second logical block needs to be read, a first deduplication block corresponding to the second logical block from the logical space mapping table;
the reading unit is further configured to read the third data from the first deduplication block;
the reading unit is further configured to read the third data from a second deduplication block if reading from the first deduplication block fails and a third fingerprint record exists in the fingerprint library, the third fingerprint record including a physical address of the first deduplication block and a physical address of the second deduplication block;
a modifying unit, configured to modify, if the reading from the second deduplication block is successful, a mapping relationship between a logical address of the second logical block and a physical address of the first deduplication block in the logical space mapping table to a mapping relationship between a logical address of the second logical block and a physical address of the second deduplication block.
10. The apparatus of claim 9, wherein the apparatus further comprises:
the record updating unit is further configured to update the physical address of the first deduplication block in the third fingerprint record to be null;
the traversal unit is further configured to traverse a fourth fingerprint record in the fingerprint library, where a physical address of a main deduplication block is empty, and a physical address of a backup deduplication block of the fourth fingerprint record is a physical address of a third backup deduplication block;
the allocating unit is further configured to allocate a corresponding third primary deduplication block to the third backup deduplication block from the deduplication pool;
the reading unit is further configured to read fourth data in the third backup deduplication block;
the writing unit is further configured to write the fourth data into the third main deduplication block;
the record updating unit is further configured to update a physical address of a main deduplication block in the fourth fingerprint record to be a physical address of the third main deduplication block.
CN202110328625.0A 2021-03-26 2021-03-26 Method and device for protecting deduplication data Active CN113050892B (en)

Publications (2)

Publication Number Publication Date
CN113050892A (en) 2021-06-29
CN113050892B (en) 2022-02-25

Family

ID=76515852

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant