CN115269258A - Data recovery method and system - Google Patents
Data recovery method and system
- Publication number
- CN115269258A (application number CN202210889212.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- disk
- data blocks
- data block
- blocks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1064—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices in cache or content addressable memories
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1044—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Abstract
The invention provides a method, a system, a storage medium and a device for data recovery. The method comprises the following steps: performing hot and cold partitioning on disk data blocks to divide them into hot zone data blocks and cold zone data blocks; generating a local check code for the hot zone data blocks; and generating a global check code for the cold zone data blocks. When the data of one of the hot zone data blocks is in error, the local check code is XORed with the data of the other hot zone data blocks to recover that block. When the data of one of the cold zone data blocks is in error, that block is recovered using the global check code and the data of the other disk data blocks. When more than one of the disk data blocks is in error, those blocks are recovered using the global check code and the data of the other disk data blocks.
Description
Technical Field
The present invention relates to the field of data storage and recovery technologies, and in particular, to a method, a system, a storage medium, and a device for data recovery.
Background
In order to improve the data reliability of the storage system and ensure that the data collection node can reconstruct the original file with high probability in the face of the storage requirement of mass data, a certain amount of redundancy needs to be additionally stored on the basis of storing the original data, so that the system can still normally operate under the condition that partial nodes fail, and the data collection node can still decode and recover the original file. Meanwhile, in order to maintain the reliability of the system, the failed node needs to be repaired in time, so that it is very important to design a good node repair mechanism.
Erasure coding is a forward error correction technique from coding theory. It was first applied in the communications field to address data loss during transmission. Because it is highly effective at preventing data loss, erasure coding was later introduced into the storage field. Erasure codes can effectively reduce storage overhead while guaranteeing the same reliability, so the technology is widely used in large storage systems and data centers such as Microsoft Azure and Facebook F4.
The core idea of erasure codes is to construct an invertible coding matrix that generates the check data; the original data can then be recovered by computing the inverse matrix. Common RS erasure codes use a Cauchy matrix or a Vandermonde matrix. The advantage is that the resulting matrix is guaranteed to be invertible, any of its sub-matrices is also invertible, and the matrix is simple to extend in size.
The inverse matrix for common RS erasure codes is computed by Gaussian elimination. This general method works for any invertible matrix, but it is not optimized for the characteristics of the coding matrix, so although the computation is regular it introduces a large number of redundant operations. When k data blocks are stored and r check data blocks are added, cases in which only a single data block needs to be recovered account for 99.75% of errors (statistics from a 2007 storage technology annual conference), yet Gaussian elimination still needs $(k + r)^3$ operations to obtain the required inverse matrix before the corresponding data block can be recovered.
RS erasure codes, which are MDS codes, have a standardized format and general-purpose encoding and decoding, and are the most widely used technique in current erasure correction modules. However, as the number of data blocks grows, every decoding with RS erasure correction must read a large number of data blocks, so decoding is slow.
Therefore, in order to solve the problem, a better data recovery mode needs to be proposed to improve the efficiency and speed of data recovery.
Disclosure of Invention
In view of the above, the present invention is directed to a method, system, storage medium and device for improved data recovery, so as to improve the efficiency and speed of data recovery.
In view of the above objects, in one aspect, the present invention provides a method for data recovery, wherein the method includes the following steps:
performing cold and hot partition on a disk data block to divide the disk data block into a hot area data block and a cold area data block;
generating a local check code for the hot zone data block using equation (1):
$LP_h = f_{lp}(D_h)$ (1)
wherein $D_h$ is the hot zone data block, $f_{lp}$ is an operation of XORing the data blocks, and $LP_h$ is the local check code;
generating a global check code for the cold zone data block by using the Vandermonde matrix of formula (2) or the Cauchy matrix of formula (3):
wherein k is the number of data blocks, r is the number of global check codes, $D_1$ to $D_k$ are all the disk data blocks, including the cold zone data blocks and the hot zone data blocks, and $P_1$ to $P_r$ are the r global check codes;
when data of one of the hot area data blocks is wrong, carrying out exclusive OR by using the local check code and data of other data blocks in the hot area data blocks to recover the data of one of the hot area data blocks;
when data of one of the cold area data blocks is in error, recovering the data of the one of the cold area data blocks by using the global check code and the data of other data blocks in the disk data blocks; and
when more than one data in the disk data blocks have errors, the more than one data in the disk data blocks are recovered by using the global check code and the data of other data blocks in the disk data blocks.
In some embodiments of the method for recovering data according to the present invention, a new disk is added for storing the local check code generated for the hot zone data block.
In some embodiments of the method for data recovery according to the present invention, the disk data blocks are partitioned into hot and cold partitions according to the operation type and/or the read-write frequency of the disk data,
when the disk data block is subjected to cold and hot partitioning according to the read-write frequency of the disk data, the cold area data block is a data block with a lower relative read-write frequency, and the hot area data block is a data block with a higher relative read-write frequency.
In some embodiments of the method for data recovery according to the present invention, the method for recovering data of one of the cold area data blocks by using the global check code and data of other data blocks in the disk data blocks includes:
when r + 1 ≥ h, recovering by performing exclusive OR on all the global check codes, the local check code and the other cold area data blocks in the disk data blocks, wherein r is the number of the global check codes, and h is the number of the hot area data blocks; and
when r + 1 < h, recovering by using the global check code and the data of other data blocks in the disk data blocks.
In some embodiments of the method of data recovery according to the invention, the method further comprises:
the disk data blocks of every two stripes are grouped, the local check code of one stripe disk data block is subjected to exclusive OR with a second global check code of the other stripe disk data block,
when the data of one data block in the hot area data blocks has errors, the recovery is carried out according to the following steps:
taking k-1 data blocks and a first global check code for one stripe to carry out error recovery on the data blocks on the stripe;
taking the error-free hot zone data blocks on the other stripe, the second global check code of the one stripe, and the data already read from the one stripe, and performing an operation based on $f_{lp}$ to recover the data.
In another aspect of the present invention, a system for recovering data is further provided, including:
a partitioning module configured to perform cold and hot partitioning on a disk data block to divide the disk data block into a hot zone data block and a cold zone data block;
a local check code generation module configured to generate a local check code for the hot zone data block using equation (1):
$LP_h = f_{lp}(D_h)$ (1)
wherein $D_h$ is the hot zone data block, $f_{lp}$ is an operation of XORing the data blocks, and $LP_h$ is the local check code;
a global check code generation module configured to generate a global check code for the cold zone data block using the Vandermonde matrix of equation (2) or the Cauchy matrix of equation (3):
wherein k is the number of data blocks, r is the number of global check codes, $D_1$ to $D_k$ are all the disk data blocks, including the cold zone data blocks and the hot zone data blocks, and $P_1$ to $P_r$ are the r global check codes; and
a data recovery module configured to recover data of the disk data block in which the data error occurred,
when data of one of the hot area data blocks is wrong, the data recovery module performs exclusive OR by using the local check code and data of other data blocks in the hot area data blocks to recover the data of one of the hot area data blocks;
when data of one of the cold area data blocks is in error, the data recovery module recovers the data of the one of the cold area data blocks by using the global check code and data of other data blocks in the disk data blocks; and
when more than one data in the disk data blocks has errors, the data recovery module recovers the more than one data in the disk data blocks by using the global check code and the data of other data blocks in the disk data blocks.
In some embodiments of the system for data recovery according to the invention, the system further comprises
A dedicated disk module configured to store the local check code generated for the hot zone data block.
In some embodiments of the system for data recovery according to the present invention, the partitioning module performs hot and cold partitioning on the disk data block according to the operation type and/or the read/write frequency of the disk data,
when the partitioning module performs cold and hot partitioning on the disk data block according to the read-write frequency of the disk data, the cold area data block is a data block with a lower relative read-write frequency, and the hot area data block is a data block with a higher relative read-write frequency.
In some embodiments of the system for data recovery according to the present invention, the method for the data recovery module to recover the data of one of the cold data blocks using the global parity and the data of the other data blocks of the disk data blocks comprises:
when r + 1 ≥ h, the data recovery module performs recovery by performing exclusive OR on all the global check codes, the local check code and the other cold area data blocks in the disk data blocks, wherein r is the number of the global check codes, and h is the number of the hot area data blocks; and
when r + 1 < h, the data recovery module recovers by using the global check code and the data of other data blocks in the disk data blocks.
In some embodiments of the system for data recovery according to the invention, the system further comprises:
a grouping module configured to group the disk data blocks of every two stripes, and XOR the local parity of one stripe disk data block with a second global parity of another stripe disk data block,
when data of one data block in the hot area data block is wrong, the data recovery module recovers according to the following steps:
taking k-1 data blocks and a first global check code for one stripe to carry out error recovery on the data blocks on the stripe;
taking the error-free hot zone data blocks on the other stripe, the second global check code of the one stripe, and the data already read from the one stripe, and performing an operation based on $f_{lp}$ to recover the data.
In still another aspect of the present invention, a computer-readable storage medium is further provided, which stores computer program instructions that, when executed, implement any of the above methods for data recovery according to the present invention.
In yet another aspect of the present invention, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the computer program executes the method for data recovery according to any one of the above methods when executed by the processor.
The invention has at least the following beneficial technical effects. It provides several improved schemes for repairing hot and cold data and also addresses switching between hot data and cold data. When the repair speed requirement for hot data is extremely high, recovery uses a small amount of extra redundant data to guarantee recovery speed. When the hot data gradually cools, the scheme switches to the redundancy ratio of ordinary RS; compared with conventional RS, it improves the recovery speed for a single error while retaining the recovery capability under a large number of errors.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
In the figure:
FIG. 1 shows a schematic block diagram of an embodiment of a method of data recovery according to the present invention;
FIG. 2 illustrates one example of a cold-hot area encoding scenario in a method of data recovery in accordance with the present invention;
FIG. 3 illustrates one example of a secondary recovery state in a method of data recovery according to the present invention;
FIG. 4 shows a schematic block diagram of an embodiment of a system for data recovery according to the present invention;
FIG. 5 shows a hardware configuration diagram of an embodiment of a computer device implementing a method of data recovery according to the invention;
FIG. 6 shows a schematic view of a frame of an embodiment of a chip according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share a name but are not identical. "First" and "second" are used only for convenience of description and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements, but may include other steps or elements not expressly listed or inherent to it.
According to a first aspect of the invention, a method 100 of data recovery is provided. Fig. 1 shows a schematic block diagram of an embodiment of a method of data recovery according to the present invention. In the embodiment shown in fig. 1, the method comprises:
step S10: performing cold and hot partition on the disk data block to divide the disk data block into a hot area data block and a cold area data block;
step S20: for hot zone data blocks, a local check code is generated using equation (1):
$LP_h = f_{lp}(D_h)$ (1)
wherein $D_h$ is the hot zone data block, $f_{lp}$ is an operation of XORing the data blocks, and $LP_h$ is the local check code;
step S30: for the cold zone data block, generating a global check code by using the Vandermonde matrix of formula (2) or the Cauchy matrix of formula (3):
wherein k is the number of data blocks, r is the number of global check codes, $D_1$ to $D_k$ are all the disk data blocks, including the cold zone data blocks and the hot zone data blocks, and $P_1$ to $P_r$ are the r global check codes;
step S40: when data of one of the hot area data blocks is wrong, carrying out exclusive OR by using the local check code and the data of other data blocks in the hot area data blocks to recover the data of one of the hot area data blocks;
step S50: when data of one of the cold area data blocks is wrong, recovering the data of one of the cold area data blocks by using the global check code and the data of other data blocks in the disk data block; and
step S60: when more than one data in the disk data block has errors, the more than one data in the disk data block is recovered by using the global check code and the data of other data blocks in the disk data block.
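The case distinction made by steps S40 to S60 can be sketched as a small dispatch function. This is only an illustration under assumptions: the names `RecoveryPath` and `choose_recovery_path`, and the use of integer block indices, are hypothetical and do not appear in the source.

```python
from enum import Enum


class RecoveryPath(Enum):
    LOCAL_XOR = "S40: XOR the local check code with the surviving hot blocks"
    COLD_SINGLE = "S50: global check code plus the other data blocks"
    GLOBAL_RS = "S60: global check code plus the other data blocks (multiple errors)"


def choose_recovery_path(failed: set[int], hot: set[int]) -> RecoveryPath:
    """Map a set of failed block indices to step S40, S50 or S60."""
    if not failed:
        raise ValueError("no failed block to recover")
    if len(failed) > 1:
        # More than one block in error: always the global path.
        return RecoveryPath.GLOBAL_RS
    (block,) = failed
    # A single error is routed by the zone the block belongs to.
    return RecoveryPath.LOCAL_XOR if block in hot else RecoveryPath.COLD_SINGLE
```

For example, with hot blocks `{2, 4}`, a failure of block 2 maps to `LOCAL_XOR`, a failure of block 1 to `COLD_SINGLE`, and a simultaneous failure of blocks 2 and 4 to `GLOBAL_RS`.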
Erasure codes come in many types; in real storage systems, RS codes (Reed-Solomon codes) applied in distributed environments are more common. An RS code is associated with two parameters, k and r: given two positive integers k and r, the RS code takes k as the number of data blocks and r as the number of global check codes. Encoding the r check blocks based on a Vandermonde matrix or a Cauchy matrix is called Vandermonde-matrix or Cauchy-matrix RS erasure coding; the specific encoding processes are shown in equations (2) and (3) above, respectively.
The upper k rows of the matrix correspond to the k original data blocks, and the lower r rows form the coding matrix. Multiplying the coding matrix by the original data $D_1$ to $D_k$ yields the r newly added check data $P_1$ to $P_r$. When up to r of the data are in error or lost in transmission and must be corrected, the inverse of the matrix formed by the rows corresponding to the surviving data is multiplied by that data to obtain the original data blocks $D_1$ to $D_k$ (the derivation is omitted). This procedure is referred to below as "RS erasure correction".
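For orientation, the general shape of such an encoding can be written as follows. This is a sketch under assumptions and not a reproduction of equations (2) and (3): the evaluation points $a_j$ and the Cauchy parameters $x_i$, $y_j$ are illustrative symbols that do not appear in the source, and all arithmetic is assumed to take place in a finite field.

```latex
\begin{bmatrix} I_k \\ G \end{bmatrix}
\begin{bmatrix} D_1 \\ \vdots \\ D_k \end{bmatrix}
=
\begin{bmatrix} D_1 \\ \vdots \\ D_k \\ P_1 \\ \vdots \\ P_r \end{bmatrix},
\qquad
G^{\mathrm{Vandermonde}}_{ij} = a_j^{\,i-1},
\qquad
G^{\mathrm{Cauchy}}_{ij} = \frac{1}{x_i + y_j},
\qquad 1 \le i \le r,\ 1 \le j \le k .
```

Here $I_k$ is the $k \times k$ identity (the upper k rows) and $G$ is the $r \times k$ coding matrix (the lower r rows). For the Cauchy form, the $x_i$ are assumed distinct, the $y_j$ are assumed distinct, and $x_i + y_j \neq 0$ for all pairs.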
Taking the loss of $D_1$ to $D_r$ as an example, decoding proceeds as shown in the following equation (4):
As introduced in the Background, assuming there are k data blocks, the RS erasure encoding that generates r check blocks can be summarized as:
where f is the encoding function used by the particular RS variant: it takes the Vandermonde form described above if a Vandermonde matrix is used and, correspondingly, the Cauchy form if a Cauchy matrix is used. Similarly, in a decoding scenario the matrix composed of f is inverted. Assuming that an error in $D_k$ needs to be recovered, the corresponding relationship is $f_1^{-1}(D_1, D_2, \ldots, D_{k-1}, P_1) = D_k$. Other error scenarios are similar.
As can be seen from the above, correcting and decoding any error with RS requires reading k data blocks. Multiple errors can be decoded in parallel with multiple decoding modules, but k data blocks must still be read. Because reading is limited by the read-write speed of existing storage media (any medium, such as HDD or SSD), recovery is very slow when k is large.
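The point that RS decoding always needs k surviving symbols can be illustrated with a toy encoder and decoder. This is a minimal sketch under stated assumptions, not the patent's implementation: it works on one integer symbol per block, uses the prime field GF(257) instead of the GF(2^8) arithmetic typical of real systems, fixes the Vandermonde rows to powers of 1..k, and is exercised only for r = 2. All function names are hypothetical.

```python
P = 257  # prime modulus; assumption: real systems typically use GF(2^8) instead


def vandermonde_rows(k: int, r: int) -> list[list[int]]:
    """Parity rows: row i holds (j+1)**i mod P for data column j."""
    return [[pow(j + 1, i, P) for j in range(k)] for i in range(r)]


def encode(data: list[int], r: int) -> list[int]:
    """Return the r global check symbols for one symbol per data block."""
    return [sum(c * d for c, d in zip(row, data)) % P
            for row in vandermonde_rows(len(data), r)]


def solve_mod_p(a: list[list[int]], b: list[int]) -> list[int]:
    """Solve a.x = b over GF(P) by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if m[i][col] % P)
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, P)          # modular inverse (Python 3.8+)
        m[col] = [v * inv % P for v in m[col]]
        for i in range(n):
            if i != col and m[i][col]:
                f = m[i][col]
                m[i] = [(v - f * w) % P for v, w in zip(m[i], m[col])]
    return [row[n] for row in m]


def decode(survivors: dict[int, int], k: int, r: int) -> list[int]:
    """Rebuild all k data symbols from exactly k surviving symbols.

    Keys 0..k-1 are data blocks, keys k..k+r-1 are check blocks.
    """
    if len(survivors) != k:
        raise ValueError("RS decoding needs exactly k surviving symbols")
    parity = vandermonde_rows(k, r)
    rows, rhs = [], []
    for idx, value in sorted(survivors.items()):
        # Identity row for a surviving data block, parity row for a check block.
        rows.append([int(j == idx) for j in range(k)] if idx < k else parity[idx - k])
        rhs.append(value)
    return solve_mod_p(rows, rhs)
```

With k = 6 and r = 2, losing a single data block still forces `decode` to consume six symbols (five data symbols plus one check symbol), which is the read cost the paragraph above describes.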
According to the above embodiment, the disk data blocks are divided into hot zone data blocks and cold zone data blocks by hot and cold partitioning. Denote the hot zone by $D_h$ and the cold zone by $D_c$.
In the example shown in Fig. 2, the disk data blocks are D1 to D6. Assuming that the hot zone data blocks are currently D2 and D4 and the cold zone data blocks are the remaining blocks, we have $D_h = \{D_2, D_4\}$ and $D_c = \{D_1, D_3, D_5, D_6\}$.
According to the above embodiment, the local check code is generated for the hot zone data blocks using equation (1). For example, when r = 2, the formula for $f_{lp}$ is:
Continuing the above example, encoding $D_h$ with $f_{lp}$ gives the check code, denoted $LP_h$:
As indicated above, when $D_h$ is encoded with $f_{lp}$, the coding information generated by $f_{lp}$ nominally covers all data blocks, while the hot and cold partitions contain different data blocks. The data blocks in the cold partition can therefore be treated as 0 during encoding, so that equation (8) ultimately operates only on $D_h$, whose blocks are XORed to give the final output $LP_h$.
As shown in Fig. 2, D2 and D4 are hot zone data blocks, so an error in this part must be repaired quickly. For any single error, it suffices to read $LP_h$ and the remaining hot zone data blocks and XOR them. In the example above, if D2 (or D4) is in error, only D4 (or D2) and $LP_h$ need to be read. In the example of Fig. 2, this reads 4 fewer data blocks than recovery by global RS erasure correction.
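The local check code and the single-error hot zone recovery described above can be sketched with plain byte-wise XOR. The function names and the dictionary layout of a stripe are assumptions made for this illustration; only the XOR relationship itself comes from the text.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must have equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def local_check_code(stripe: dict[str, bytes], hot: set[str]) -> bytes:
    """LP_h = f_lp(D_h): XOR of the hot zone blocks; cold blocks count as 0."""
    return xor_blocks([stripe[name] for name in sorted(hot)])


def recover_hot_block(stripe: dict[str, bytes], hot: set[str],
                      failed: str, lp_h: bytes) -> bytes:
    """Rebuild one failed hot block from LP_h and the surviving hot blocks."""
    survivors = [stripe[name] for name in sorted(hot) if name != failed]
    return xor_blocks([lp_h, *survivors])
```

In the D1 to D6 example with hot blocks D2 and D4, `recover_hot_block` touches only D4 and $LP_h$ to rebuild D2, which matches the two reads stated in the text.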
Similarly, when any one of the cold zone data blocks is in error, recovery only needs to read all the global check codes P1 and P2, $LP_h$, and the error-free data in the remaining cold zone data blocks.
When more than one cold zone and/or hot zone data block is in error, recovery must use the global check codes; the specific procedure is the RS erasure correction described above.
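The read amounts implied by the three cases above can be tallied with a short helper. This is a sketch: the function name and parameters are hypothetical, and the counts simply restate which blocks each case is said to read (surviving hot blocks plus $LP_h$; all global check codes plus $LP_h$ plus surviving cold blocks; k blocks for RS erasure correction).

```python
def blocks_read(k: int, r: int, h: int, failed_hot: int, failed_cold: int) -> int:
    """Number of blocks read to repair the given failures in one stripe."""
    failures = failed_hot + failed_cold
    if failures == 0:
        return 0
    if failures > r:
        raise ValueError("more failures than global check codes can repair")
    if failures > 1:
        return k                      # global RS erasure correction
    if failed_hot == 1:
        return (h - 1) + 1            # surviving hot blocks + LP_h
    return r + 1 + (k - h - 1)        # all P_i + LP_h + surviving cold blocks
```

For the Fig. 2 layout (k = 6, r = 2, h = 2), a single hot zone error reads 2 blocks, which is 4 fewer than the 6 blocks read by global RS erasure correction.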
In a preferred embodiment of the data recovery method according to the present invention, a new disk is added for storing the local check code generated for the hot zone data block.
In a preferred embodiment of the method for recovering data according to the present invention, the data blocks of the disk are partitioned into hot and cold partitions according to the operation type and/or the read/write frequency of the data of the disk. When the disk data block is subjected to cold and hot partitioning according to the read-write frequency of the disk data, the cold zone data block is a data block with a lower relative read-write frequency, and the hot zone data block is a data block with a higher relative read-write frequency.
In an actual storage scenario, data is either hot or cold. For example, files such as text are often cold data, i.e., their read-write frequency is not high, whereas media files are typically hot data, i.e., they are read and written frequently. The distinction between hot and cold data is relative: it depends on the relative read-write frequency of the data in the storage array. Hot data is read and written more often, so it has a higher probability of error and, when an error occurs, a higher recovery speed requirement; the opposite holds for cold data. It follows that, as operating conditions change, hot data in a storage array may become cold data after a period of time, and cold data may likewise become hot data.
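A partition by relative read-write frequency could look like the following sketch. The text does not specify a threshold, so the rule used here (a block is hot when its access count exceeds the mean of all counts) is purely an assumption for illustration, as are the function name and the access-count input.

```python
from statistics import mean


def partition_hot_cold(access_counts: dict[str, int]) -> tuple[set[str], set[str]]:
    """Split blocks into (hot, cold) by read-write frequency relative to the mean."""
    if not access_counts:
        return set(), set()
    threshold = mean(access_counts.values())  # hypothetical threshold rule
    hot = {name for name, count in access_counts.items() if count > threshold}
    return hot, set(access_counts) - hot
```

Because the rule is relative, re-running it on fresh counts lets blocks move between zones as their access pattern changes, which is the switching behavior the paragraph above describes.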
In a preferred embodiment of the method for data recovery according to the present invention, the method for recovering data of one of the cold area data blocks using the global check code and the data of the other data blocks in the disk data blocks includes: when r + 1 ≥ h, recovering by performing exclusive OR on all the global check codes, the local check code and the other cold area data blocks in the disk data blocks, wherein r is the number of the global check codes, and h is the number of the hot area data blocks; and when r + 1 < h, recovering by using the global check code and the data of other data blocks in the disk data blocks.
Continuing with the above example, when D1 is in error, there are two recovery methods, as follows:
the choice of a particular recovery method depends on:
r+1≤h (10)
wherein r is the number of global check codes, and h is the number of data blocks divided into hot zones. When the formula (10) is satisfied, 1 in the formula (9) is selected for cold region data error recovery, otherwise, 2 is selected for recovery.
In a preferred embodiment of the method of data recovery according to the invention, the method further comprises: grouping the disk data blocks of every two stripes, and XORing the local check code of one stripe's disk data blocks with the second global check code of the other stripe's disk data blocks. When the data of one of the hot area data blocks is in error, recovery proceeds as follows: taking k-1 data blocks and the first global check code of one stripe to perform error recovery on the data blocks of that stripe; and taking the error-free hot zone data blocks on the other stripe, the second global check code of the one stripe, and the data already read from the one stripe, and performing an operation based on $f_{lp}$ to recover the data. These steps are referred to below as "secondary recovery".
In the above preferred embodiment of the data recovery method 100, a magnetic disk may be additionally provided in order to make the hot zone data recovery speed high.
However, the secondary recovery state according to the present invention can be entered in either of two situations: when, after hot and cold partitioning, the hot zone data blocks require a recovery speed higher than that of RS erasure correction but an additional redundant disk cannot be provided; or when, after working in the above state for a period of time, the recovery speed requirement of the hot zone data blocks drops (the required read-write frequency decreases). In the secondary recovery state, the redundancy ratio is the same as for RS erasure correction, but the hot zone data blocks are still recovered faster than with RS erasure correction. The specific changes are as follows.
As shown in Fig. 3, continuing the previous example, upon entering the secondary recovery state a merge operation is performed on every two stripes: the $LP_h$ of the odd (or even) stripe is XORed directly with P2 of the even (or odd) stripe. When there are more than two global check codes (r > 2), $LP_h$ can be XORed with any global check code P. However, to keep the operation simple, it is still recommended that the XOR be performed with P2 to $P_r$.
If any data error occurs in any of the hot zone data blocks (D2, D4 as described above), the recovery method is as follows:
k-1 data blocks and P1 are taken from the even stripes (or the odd stripes) to carry out error recovery on the data blocks on the even stripes (or the odd stripes), and the specific recovery mode is a standard flow of RS erasure correction;
then the error-free hot area data blocks on the odd (or even) stripe, P2 of the even (or odd) stripe, and the data already read from the even (or odd) stripe are taken, and recovery is performed based on the inverse operation of $f_{lp}$.
In this case, when a data error occurs in any one of the hot zone data blocks, the amount of data read for recovery per two stripes is k + h.
Compared with RS erasure correction, which reads 2k data blocks, this removes many data reads and has a speed advantage.
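The merge and the two-step secondary recovery can be sketched as follows. This is an illustration under several assumptions that are not in the source: P1 is taken to be the plain XOR of the data blocks, the second global check code is modeled as a RAID-6-style sum of $2^j \cdot D_j$ over GF(2^8) with polynomial 0x11D, and all function names are hypothetical. Any other linear second check code would follow the same pattern.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))


def gf_mul2(value: int) -> int:
    """Multiply by 2 in GF(2^8) with polynomial 0x11D (assumed field)."""
    value <<= 1
    return value ^ 0x11D if value & 0x100 else value


def second_check_code(data):
    """Stand-in for P2: sum of 2**j * D_j over GF(2^8), via Horner's rule."""
    acc = bytes(len(data[0]))
    for block in reversed(data):
        acc = bytes(gf_mul2(a) ^ b for a, b in zip(acc, block))
    return acc


def merge_stripes(odd_data, odd_hot, even_data):
    """Return the merged code stored in place of the even stripe's P2."""
    lp_odd = xor_blocks([odd_data[i] for i in odd_hot])
    return xor_blocks([second_check_code(even_data), lp_odd])


def secondary_recover(odd_hot_survivors, even_partial, even_p1, merged_p2):
    """Rebuild the failed hot block on the odd stripe.

    even_partial: the even stripe's data blocks with None at the single
    position that was not read; even_p1: XOR check code of the even stripe.
    """
    known = [b for b in even_partial if b is not None]
    missing = xor_blocks([even_p1, *known])          # step 1: k-1 blocks + P1
    even_data = [missing if b is None else b for b in even_partial]
    lp_odd = xor_blocks([merged_p2, second_check_code(even_data)])
    return xor_blocks([lp_odd, *odd_hot_survivors])  # step 2: invert f_lp
```

With k = 6 and h = 2, the recovery reads five even-stripe data blocks, the even stripe's P1, the merged P2 and one surviving odd-stripe hot block, i.e. k + h = 8 blocks in total.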
When more than one of the disk data blocks is in error, RS erasure correction is used for data recovery.
The invention provides a scheme for accelerating hot zone data recovery (degraded reads) in a storage array whose data blocks are divided into cold and hot blocks according to, for example, differences in how specific user data is used. The scheme has two implementation modes. The first is a full-speed mode, in which a data error in any one of the hot zone data blocks can be recovered at high speed (degraded read); it also helps recover a data error in any one of the cold zone data blocks, which limits the increase in redundancy. If necessary, the system can switch to the second, sub-high-speed mode (secondary recovery). Compared with RS, the sub-high-speed mode requires no additional redundant data storage yet still speeds up degraded reads of the hot zone data blocks to some extent.
In a second aspect of the present invention, a system 200 for data recovery is also provided. Fig. 4 shows a schematic block diagram of an embodiment of a system 200 for data recovery according to the present invention. As shown in fig. 4, the system includes:
a partition module 210, where the partition module 210 is configured to perform cold and hot partition on a disk data block to divide the disk data block into a hot zone data block and a cold zone data block;
a local check code generation module 220, wherein the local check code generation module 220 is configured to generate a local check code for the hot-zone data block by using equation (1):
$LP_h = f_{lp}(D_h)$ (1)
wherein $D_h$ denotes the hot zone data blocks, $f_{lp}$ is the operation of XORing the data blocks, and $LP_h$ is the local check code;
a global check code generation module 230, where the global check code generation module 230 is configured to generate a global check code for the cold zone data block by using the Vandermonde matrix of equation (2) or the Cauchy matrix of equation (3):
wherein k is the number of data blocks, r is the number of global check codes, $D_1$ to $D_k$ are all the disk data blocks, including the cold zone data blocks and the hot zone data blocks, and $P_1$ to $P_r$ are the r global check codes;
a data recovery module 240, wherein the data recovery module 240 is configured to recover data of the disk data block with data error.
When the data of one of the hot zone data blocks is in error, the data recovery module 240 XORs the local check code with the data of the other hot zone data blocks to recover that block; when the data of one of the cold zone data blocks is in error, the data recovery module 240 recovers that block by using the global check codes and the data of the other disk data blocks; and when more than one of the disk data blocks is in error, the data recovery module 240 recovers those blocks by using the global check codes and the data of the other disk data blocks.
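The hot zone path can be illustrated with a minimal sketch, assuming blocks are equal-length byte strings and that exactly one hot zone block is lost. The helper names (`xor_blocks`, `make_local_check`, `recover_hot_block`) and the sample data are hypothetical; only the XOR relation of equation (1) comes from the text.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (the XOR operation of equation (1))."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_local_check(hot_blocks):
    """Local check code: XOR of all hot zone data blocks."""
    return xor_blocks(hot_blocks)


def recover_hot_block(hot_blocks, lost_index, local_check):
    """Rebuild the block at lost_index from the surviving hot blocks and the local check code."""
    survivors = [b for i, b in enumerate(hot_blocks) if i != lost_index]
    return xor_blocks(survivors + [local_check])


# Hypothetical stripe with three hot zone blocks of three bytes each.
hot = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
lp = make_local_check(hot)

damaged = list(hot)
damaged[1] = None  # the block at index 1 is unreadable
restored = recover_hot_block(damaged, 1, lp)
```

Only the other hot zone blocks and the local check code are read, which is why this path is cheaper than decoding with the global check codes.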
In a preferred embodiment of the system for data recovery according to the present invention, the system further comprises a dedicated disk module configured to store the local check code generated for the hot zone data block.
In a preferred embodiment of the system for data recovery according to the present invention, the partition module 210 performs cold and hot partitioning on the disk data blocks according to the operation type and/or the read/write frequency of the disk data. When the partition module 210 partitions the disk data blocks according to the read/write frequency of the disk data, the cold zone data blocks are the data blocks with a relatively low read/write frequency, and the hot zone data blocks are the data blocks with a relatively high read/write frequency.
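A frequency-based partition could look like the sketch below. The text does not say how "relatively high" is decided, so the fixed `threshold` parameter, the `partition_blocks` name, and the access counts are all assumptions made for illustration.

```python
def partition_blocks(access_counts, threshold):
    """Split block ids into (hot, cold) lists by read/write count.

    access_counts maps a block id to its observed read/write count;
    blocks at or above the threshold are treated as hot (an assumed rule).
    """
    hot = [block for block, count in access_counts.items() if count >= threshold]
    cold = [block for block, count in access_counts.items() if count < threshold]
    return hot, cold


# Hypothetical access statistics for four disk data blocks.
counts = {"D1": 120, "D2": 3, "D3": 45, "D4": 0}
hot_ids, cold_ids = partition_blocks(counts, threshold=40)
```

A real system might instead rank blocks and take the top h as hot, so that the hot zone size matches the layout of the local check code.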
In a preferred embodiment of the system for data recovery according to the present invention, the method by which the data recovery module 240 recovers the data of one of the cold zone data blocks by using the global check codes and the data of the other disk data blocks includes:
when r + 1 ≥ h, the data recovery module 240 performs recovery by XORing all the global check codes, the local check code and the other cold zone data blocks among the disk data blocks, where r is the number of global check codes and h is the number of hot zone data blocks; and
when r + 1 < h, the data recovery module 240 performs recovery by using the global check codes and the data of the other disk data blocks.
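The choice between the two cold zone recovery paths depends only on r and h, so it can be written as a small dispatcher. This is a sketch of the stated condition alone; the function name and the returned labels are hypothetical, and the recovery routines themselves are not shown.

```python
def cold_recovery_strategy(r: int, h: int) -> str:
    """Pick the cold zone recovery path from the condition stated in the text.

    r: number of global check codes
    h: number of hot zone data blocks
    """
    if r + 1 >= h:
        # XOR of all global check codes, the local check code and the other cold blocks
        return "xor"
    # Otherwise decode with the global check codes and the other disk data blocks
    return "global"


print(cold_recovery_strategy(r=2, h=3), cold_recovery_strategy(r=2, h=4))
```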
In a preferred embodiment of the system for data recovery according to the invention, the system further comprises:
a grouping module configured to group the disk data blocks of every two stripes and to XOR the local check code of one stripe's disk data blocks with the second global check code of the other stripe's disk data blocks.
When the data of one of the hot zone data blocks is in error, the data recovery module 240 performs recovery as follows: for one stripe, k-1 data blocks and the first global check code are read to recover the erroneous data block on that stripe; then the error-free hot zone data blocks on the other stripe, the second global check code of the first stripe, and the data already read from the first stripe are used to recover the data by the inverse operation of $f_{lp}$.
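The two-stripe procedure can be traced end to end with a small sketch. Several things here are assumptions rather than details from the text: the first global check code is taken to be a plain XOR of the data blocks and the second a RAID-6-style sum with coefficients that are powers of 2 in GF(2^8) (polynomial 0x11d), the hot zone is taken to be the first h blocks of each stripe, and the same block position is assumed lost on both stripes. All function names are hypothetical.

```python
from functools import reduce


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return product


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def p1(data):
    """Assumed first global check code: XOR of all data blocks."""
    return xor_blocks(data)


def p2(data):
    """Assumed second global check code: sum over i of 2^i * D_i in GF(2^8)."""
    out = bytearray(len(data[0]))
    coef = 1
    for block in data:
        for j, byte in enumerate(block):
            out[j] ^= gf_mul(coef, byte)
        coef = gf_mul(coef, 2)
    return bytes(out)


def recover_pair(a_surviving, a_p1, merged, b_hot_surviving):
    """Recover the lost block of stripe A and the lost hot block of stripe B.

    a_surviving and b_hot_surviving hold None at the lost position;
    merged is P2 of stripe A XORed with the local check code of stripe B.
    """
    a_lost = xor_blocks([b for b in a_surviving if b is not None] + [a_p1])
    a_full = [a_lost if b is None else b for b in a_surviving]
    lp_b = xor_blocks([merged, p2(a_full)])  # strip P2 of A to expose B's local check code
    b_lost = xor_blocks([b for b in b_hot_surviving if b is not None] + [lp_b])
    return a_lost, b_lost


# Hypothetical group: k = 4 blocks per stripe, h = 2 hot blocks, position 1 lost.
A = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
B = [b"\x11\x12", b"\x13\x14", b"\x15\x16", b"\x17\x18"]
h, lost = 2, 1
merged = xor_blocks([p2(A), xor_blocks(B[:h])])

a_view = [None if i == lost else blk for i, blk in enumerate(A)]
b_view = [None if i == lost else blk for i, blk in enumerate(B[:h])]
result = recover_pair(a_view, p1(A), merged, b_view)
```

In this sketch stripe A costs k reads (k-1 data blocks plus the first check code) and stripe B costs h reads (h-1 hot blocks plus the merged check code), for k + h reads in total per group.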
In a third aspect of the embodiment of the present invention, a computer-readable storage medium is further provided, and fig. 5 is a schematic diagram of a computer-readable storage medium illustrating a method for data recovery according to an embodiment of the present invention. As shown in fig. 5, the computer-readable storage medium 300 stores computer program instructions 310, the computer program instructions 310 being executable by a processor. The computer program instructions 310, when executed, implement the method of any of the embodiments described above.
It is to be understood that all embodiments, features and advantages set forth above with respect to the method of data recovery according to the present invention apply equally, without conflict therewith, to the system and storage medium of data recovery according to the present invention.
In a fourth aspect of the embodiments of the present invention, there is further provided a computer device 400, including a memory 420 and a processor 410, where the memory stores therein a computer program, and the computer program, when executed by the processor, implements the method of any one of the above embodiments.
Fig. 6 is a schematic hardware configuration diagram of an embodiment of a computer device for executing the data recovery method according to the present invention. Taking the computer device 400 shown in fig. 6 as an example, the computer device includes a processor 410 and a memory 420, and may further include: an input device 430 and an output device 440. The processor 410, memory 420, input device 430, and output device 440 may be connected by a bus or other means, such as by a bus connection in fig. 6. The input device 430 may receive input numeric or character information and generate signal inputs related to data recovery. The output device 440 may include a display device such as a display screen.
The memory 420, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the data recovery method in the embodiments of the present application. The memory 420 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by use of the data recovery method, and the like. Further, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 420 may optionally include memory located remotely from the processor 410, which may be connected to the local modules via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor 410 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions and modules stored in the memory 420, that is, it implements the data recovery method of the above method embodiments.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
Finally, it is noted that the computer-readable storage medium (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The numbering of the embodiments disclosed above is for description only and does not indicate that any embodiment is better or worse than another.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of an embodiment of the invention, technical features of the above embodiment or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit or scope of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
Claims (10)
1. A method of data recovery, comprising the steps of:
performing cold and hot partition on a disk data block to divide the disk data block into a hot area data block and a cold area data block;
generating a local check code for the hot zone data block using equation (1):
$LP_h = f_{lp}(D_h)$ (1)
wherein $D_h$ is the hot zone data block, $f_{lp}$ is the operation of XORing the data blocks, and $LP_h$ is the local check code;
generating a global check code for the cold zone data block by using the Vandermonde matrix of equation (2) or the Cauchy matrix of equation (3):
wherein k is the number of data blocks, r is the number of global check codes, $D_1$ to $D_k$ are all the disk data blocks, including the cold zone data blocks and the hot zone data blocks, and $P_1$ to $P_r$ are the r global check codes;
when data of one of the hot area data blocks is wrong, carrying out exclusive OR by using the local check code and data of other data blocks in the hot area data blocks to recover the data of one of the hot area data blocks;
when data of one of the cold area data blocks has errors, recovering the data of the one of the cold area data blocks by using the global check code and data of other data blocks in the disk data blocks; and
when more than one data in the disk data blocks have errors, the more than one data in the disk data blocks are recovered by using the global check code and the data of other data blocks in the disk data blocks.
2. The method of claim 1, further comprising:
adding a new disk for storing the local check code generated for the hot zone data block.
3. The method of claim 1, wherein the disk data blocks are partitioned into hot and cold partitions according to the operation type and/or the read/write frequency of the disk data,
when the disk data block is subjected to cold and hot partitioning according to the read-write frequency of the disk data, the cold area data block is a data block with a lower relative read-write frequency, and the hot area data block is a data block with a higher relative read-write frequency.
4. The method of claim 1, wherein recovering the data of one of the cold zone data blocks by using the global check code and the data of the other data blocks in the disk data blocks comprises:
when r + 1 ≥ h, recovering by XORing all the global check codes, the local check code and the other cold zone data blocks in the disk data blocks, wherein r is the number of the global check codes, and h is the number of the hot zone data blocks; and
when r + 1 < h, recovering by using the global check code and the data of the other data blocks in the disk data blocks.
5. The method of claim 1, further comprising:
the disk data blocks of every two stripes are organized into a group, the local check code of one stripe disk data block is subjected to exclusive OR with the second global check code of the other stripe disk data block,
when the data of one data block in the hot area data blocks has errors, the recovery is carried out according to the following steps:
taking k-1 data blocks and a first global check code for one stripe to carry out error recovery on the data blocks on the stripe;
taking the error-free hot zone data blocks on the other stripe, the second global check code of the one stripe and the data already read from the one stripe, and recovering the data by the inverse operation of $f_{lp}$.
6. A system for data recovery, comprising:
a partitioning module configured to perform cold and hot partitioning on a disk data block to divide the disk data block into a hot zone data block and a cold zone data block;
a local check code generation module configured to generate a local check code for the hot zone data block using equation (1):
$LP_h = f_{lp}(D_h)$ (1)
wherein $D_h$ is the hot zone data block, $f_{lp}$ is the operation of XORing the data blocks, and $LP_h$ is the local check code;
a global check code generation module configured to generate a global check code for the cold zone data block by using the Vandermonde matrix of equation (2) or the Cauchy matrix of equation (3):
wherein k is the number of data blocks, r is the number of global check codes, $D_1$ to $D_k$ are all the disk data blocks, including the cold zone data blocks and the hot zone data blocks, and $P_1$ to $P_r$ are the r global check codes; and
a data recovery module configured to recover data of the disk data block in which the data error occurred,
when data of one of the hot area data blocks is wrong, the data recovery module performs exclusive OR by using the local check code and data of other data blocks in the hot area data blocks to recover the data of one of the hot area data blocks;
when data of one of the cold area data blocks has an error, the data recovery module recovers the data of the one of the cold area data blocks by using the global check code and data of other data blocks in the disk data blocks; and
when more than one data in the disk data blocks has errors, the data recovery module recovers the more than one data in the disk data blocks by using the global check code and the data of other data blocks in the disk data blocks.
7. The system of claim 6, further comprising:
a dedicated disk module configured to store the local check code generated for the hot zone data block.
8. The system of claim 6,
the partition module performs cold and hot partition on the disk data block according to the working type and/or the read-write frequency of the disk data,
when the partitioning module performs cold and hot partitioning on the disk data block according to the read-write frequency of the disk data, the cold area data block is a data block with a lower relative read-write frequency, and the hot area data block is a data block with a higher relative read-write frequency.
9. The system of claim 6,
the method for the data recovery module to recover the data of one of the cold area data blocks by using the global check code and the data of the other data blocks in the disk data blocks comprises the following steps:
when r + 1 ≥ h, the data recovery module performs recovery by XORing all the global check codes, the local check code and the other cold zone data blocks in the disk data blocks, wherein r is the number of the global check codes, and h is the number of the hot zone data blocks; and
when r + 1 < h, the data recovery module performs recovery by using the global check code and the data of the other data blocks in the disk data blocks.
10. The system of claim 6, further comprising:
a grouping module configured to group the disk data blocks of every two stripes, and XOR the local parity of one stripe disk data block with a second global parity of another stripe disk data block,
when data of one data block in the hot area data block is wrong, the data recovery module recovers according to the following steps:
taking k-1 data blocks and a first global check code for one stripe to carry out error recovery on the data blocks on the stripe;
taking the error-free hot zone data blocks on the other stripe, the second global check code of the one stripe and the data already read from the one stripe, and recovering the data by the inverse operation of $f_{lp}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210889212.4A CN115269258A (en) | 2022-07-27 | 2022-07-27 | Data recovery method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115269258A (en) | 2022-11-01 |
Family
ID=83769701
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210889212.4A Pending CN115269258A (en) | 2022-07-27 | 2022-07-27 | Data recovery method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115269258A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115454712A (en) * | 2022-11-11 | 2022-12-09 | 苏州浪潮智能科技有限公司 | Check code recovery method, system, electronic equipment and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115454712A (en) * | 2022-11-11 | 2022-12-09 | 苏州浪潮智能科技有限公司 | Check code recovery method, system, electronic equipment and storage medium |
CN115454712B (en) * | 2022-11-11 | 2023-02-28 | 苏州浪潮智能科技有限公司 | Check code recovery method, system, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||