WO2011015134A1 - Multi-disk fault-tolerant system and method for generating check blocks and recovering data blocks - Google Patents

Multi-disk fault-tolerant system and method for generating check blocks and recovering data blocks

Info

Publication number
WO2011015134A1
WO2011015134A1 (PCT/CN2010/075678)
Authority
WO
WIPO (PCT)
Prior art keywords
data
block
check
disk
data block
Prior art date
Application number
PCT/CN2010/075678
Other languages
English (en)
French (fr)
Inventor
王玉林
姚建业
Original Assignee
成都市华为赛门铁克科技有限公司
电子科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 成都市华为赛门铁克科技有限公司, 电子科技大学 filed Critical 成都市华为赛门铁克科技有限公司
Publication of WO2011015134A1 publication Critical patent/WO2011015134A1/zh
Priority to US13/365,960 priority Critical patent/US8489916B2/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F11/1096Parity calculation or recalculation after configuration or reconfiguration of the system

Definitions

  • Multi-disk fault-tolerant system and method for generating a check block and recovering a data block. This application claims priority to Chinese Patent Application No. 200910090420.2, filed with the Chinese Patent Office on August 4, 2009 and entitled "Multi-disk fault-tolerant system and method for generating a check block and recovering a data block", the entire contents of which are incorporated herein by reference.
  • the present invention relates to the field of data storage technologies, and in particular, to a multi-disk fault tolerant system and a method for generating a parity block and recovering a data block.
  • A Redundant Array of Independent Disks (hereinafter: RAID) uses striping and redundancy to improve the capacity, speed, and reliability of a storage system; most existing systems, however, are designed to tolerate only a single disk failure.
  • Single-disk fault tolerance is becoming insufficient for several reasons. First, disk arrays keep growing in size, and as more and more disks are grouped into one array, the possibility of multiple disk failures within one array increases;
  • second, disk capacity grows faster than data access speed, so the time required to rebuild a disk increases, lengthening the window during which the array may suffer a further disk failure while a disk is being rebuilt;
  • third, the increase in media storage density reduces disk reliability;
  • fourth, disk failures in actual applications are correlated: because of the external environment and factors internal to the disks, failures are related to one another, which greatly increases the probability of multiple disk failures within a short period of time.
  • The RAID 51 mode prevents a single disk failure from corrupting data and mirrors a RAID 5 array to protect against up to three arbitrary disk failures; each write request produces two disk reads and four disk writes.
  • The dual-parity mode extends RAID 5 to dual parity; in this mode each write request produces at least three disk reads and three disk writes.
  • RAID 6 is a dual-parity, dual-disk fault-tolerant scheme. Compared with other RAID levels, it adds two independent error-check blocks, the check block P and the check block Q; each stripe includes a P check unit and a Q check unit, where P uses a parity code and Q uses another check code such as Reed-Solomon. When a single disk fails, P+Q RAID degrades to RAID 5 with N+1 parity; when two disks fail, it degrades to RAID 0 with no fault tolerance.
  • An object of the embodiments of the present invention is to provide a multi-disk fault-tolerant system and methods for generating a check block and recovering a data block, so as to reduce the computational complexity of check-block generation in the multi-disk fault-tolerant system and improve the data processing speed.
  • an embodiment of the present invention provides a multi-disk fault-tolerant system, including a disk array and a computing module connected through a system bus;
  • the disk array is composed of p disks, where p is a natural number greater than or equal to 3, and the number of fault-tolerant disks of the disk array is q, where q is a natural number not less than 2 and less than p/2;
  • the data in the disk array is arranged as an (m+q)×p matrix M, where m is a prime number less than or equal to p-q; row 0 of the matrix M consists of virtual data blocks whose value is 0, rows 1 to m-1 are data blocks, and rows m to m+q-1 form the check area; for a check block C(m-1+l, n) in the check area, the row number of each data block in its check group is (m - k·l) mod m and the column number is (n + k) mod p, where k ranges from l to m-1+l, l is the row number of the check block within the check area, 1 ≤ l ≤ q, and n is the column number corresponding to the check block, 0 ≤ n ≤ p-1; the data in the check block is the exclusive-OR (XOR) value of the data in all data blocks of the check group to which the check block belongs;
  • the computing module is configured to perform XOR calculation on the data blocks of a check group to generate the check block of that group, and to recover data blocks according to the check blocks when a disk is damaged.
  • an embodiment of the present invention further provides a method for generating a check block in the above multi-disk fault-tolerant system, which includes: obtaining the data of all data blocks in the check group to which the check block to be generated belongs; obtaining the check data from the data of those data blocks; and writing the check data into the corresponding check block in the disk array.
  • an embodiment of the present invention also provides a method for recovering a data block in the above multi-disk fault-tolerant system, which includes: obtaining the data of the other data blocks and of the check block in the check group to which the data block to be recovered belongs; performing XOR calculation on the obtained data to obtain the data of the data block to be recovered; and writing the obtained value into the data block to be recovered.
  • in the multi-disk fault-tolerant system, and in the methods for generating a check block and recovering a data block in it, no complex multiply-add operations are required, which effectively reduces the computational complexity of the multi-disk fault-tolerant system during data processing and improves the data processing speed.
  • FIG. 1 is a schematic structural diagram of a multi-disk fault-tolerant system according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram 1 of a computing module according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram 2 of a computing module according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a multi-disk fault tolerant system according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of data layout of a multi-disk fault-tolerant system according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of relationship between a data block and a check block in a multi-disk fault-tolerant system according to an embodiment of the present invention
  • FIG. 7 is a schematic flowchart of a method for generating a check block in a multi-disk fault-tolerant system according to an embodiment of the present invention;
  • FIG. 8 is a schematic flow chart of a specific embodiment of the embodiment shown in FIG. 7;
  • FIG. 9 is a schematic flow chart of a method for restoring a data block in a multi-disk fault tolerant system according to an embodiment of the present invention.
  • FIG. 10 is a schematic flow chart of a specific embodiment of the embodiment shown in FIG. 9;
  • FIG. 11 is a schematic diagram of a process of reconstructing a single faulty disk in a dual-disk fault-tolerant system according to an embodiment of the present invention
  • FIG. 12 is a schematic flow chart of a method for recovering multiple disks in a multi-disk fault-tolerant system according to an embodiment of the present invention
  • Figure 13 is a schematic diagram of the data recovery path in the embodiment shown in Figure 12.
  • FIG. 1 is a schematic structural diagram of an embodiment of a multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 1, the multi-disk fault-tolerant system includes a disk array 12 and a computing module 13 connected through a system bus 11, where:
  • the disk array 12 is composed of p disks, where p is a natural number greater than or equal to 3, and the number of fault-tolerant disks of the disk array is q, where q is a natural number not less than 2 and less than p/2; the data in the disk array is arranged as an (m+q)×p matrix M, where m is a prime number less than or equal to p-q; row 0 of the matrix M consists of virtual data blocks whose value is 0, rows 1 to m-1 are data blocks, and rows m to m+q-1 are check blocks; the row number of each data block in the check group of a check block C(m-1+l, n) is (m - k·l) mod m and the column number is (n + k) mod p, where k ranges from l to m-1+l, l is the row number of the check block in the check area, 1 ≤ l ≤ q, and n is the column number corresponding to the check block, 0 ≤ n ≤ p-1; the data of the check block is the exclusive-OR value of the data of all data blocks in the check group to which the check block belongs;
  • the computing module 13 is configured to generate the check block of a check group from the data blocks of that group, and to recover data blocks according to the check blocks when a disk is damaged.
  • the multi-disk fault-tolerant system can implement fault tolerance of q (2 ≤ q ≤ ⌊p/2⌋) disks among any p (p ≥ 3) disks, where on each disk m+q rows of physical units (m being a prime number less than p-q) form one data set: row 0 is a virtual all-zero physical unit, rows 1 to m-1 store the data blocks, and the subsequent q rows store the check blocks, the data of each check block being the exclusive-OR value of the data in all data blocks of its check group.
  • the multi-disk fault-tolerant system provided in this embodiment eliminates complex multiply-add operations when generating check blocks and recovering data blocks; only XOR calculation is performed, which effectively reduces the computational complexity of the multi-disk fault-tolerant system during data processing.
  • FIG. 2 is a schematic structural diagram 1 of the computing module according to an embodiment of the present invention. As shown in FIG. 2, the computing module 13 includes:
  • a first acquiring unit 131, configured to acquire the data of all data blocks in the check group to which a check block belongs; a first calculating unit 132, configured to perform XOR calculation on the data of all data blocks in the check group acquired by the first acquiring unit to obtain the data of the check block;
  • a first output unit 133, configured to write the data of the check block generated by the first calculating unit 132 into the corresponding check block in the disk array.
  • this embodiment addresses the case of generating a check block, with the computing module divided into units by function; the value of a check block is the exclusive-OR value of the data in all data blocks of the check group to which the check block belongs.
  • FIG. 3 is a schematic structural diagram 2 of the computing module according to an embodiment of the present invention. As shown in FIG. 3, the computing module 13 includes a second acquiring unit 134, a second calculating unit 135, and a second output unit 136, where:
  • the second acquiring unit 134 is configured to acquire the data of the data blocks other than the data block to be recovered in the check group to which the data block to be recovered belongs, as well as the data of the check block;
  • the second calculating unit 135 is configured to perform XOR calculation on the data of the data blocks and the data of the check block acquired by the second acquiring unit 134 to obtain the data of the data block to be recovered;
  • the second output unit 136 is configured to write the data of the data block to be recovered, calculated by the second calculating unit 135, into the data block to be recovered.
  • FIG. 4 is a schematic structural diagram of a specific implementation of the multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 4, the multi-disk fault-tolerant system includes a disk array 21, a main processor 22, an XOR coprocessor 23, a cache module 24, and a system bus 25;
  • the disk array 21, the main processor 22, the XOR coprocessor 23, and the cache module 24 are connected by the system bus 25.
  • in this specific embodiment the disk array is composed of p (p > 2) disks, and the number of fault-tolerant disks of the disk array is q (2 ≤ q ≤ ⌊p/2⌋); for the specific storage format of the data in the disk array, see the embodiment shown in FIG. 1.
  • the main processor 22 in the above embodiment is used to perform operations such as address translation, system management, and cache management of the disk array system;
  • the XOR coprocessor 23 mainly performs the XOR calculation of the data blocks and corresponds to the computing module in the above embodiments; the cache module 24 is used to cache data.
  • FIG. 5 is a schematic diagram of the data layout of a multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 5, taking 7 (p = 7) disks D0 to D6 as an example and implementing fault tolerance of 2 (q = 2) disks, the value of m in this embodiment is 5, and one data set contains 7 rows of blocks: row 0 is a virtual all-zero data block that does not occupy actual storage space on the disks; rows 1 to 4 are data blocks storing valid data; and rows 5 and 6 are check blocks holding the XOR check values of the corresponding data blocks and providing redundancy protection for them. It can be seen that in the embodiment of the present invention the check blocks are not stored on a separate disk but are evenly distributed across all disks.
  • FIG. 6 is a schematic diagram of the relationship between data blocks and check blocks in a multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 6, still taking seven disks as an example, every data block in rows 1 to 4 is contained in two different check groups. For example, the data block D(1, 0) in row 1 on disk D0 is labeled f4, indicating that it belongs both to group 4 of the first check row, whose check block 4 is located on disk D3, and to group f of the second check row, whose check block f is located on disk D5.
  • except for the virtual row (i.e., row 0 of a check group), the check blocks of the groups containing any given data block are located on different disks and on different check rows; that is, apart from the row-0 virtual data blocks, no data block belongs to two check groups whose check blocks are on the same disk, and no data block belongs to two check groups on the same check row (a short sketch following this list illustrates these two properties).
  • if C(i, j) denotes the physical block in row i and column j, then for the first-row check block C(5, 0), with m = 5, l = 1, and k ranging from 1 to 5, the row number of each data block in its check group is (m - k·l) mod m and the column number is (n + k) mod p; the calculation yields the data blocks C(4, 1), C(3, 2), C(2, 3), C(1, 4), and C(0, 5) as the data blocks of this check group.
  • the check groups of the other check blocks can be calculated in the same way, and the value of each check block is the exclusive-OR value of the data blocks in its check group, as can be seen in the following equations:
  • C(5, 0) = C(4, 1) ⊕ C(3, 2) ⊕ C(2, 3) ⊕ C(1, 4) ⊕ C(0, 5)
  • C(5, 4) = C(4, 5) ⊕ C(3, 6) ⊕ C(2, 0) ⊕ C(1, 1) ⊕ C(0, 2)
  • C(5, 5) = C(4, 6) ⊕ C(3, 0) ⊕ C(2, 1) ⊕ C(1, 2) ⊕ C(0, 3)
  • C(5, 6) = C(4, 0) ⊕ C(3, 1) ⊕ C(2, 2) ⊕ C(1, 3) ⊕ C(0, 4)
  • C(6, 1) = C(1, 3) ⊕ C(4, 4) ⊕ C(2, 5) ⊕ C(0, 6) ⊕ C(3, 0)
  • C(6, 2) = C(1, 4) ⊕ C(4, 5) ⊕ C(2, 6) ⊕ C(0, 0) ⊕ C(3, 1)
  • C(6, 3) = C(1, 5) ⊕ C(4, 6) ⊕ C(2, 0) ⊕ C(0, 1) ⊕ C(3, 2)
  • C(6, 4) = C(1, 6) ⊕ C(4, 0) ⊕ C(2, 1) ⊕ C(0, 2) ⊕ C(3, 3)
  • C(6, 6) = C(1, 1) ⊕ C(4, 2) ⊕ C(2, 3) ⊕ C(0, 4) ⊕ C(3, 5)
  • FIG. 7 is a schematic flowchart of a method for generating a check block in a multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 7, the method includes the following steps:
  • Step 101: Obtain the data of each data block in the check group to which the check block to be generated belongs.
  • the method for generating a check block in this embodiment is based on the multi-disk fault-tolerant system described in FIG. 1, in which the row number and column number of each data block in the check group of a check block C(m-1+l, n) satisfy the following conditions: the row number of the data block is (m - k·l) mod m and the column number is (n + k) mod p, where k ranges from l to m-1+l, l is the row number of the check block in the check area, 1 ≤ l ≤ q, and n is the column number corresponding to the check block, 0 ≤ n ≤ p-1;
  • Step 102: Obtain, from the data of all data blocks in the check group, the check data of the check block that needs to be generated;
  • Step 103: Write the calculated data of the above check block into the corresponding check block in the disk array.
  • the method for generating a check block in the multi-disk fault-tolerant system obtains the data of each data block in the check group and obtains the data of the check block from it; because the data blocks and check blocks are arranged as in the embodiment shown in FIG. 1, no complex multiply-add operations are needed when generating the check block, which effectively reduces the computational complexity of the multi-disk fault-tolerant system during data processing.
  • Step 201: Set the variable k to l, the row number of the check block in the check area;
  • Step 202: Determine the row number and column number of the next data block belonging to the check group.
  • the row number i can be determined by calculating (m - k·l) mod m, where m is a predetermined prime number less than p-q, and the column number j is determined by calculating (n + k) mod p;
  • Step 203: Read the data in the data block C(i, j) determined in step 202;
  • Step 204: Increase the value of k by one;
  • Step 205: Determine whether k is still within the range l to m-1+l; if so, execute step 202; if not, execute step 206;
  • Step 206: Perform XOR calculation on the data of each data block in the check group and write the result into the check block.
  • alternatively, if the data of more than one data block has already been read in step 203, the data read so far can be XORed immediately after each read; in the subsequent passes, the XOR value of the previous calculation is XORed with the data read this time, and in this step only the final XOR result needs to be written into the check block.
  • the method for generating a check block in the multi-disk fault-tolerant system provided in this embodiment can obtain a disk data check block with a small number of XOR calculations, which effectively reduces the computational complexity of the multi-disk fault-tolerant system.
  • FIG. 9 is a schematic flowchart of a method for recovering a data block in a multi-disk fault-tolerant system according to an embodiment of the present invention.
  • the method for recovering a data block in this embodiment is based on the multi-disk fault-tolerant system shown in FIG. 1 and, as shown in FIG. 9, includes the following steps:
  • Step 301: Obtain the data of the other data blocks and the data of the check block in the check group to which the data block to be recovered belongs; Step 302: Calculate the data of the data block to be recovered from the obtained data;
  • Step 303: Write the value of the data block obtained by the calculation into the data block to be recovered.
  • the method for recovering a data block in the multi-disk fault-tolerant system obtains the data of the other data blocks and of the check block in the check group to which the data block to be recovered belongs, and obtains the data to be recovered from it; because the data blocks and check blocks are arranged as in the embodiment shown in FIG. 1, no complex multiply-add operations are needed when recovering the data block, which effectively reduces the computational complexity of the multi-disk fault-tolerant system during data processing.
  • specifically, obtaining the data of the data block to be recovered includes: performing XOR processing on the data of the check block and the data of the other data blocks to obtain the data of the data block to be recovered.
  • when the data block recovery method of the above embodiment is used in a dual-disk fault-tolerant system, either the check block in the first row of the check area or the check block in the second row can be used for recovery. For example, when the data block to be recovered is D(i, j), the steps shown in FIG. 10 can be included:
  • Step 401: Calculate the position information of a check block of the data block D(i, j) to be recovered, and obtain the data of that check block;
  • when there are two check rows, the data can be recovered using a check block on either the first or the second row. When the check block P(m, w) on the first row of the check area (i.e., row m) is used, its column number w can be obtained from w = (j - (m - i)) mod p, where p is the number of disks in the disk array; when the check block P(m+1, y) on the second row of the check area (i.e., row m+1) is used, its column number y can be obtained from y = (j - (a·m - i)/2) mod p, where a is 2 when m-i is odd or less than 4, and 1 otherwise. After the position information of the check block has been obtained, its data can be read. The method given in this step for calculating, from the position information of a data block in a dual-disk fault-tolerant system, the column numbers of its check blocks in the first and second rows of the check area also applies to a multi-disk fault-tolerant system composed of three or more disks.
  • Step 402: Read the data of the other data blocks in the check group.
  • the row numbers and column numbers of the other data blocks in the check group can also be obtained by calculation: for the check group of the check block P(m, w) on the first row in step 401, the row numbers of its data blocks are (m - k) mod m and the column numbers are (w + k) mod p, where k takes the values 1 to m; for the check group of the check block P(m+1, y) on the second row in step 401, the row numbers of its data blocks are (m - 2k) mod m and the column numbers are (y + k) mod p.
  • Step 403: Obtain the data of the data block to be recovered from the data of the other data blocks of its check group and the data of the check block, and write that data into the data block to be recovered.
  • the data of the data block to be recovered may be obtained by XOR-ing the data of the other data blocks of the check group to which it belongs with the data of the check block.
  • the foregoing embodiment provides a method for recovering a data block to be recovered in a multi-disk fault-tolerant system, which achieves a favorable data block update cost: in a fault-tolerant system of q disks, updating one data block requires only q+1 disk write operations, which improves the write performance of the multi-disk fault-tolerant system.
  • FIG. 11 is a schematic flowchart of the process of rebuilding a single failed disk in a dual-disk fault-tolerant system according to an embodiment of the present invention.
  • this embodiment takes a single disk failure in a dual-disk fault-tolerant system as an example to describe the method for recovering a single disk; as shown in FIG. 11, it includes the following steps:
  • Step 501: Initially set the variable i to 0;
  • Step 502: Increase the variable i by 1, and recover the data block D(i, j) on the damaged disk using the recovery method for a single damaged data block described in the above embodiments, that is, by acquiring the data of the check block of the check group to which the damaged data block belongs and the data of the other data blocks in that check group, and XOR-ing the acquired data to obtain the data of the damaged data block;
  • Step 503: Write the data recovered in step 502 into the data block D(i, j);
  • Step 504: Determine whether the value of i is less than m, where m is a prime number less than p-q, p is the number of disks in the disk array, and q is the number of fault-tolerant disks; execute step 502 when the value of i is less than m, and execute step 505 when the value of i is greater than or equal to m;
  • Step 505: Recover the check blocks on the damaged disk.
  • in a dual-disk fault-tolerant system each disk includes two check blocks; therefore, in this step the data of each data block of the check group to which the check block P(m, j) belongs is first obtained and XORed, and the result of the XOR calculation is written into the check block P(m, j); then the data of each data block of the check group to which the check block P(m+1, j) belongs is obtained and XORed, and the result of the XOR calculation is written into the check block P(m+1, j).
  • FIG. 12 is a schematic flowchart of a method for recovering multiple disks in a multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 12, the method includes the following steps:
  • Step 601: Determine the starting point of a recovery path.
  • taking a dual-disk fault-tolerant system as an example, let the sequence numbers of the two failed disks be a and b, that is, the column numbers of the two failed disks in the disk array are a and b, with a > b; a recovery path is then determined in one of two cases.
  • the first case: when the two failed disks are adjacent, i.e., a = (b+1) mod p, there are two recovery paths: one starts from the data block on disk b recovered by the check block P(m, (a+1) mod p), and the other starts from the data block on disk a recovered by the check block P(m+1, (b-1) mod p).
  • the second case: when the two failed disks are not adjacent, i.e., a ≠ (b+1) mod p, there are four recovery paths: the first starts from the data block on disk a recovered by the check block P(m, (b+1) mod p); the second starts from the data block on disk b recovered by the check block P(m, (a+1) mod p); the third starts from the data block on disk a recovered by the check block P(m+1, (b-1) mod p); and the fourth starts from the data block on disk b recovered by the check block P(m+1, (a-1) mod p).
  • Step 602: Recover the data blocks or check blocks on the recovery path.
  • specifically, after the starting point of a recovery path has been determined, check blocks on different check rows are used alternately to recover the data blocks.
  • for example, if a data block D(i, b) on a failed disk b is recovered using a check block on the first check row, the next data block recovered on the recovery path is the data block D(j, a) on the other failed disk a, where the check block of the check group containing both D(j, a) and D(i, b) is on the second check row.
  • similarly, if a data block D(i, b) on a failed disk b is recovered using a check block on the second check row, the next data block recovered on the recovery path is the data block D(j, a) on the other failed disk a, where the check block of the check group containing both D(j, a) and D(i, b) is on the first check row.
  • in the case a > b, the value of j can be determined by j = (i - (a - b)) mod m, in which case the check block of the check group containing D(j, a) and D(i, b) is on the first check row, or by j = (i - 2·(a - b)) mod m, in which case that check block is on the second check row, where m is any preset prime number less than the difference between the number of disks p in the disk array and the number of fault-tolerant disks q.
  • Step 603: Determine whether what was recovered on the damaged disk in the preceding step is a data block or a check block; if it is a check block, terminate the recovery path and execute step 604; if it is a data block, execute step 602; Step 604: Determine whether all data blocks on the damaged disks have been recovered; if so, execute step 605; if not, execute step 601 to determine another recovery path;
  • Step 605: Determine whether all check blocks on the damaged disks have been recovered; if so, end the procedure; otherwise, execute step 606;
  • Step 606: Recover the check blocks on the damaged disks that have not yet been recovered.
  • the starting point of one recovery path is to recover the data block C(0, 3) (f6) on D3 using the check block C(5, 5) (6) of the first check row on disk D5, to the right of the failed disks; then, from the data block C(0, 3) (f6), the check block C(6, 5) (f) of the second check row is used to recover C(3, 4) (f3) of the failed disk D4; from the data block C(3, 4) (f3), the check block C(5, 2) (3) of the first check row is used to recover C(4, 3) (a3) of the failed disk D3; from the data block C(4, 3) (a3), the check block C(6, 0) (a) of the second check row is used to recover C(2, 4) (a2) of the failed disk D4; from the data block C(2, 4) (a2), the check block C(5, 1) (2) of the first check row is used to recover C(3, 3) (e2) of the failed disk D3; and from the check group of the data block C(3, 3) (e2), the check block C(6, 4) (e) on the second check row of the failed disk D4 is generated, whereupon this path terminates.
  • the starting point of the other recovery path is the data block C(1, 4) (c1) of the failed disk D4 recovered using the check block C(6, 2) (c) of the second check row of disk D2; then, from the data block C(1, 4) (c1), the check block C(5, 0) (1) of the first check row is used to recover C(2, 3) (g1) of the failed disk D3; from the data block C(2, 3) (g1), the check block C(6, 6) (g) of the second check row is used to recover C(0, 4) (g7) of the failed disk D4; from the data block C(0, 4) (g7), the check block C(5, 6) (7) of the first check row is used to recover C(1, 3) (b7) of the failed disk D3; from the data block C(1, 3) (b7), the check block C(6, 1) (b) of the second check row is used to recover C(4, 4) (b4) of the failed disk D4; and from the check group of the data block C(4, 4) (b4), the check block C(5, 3) (4) on the first check row of the failed disk D3 is generated, whereupon this path terminates.
  • the starting point of a path is thus obtained using the check block of the first check row to the right of the failed disks or the check block of the second check row to the left, after which the check blocks of the first and second check rows are used alternately
  • to recover the data blocks of the failed disks, until what is recovered is a check block; the two paths above can be executed simultaneously, which improves the recovery speed.
  • the method of data recovery for a non-adjacent double-disk failure is the same as for an adjacent double-disk failure, except that there are four parallel recovery paths.
  • the above embodiments are mostly described taking dual-disk fault tolerance as an example.
  • the data block recovery method provided by the embodiments of the present invention can also be used for multi-disk fault-tolerant systems with three or more fault-tolerant disks, in which case the order and manner of alternately recovering data blocks on each recovery path differ when the recovery paths are determined.
  • a disk in the embodiments of the present invention can be regarded as a storage node in a storage area network (Storage Area Network, hereinafter: SAN); when the multi-disk fault-tolerant system and methods of the embodiments of the present invention are applied to SAN technology, the methods of encoding and decoding the data are the same as in the above embodiments.
  • likewise, for fault tolerance in a distributed storage system, a single disk in the embodiments of the present invention is treated as a network node of the distributed storage system; the above embodiments can then be applied to the distributed storage system, and the methods of encoding and decoding the data are the same as in the above embodiments.
  • the multi-disk fault-tolerant system, the method for generating a check block in a multi-disk fault-tolerant system, the method for recovering a data block in a multi-disk fault-tolerant system, and the method for recovering multiple disks in a multi-disk fault-tolerant system provided by the above embodiments can generate the disk data check blocks with a small number of XOR calculations, effectively reduce the computational complexity of the multi-disk fault-tolerant system, and achieve an optimal data block update cost: in a fault-tolerant system of q disks, updating one data block requires only q+1 disk write operations, which improves the write performance of the multi-disk fault-tolerant system.
  • in addition, the embodiments of the present invention also achieve load balancing across the disks: whether check blocks are being calculated or data blocks are being recovered, the load on each disk is balanced, which improves the overall performance of the multi-disk fault-tolerant system.
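  • The short Python sketch below is an illustration added to this page, not code from the patent; it enumerates the check groups for the p = 7, q = 2, m = 5 example above and verifies the two placement properties noted for FIG. 6: apart from the virtual row 0, every data block is covered by exactly q check blocks, and those check blocks lie on different check rows and on different disks.

    p, q, m = 7, 2, 5                      # disks, fault-tolerant disks, prime m <= p - q

    def group(l, n):
        """(row, col) of the data blocks in the check group of check block C(m-1+l, n)."""
        return [((m - k * l) % m, (n + k) % p) for k in range(l, m + l)]

    cover = {}                             # data block -> check blocks (l, n) covering it
    for l in range(1, q + 1):
        for n in range(p):
            for (i, j) in group(l, n):
                if i != 0:                 # ignore the all-zero virtual row
                    cover.setdefault((i, j), []).append((l, n))

    for (i, j), checks in cover.items():
        assert len(checks) == q                    # covered by q check groups
        assert len({l for l, _ in checks}) == q    # ...on q different check rows
        assert len({n for _, n in checks}) == q    # ...whose check blocks sit on q different disks
    print("layout properties hold for all", len(cover), "data blocks")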

Description

Multi-disk fault-tolerant system and method for generating a check block and recovering a data block

This application claims priority to Chinese Patent Application No. 200910090420.2, filed with the Chinese Patent Office on August 4, 2009 and entitled "Multi-disk fault-tolerant system and method for generating a check block and recovering a data block", the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The embodiments of the present invention relate to the field of data storage technologies, and in particular to a multi-disk fault-tolerant system and methods for generating a check block and recovering a data block.

BACKGROUND

With the development of networks and the widespread use of computer technology, ever higher performance is demanded of storage systems. The Redundant Array of Independent Disks (hereinafter: RAID) uses striping and redundancy to improve the capacity, speed, and reliability of a storage system and has become the preferred structure for high-performance data storage. The basic ideas of disk array technology are twofold: using data striping to improve performance and using data redundancy to improve reliability. So far, most systems have been designed to tolerate a single disk failure, the design rationale being that disk failures are infrequent and that, after one disk fails, there is enough time to recover from the failure before another one occurs.

With the development of disk technology and users' demand for higher-performance storage systems, single-disk fault tolerance is becoming increasingly insufficient. First, disk arrays keep growing in size; as more and more disks are grouped into one array, the possibility of multiple disk failures within one array increases. Second, disk capacity grows faster than data access speed, so the time required to rebuild a disk increases, which lengthens the time window during which the array may suffer a subsequent disk failure while a disk is being rebuilt. Third, the increase in media storage density reduces disk reliability. Fourth, disk failures in practical applications are correlated: owing to the external environment and factors internal to the disks, failures are related to one another, which greatly increases the probability of multiple disk failures within a short period of time.

The conventional techniques for recovering from multi-disk failures in a disk array can be roughly divided into dual-parity, double-mirroring, and RAID 51 and its improved variants. In the double-mirroring mode, the data is mirrored twice so that three copies of the data exist; each write request requires three disk writes to update every copy, and three times the storage space of an unprotected array is needed.

The RAID 51 mode prevents a single disk failure from corrupting data and mirrors a RAID 5 array to protect against up to three arbitrary disk failures; one write request produces two disk reads and four disk writes. The dual-parity mode extends RAID 5 to dual parity; in this mode each write request produces at least three disk reads and three disk writes.

RAID 6 is a dual-parity, dual-disk fault-tolerant method. Compared with other RAID levels, it adds two independent error-check blocks, the check block P and the check block Q; each stripe includes a P check unit and a Q check unit, where P uses a parity code and Q uses another check code such as Reed-Solomon. When a single disk fails, P+Q RAID degrades to RAID 5 with N+1 parity; when two disks fail, P+Q RAID degrades to RAID 0 with no fault tolerance.

In implementing the present invention, the inventors found that RAID 6 in the prior art requires a Galois field transformation during data processing, and this transformation involves complex multiply-add operations, which makes the computational complexity high.

SUMMARY
An object of the embodiments of the present invention is to provide a multi-disk fault-tolerant system and methods for generating a check block and recovering a data block, so as to reduce the computational complexity of check-block generation in the multi-disk fault-tolerant system and improve the data processing speed.

To achieve the above object, an embodiment of the present invention provides a multi-disk fault-tolerant system, including a disk array and a computing module connected through a system bus;

the disk array is composed of p disks, where p is a natural number greater than or equal to 3, and the number of fault-tolerant disks of the disk array is q, where q is a natural number less than p/2 and not less than 2;

the data in the disk array is arranged as an (m+q)×p matrix M, where m is a prime number less than or equal to p-q; row 0 of the matrix M consists of virtual data blocks whose value is 0, rows 1 to m-1 are data blocks, and rows m to m+q-1 form the check area; the row number of each data block in the check group of a check block C(m-1+l, n) in the check area is (m - k·l) mod m and the column number is (n + k) mod p, where k ranges from l to m-1+l, l is the row number of the check block in the check area, 1 ≤ l ≤ q, and n is the column number corresponding to the check block, 0 ≤ n ≤ p-1; the data in the check block is the exclusive-OR value of the data in all data blocks of the check group to which the check block belongs;

the computing module is configured to perform XOR calculation on the data blocks of a check group to generate the check block of that group, and to recover the data blocks according to the check blocks when a disk is damaged.

An embodiment of the present invention further provides a method for generating a check block in the above multi-disk fault-tolerant system, including:

obtaining the data of all data blocks in the check group to which the check block to be generated belongs;

obtaining, from the data of all data blocks in the check group, the check data of the check block to be generated;

writing the check data obtained above into the corresponding check block in the disk array. An embodiment of the present invention also provides a method for recovering a data block in the above multi-disk fault-tolerant system, including:

obtaining the data of the other data blocks in the check group to which the data block to be recovered belongs;

obtaining the data of the check block in the check group to which the data block to be recovered belongs;

performing XOR calculation on the obtained data to obtain the data of the data block to be recovered;

writing the value of the data block obtained by the calculation into the data block to be recovered.

The embodiments of the present invention provide a multi-disk fault-tolerant system, a method for generating a check block in a multi-disk fault-tolerant system, and a method for recovering a data block in a multi-disk fault-tolerant system. In the course of generating check blocks and recovering data blocks in the multi-disk fault-tolerant system, no complex multiply-add operations are needed, which effectively reduces the computational complexity of the multi-disk fault-tolerant system during data processing and improves the data processing speed.

BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of an embodiment of a multi-disk fault-tolerant system according to an embodiment of the present invention; FIG. 2 is a schematic structural diagram 1 of the computing module in an embodiment of the present invention;

FIG. 3 is a schematic structural diagram 2 of the computing module in an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a specific implementation of the multi-disk fault-tolerant system according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the data layout of the multi-disk fault-tolerant system according to an embodiment of the present invention; FIG. 6 is a schematic diagram of the relationship between data blocks and check blocks in the multi-disk fault-tolerant system according to an embodiment of the present invention;

FIG. 7 is a schematic flowchart of a method for generating a check block in a multi-disk fault-tolerant system according to an embodiment of the present invention;

FIG. 8 is a schematic flowchart of a specific embodiment of the embodiment shown in FIG. 7;

FIG. 9 is a schematic flowchart of a method for recovering a data block in a multi-disk fault-tolerant system according to an embodiment of the present invention;

FIG. 10 is a schematic flowchart of a specific embodiment of the embodiment shown in FIG. 9;

FIG. 11 is a schematic flowchart of the process of rebuilding a single failed disk in a dual-disk fault-tolerant system according to an embodiment of the present invention;

FIG. 12 is a schematic flowchart of a method for recovering multiple disks in a multi-disk fault-tolerant system according to an embodiment of the present invention;

FIG. 13 is a schematic diagram of the data recovery paths in the embodiment shown in FIG. 12.
DETAILED DESCRIPTION

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

FIG. 1 is a schematic structural diagram of an embodiment of a multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 1, the multi-disk fault-tolerant system includes a disk array 12 and a computing module 13 connected through a system bus 11, where:

the disk array 12 is composed of p disks, where p is a natural number greater than or equal to 3, and the number of fault-tolerant disks of the disk array is q, where q is a natural number less than p/2 and not less than 2; the data in the disk array is arranged as an (m+q)×p matrix M, where m is a prime number less than or equal to p-q; row 0 of the matrix M consists of virtual data blocks whose value is 0, rows 1 to m-1 are data blocks, and rows m to m+q-1 are check blocks; the row number of each data block in the check group of a check block C(m-1+l, n) is (m - k·l) mod m and the column number is (n + k) mod p, where k ranges from l to m-1+l, l is the row number of the check block in the check area, 1 ≤ l ≤ q, and n is the column number corresponding to the check block, 0 ≤ n ≤ p-1; the data of the check block is the exclusive-OR value of the data of all data blocks in the check group to which the check block belongs;

the computing module 13 is configured to generate the check block of a check group from the data blocks of that group, and to recover the data blocks according to the check blocks when a disk is damaged.

The multi-disk fault-tolerant system provided in this embodiment can implement fault tolerance of q (2 ≤ q ≤ ⌊p/2⌋) disks among any p (p ≥ 3) disks, where on each disk m+q rows of physical units (m being a prime number less than p-q) form one data set: row 0 is a virtual all-zero physical unit, rows 1 to m-1 store the data blocks, and the following q rows store the check blocks, the data of each check block being the exclusive-OR value of the data in all data blocks of the check group to which it belongs. The multi-disk fault-tolerant system provided in this embodiment eliminates complex multiply-add operations when generating check blocks and recovering data blocks; only XOR calculation is needed, which effectively reduces the computational complexity of the multi-disk fault-tolerant system during data processing.
FIG. 2 is a schematic structural diagram 1 of the computing module in an embodiment of the present invention. As shown in FIG. 2, the computing module 13 includes:

a first acquiring unit 131, configured to acquire the data of all data blocks in the check group to which a check block belongs; a first calculating unit 132, configured to perform XOR calculation on the data of all data blocks in the check group acquired by the first acquiring unit to obtain the data of the check block; and a first output unit 133, configured to write the data of the check block generated by the first calculating unit 132 into the corresponding check block in the disk array.

This embodiment addresses the case of generating a check block, with the computing module divided into units by function; the value of a check block is the exclusive-OR value of the data in all data blocks of the check group to which the check block belongs.

FIG. 3 is a schematic structural diagram 2 of the computing module in an embodiment of the present invention. As shown in FIG. 3, the computing module 13 includes a second acquiring unit 134, a second calculating unit 135, and a second output unit 136, where: the second acquiring unit 134 is configured to acquire the data of the data blocks other than the data block to be recovered in the check group to which the data block to be recovered belongs, as well as the data of the check block;

the second calculating unit 135 is configured to perform XOR calculation on the data of the data blocks and the data of the check block acquired by the second acquiring unit 134 to obtain the data of the data block to be recovered;

the second output unit 136 is configured to write the data of the data block to be recovered, calculated by the second calculating unit 135, into the data block to be recovered.
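To make the unit split above concrete, here is a small Python sketch; the class and method names and the in-memory array dictionary are assumptions of this illustration, not an interface defined by the patent. The first acquiring/calculating/output units map onto generate_check_block and the second ones onto recover_data_block, and the only arithmetic needed is a byte-wise XOR.

    from functools import reduce

    def xor_blocks(blocks):
        """Byte-wise XOR of equally sized blocks."""
        return reduce(lambda x, y: bytes(a ^ b for a, b in zip(x, y)), blocks)

    class ComputingModule:
        def __init__(self, array):
            # array: (row, col) -> bytes; callers may simply omit the all-zero row-0 virtual blocks
            self.array = array

        def generate_check_block(self, check_pos, group):
            data = [self.array[pos] for pos in group]        # first acquiring unit
            self.array[check_pos] = xor_blocks(data)         # first calculating + output units

        def recover_data_block(self, lost_pos, check_pos, other_positions):
            data = [self.array[pos] for pos in other_positions]   # second acquiring unit
            data.append(self.array[check_pos])
            self.array[lost_pos] = xor_blocks(data)          # second calculating + output units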
FIG. 4 is a schematic structural diagram of a specific implementation of the multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 4, the multi-disk fault-tolerant system includes a disk array 21, a main processor 22, an XOR coprocessor 23, a cache module 24, and a system bus 25; the disk array 21, the main processor 22, the XOR coprocessor 23, and the cache module 24 are connected by the system bus 25. In this specific embodiment the disk array is composed of p (p > 2) disks, and the number of fault-tolerant disks of the disk array is q (2 ≤ q ≤ ⌊p/2⌋); for the specific storage format of the data in the disk array, see the embodiment shown in FIG. 1. The main processor 22 in the above embodiment is used to perform operations such as address translation, system management, and cache management of the disk array system; the XOR coprocessor 23 mainly performs the XOR calculation of the data blocks and corresponds to the computing module in the above embodiments; the cache module 24 is used to cache data.

FIG. 5 is a schematic diagram of the data layout of the multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 5, taking 7 (p = 7) disks as an example, each column represents one disk, and there are seven disks in total, namely D0, D1, D2, D3, D4, D5, and D6, implementing fault tolerance of 2 (q = 2) disks. In this embodiment the value of m is 5, and one data set contains 7 rows of blocks: row 0 is a virtual all-zero data block that does not occupy actual storage space on the disks; rows 1 to 4 are data blocks storing valid data; and rows 5 and 6 are check blocks, which hold the XOR check values of the corresponding data blocks and provide redundancy protection for them. It can be seen that in the embodiment of the present invention the check blocks are not stored on a separate disk but are evenly distributed across all disks.

FIG. 6 is a schematic diagram of the relationship between data blocks and check blocks in the multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 6, still taking seven disks as an example, every data block in rows 1 to 4 is contained in two different check groups. For example, the data block D(1, 0) in row 1 on disk D0 is labeled f4, indicating that it belongs both to group 4 of the first check row and to group f of the second check row; the check block 4 of group 4 is located on disk D3, and the check block f of group f is located on disk D5. Except for the virtual row (i.e., row 0 of a check group), the check blocks of the groups to which any data block belongs are located on different disks and on different check rows; that is, apart from the row-0 virtual data blocks, no data block belongs to two check groups whose check blocks are on the same disk, and no data block belongs to two check groups on the same check row.
In a check group f, the value of the check block f equals the XOR of all data blocks labeled f, i.e., f = f4 ⊕ f1 ⊕ f7 ⊕ f6 ⊕ f3. If C(i, j) denotes the physical block in row i and column j, then in this embodiment, for the first-row check block C(5, 0), with m = 5, l = 1, and k ranging from 1 to 5, the row number of each data block of its check group is (m - k·l) mod m and the column number is (n + k) mod p; the calculation yields the data blocks C(4, 1), C(3, 2), C(2, 3), C(1, 4), and C(0, 5) as the data blocks of this check group. The check groups of the other check blocks in the first row can be calculated in the same way, and the value of each check block is the XOR of the data blocks in its check group, as given by the following equations:

A. For the first-row check blocks:

C(5, 0) = C(4, 1) ⊕ C(3, 2) ⊕ C(2, 3) ⊕ C(1, 4) ⊕ C(0, 5)

C(5, 1) = C(4, 2) ⊕ C(3, 3) ⊕ C(2, 4) ⊕ C(1, 5) ⊕ C(0, 6)

C(5, 2) = C(4, 3) ⊕ C(3, 4) ⊕ C(2, 5) ⊕ C(1, 6) ⊕ C(0, 0)

C(5, 3) = C(4, 4) ⊕ C(3, 5) ⊕ C(2, 6) ⊕ C(1, 0) ⊕ C(0, 1)

C(5, 4) = C(4, 5) ⊕ C(3, 6) ⊕ C(2, 0) ⊕ C(1, 1) ⊕ C(0, 2)

C(5, 5) = C(4, 6) ⊕ C(3, 0) ⊕ C(2, 1) ⊕ C(1, 2) ⊕ C(0, 3)

C(5, 6) = C(4, 0) ⊕ C(3, 1) ⊕ C(2, 2) ⊕ C(1, 3) ⊕ C(0, 4)

For the second-row check block C(6, 0), with m = 5, l = 2, and k ranging from 2 to 6, the row number of each data block of its check group is (m - k·l) mod m and the column number is (n + k) mod p; the calculation yields the data blocks C(1, 2), C(4, 3), C(2, 4), C(0, 5), and C(3, 6) as the data blocks of this check group. The check groups of the other check blocks in the second row can be calculated in the same way, and the value of each check block is the XOR of the data blocks in its check group, as given by the following equations:

B. For the second-row check blocks:

C(6, 0) = C(1, 2) ⊕ C(4, 3) ⊕ C(2, 4) ⊕ C(0, 5) ⊕ C(3, 6)

C(6, 1) = C(1, 3) ⊕ C(4, 4) ⊕ C(2, 5) ⊕ C(0, 6) ⊕ C(3, 0)

C(6, 2) = C(1, 4) ⊕ C(4, 5) ⊕ C(2, 6) ⊕ C(0, 0) ⊕ C(3, 1)

C(6, 3) = C(1, 5) ⊕ C(4, 6) ⊕ C(2, 0) ⊕ C(0, 1) ⊕ C(3, 2)

C(6, 4) = C(1, 6) ⊕ C(4, 0) ⊕ C(2, 1) ⊕ C(0, 2) ⊕ C(3, 3)

C(6, 5) = C(1, 0) ⊕ C(4, 1) ⊕ C(2, 2) ⊕ C(0, 3) ⊕ C(3, 4)

C(6, 6) = C(1, 1) ⊕ C(4, 2) ⊕ C(2, 3) ⊕ C(0, 4) ⊕ C(3, 5)
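The fourteen equations above follow mechanically from the group definition. The short Python sketch below is an illustration added here (not part of the patent text); it prints exactly these equations for p = 7, q = 2, m = 5.

    p, q, m = 7, 2, 5

    def group(l, n):
        """Blocks (row, col) belonging to the check group of check block C(m-1+l, n)."""
        return [((m - k * l) % m, (n + k) % p) for k in range(l, m + l)]

    for l in range(1, q + 1):              # A: first check row (l = 1), B: second check row (l = 2)
        for n in range(p):
            rhs = " ⊕ ".join(f"C({i}, {j})" for i, j in group(l, n))
            print(f"C({m - 1 + l}, {n}) = {rhs}")   # the row-0 terms are the all-zero virtual blocks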
FIG. 7 is a schematic flowchart of a method for generating a check block in a multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 7, the method includes the following steps:

Step 101: Obtain the data of each data block in the check group to which the check block to be generated belongs. The method for generating a check block in this embodiment is based on the multi-disk fault-tolerant system described in FIG. 1. In that multi-disk system, the row number and column number of each data block in the check group of a check block C(m-1+l, n) satisfy the following conditions: the row number of the data block is (m - k·l) mod m and the column number is (n + k) mod p, where k ranges from l to m-1+l, l is the row number of the check block in the check area, 1 ≤ l ≤ q, and n is the column number corresponding to the check block, 0 ≤ n ≤ p-1. Specifically, after the data of all data blocks belonging to the check group of the above check block has been obtained, the value of the check block is obtained by XOR-ing that data, which can be calculated according to the following formula:

C(m-1+l, n) = ⊕ (for k = l to m-1+l) C((m - k·l) mod m, (n + k) mod p)

After the data of each data block in the check group has been obtained, the data of the check block is obtained by performing the XOR calculation according to the above formula;

Step 102: Obtain, from the data of all data blocks in the check group, the check data of the check block to be generated;

Step 103: Write the calculated data of the above check block into the corresponding check block in the disk array.

In the method for generating a check block in the multi-disk fault-tolerant system provided in this embodiment, the data of each data block in the check group is obtained and the data of the check block is obtained from it; because the data blocks and check blocks are arranged as in the embodiment shown in FIG. 1, no complex multiply-add operations are needed when generating the check block, which effectively reduces the computational complexity of the multi-disk fault-tolerant system during data processing.
The specific calculation process in the above embodiment can be performed according to the steps shown in FIG. 8, including:

Step 201: Set the variable k to l, the row number of the check block in the check area;

Step 202: Determine the row number and column number of the next data block belonging to the check group; specifically, the row number i can be determined by calculating (m - k·l) mod m, where m is a predetermined prime number less than p-q, and the column number j can be determined by calculating (n + k) mod p;

Step 203: Read the data in the data block C(i, j) determined in step 202;

Step 204: Increase the value of k by 1;

Step 205: Determine whether k is still within the range l to m-1+l; if so, execute step 202; if not, execute step 206;

Step 206: Perform XOR calculation on the data of each data block in the check group and write the calculation result into the check block. Alternatively, if the data of more than one data block has already been read in step 203 above, the data already read can be XORed as soon as it is read from the data block, and in the subsequent passes the XOR value of the previous calculation is XORed with the data read this time; in this step, only the final XOR result then needs to be written into the check block.

The method for generating a check block in the multi-disk fault-tolerant system provided in this embodiment can obtain the disk data check blocks with a small number of XOR calculations, which effectively reduces the computational complexity of the multi-disk fault-tolerant system.
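A Python sketch of the loop of FIG. 8 follows; it is an illustration under assumptions, not the patent's implementation: read_block(i, j) and write_block(i, j, data) are hypothetical helpers standing in for the disk accesses, and k simply runs over its full range l to m-1+l as defined above (the row-0 virtual block contributes nothing to the XOR and is skipped).

    def generate_check_block(l, n, p, m, block_size, read_block, write_block):
        """Generate check block C(m-1+l, n) by XOR-ing the data blocks of its check group."""
        acc = bytes(block_size)                      # XOR identity (all zeros)
        for k in range(l, m + l):                    # k = l .. m-1+l (steps 201, 204, 205)
            i = (m - k * l) % m                      # row of the next group member (step 202)
            j = (n + k) % p                          # column of the next group member
            if i == 0:                               # row 0 is the all-zero virtual block
                continue
            data = read_block(i, j)                  # step 203
            acc = bytes(a ^ b for a, b in zip(acc, data))
        write_block(m - 1 + l, n, acc)               # step 206: write the result into the check block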
FIG. 9 is a schematic flowchart of a method for recovering a data block in a multi-disk fault-tolerant system according to an embodiment of the present invention. The method for recovering a data block in this embodiment is based on the multi-disk fault-tolerant system shown in FIG. 1 and, as shown in FIG. 9, includes the following steps:

Step 301: Obtain the data of the other data blocks and the data of the check block in the check group to which the data block to be recovered belongs; Step 302: Calculate the data of the data block to be recovered from the obtained data;

Step 303: Write the value of the data block obtained by the calculation into the data block to be recovered. In the method for recovering a data block in the multi-disk fault-tolerant system provided in this embodiment, the data of the other data blocks and of the check block in the check group to which the data block to be recovered belongs is obtained, and the data to be recovered is obtained from it; because the data blocks and check blocks are arranged as in the embodiment shown in FIG. 1, no complex multiply-add operations are needed when obtaining the data block, which effectively reduces the computational complexity of the multi-disk fault-tolerant system during data processing. Specifically, obtaining the data of the data block to be recovered includes: performing XOR processing on the data of the check block and the data of the other data blocks to obtain the data of the data block to be recovered.
When the data block recovery method of the above embodiment is used in a dual-disk fault-tolerant system, either the check block in the first row of the check area or the check block in the second row can be used for the recovery. For example, when the data block to be recovered is D(i, j), the steps shown in FIG. 10 can be included:

Step 401: Calculate the position information of a check block of the data block D(i, j) to be recovered, and obtain the data of that check block;

When two check rows exist, the data can be recovered using a check block on either the first or the second row. When the check block P(m, w) of this data block on the first row of the check area (i.e., row m) is used, its column number w can be obtained from the following expression: w = (j - (m - i)) mod p, where p is the number of disks in the disk array; when the check block P(m+1, y) of this data block on the second row of the check area (i.e., row m+1) is used, its column number y can be obtained from the following formula: y = (j - (a·m - i)/2) mod p, where a takes the value 2 when m-i is odd or less than 4, and 1 otherwise. After the position information of the check block has been obtained, its data can be read. The method given in this step for calculating, from the position information of a data block in a dual-disk fault-tolerant system, the column numbers of its check blocks in the first and second rows of the check area also applies to a multi-disk fault-tolerant system composed of three or more disks.

Step 402: Read the data of the other data blocks in the check group;

The row numbers and column numbers of the other data blocks in the check group can also be obtained by calculation: for the check group to which the check block P(m, w) on the first row in step 401 belongs, the row numbers of its data blocks are (m - k) mod m and the column numbers are (w + k) mod p, where k takes the values 1 to m; for the check group to which the check block P(m+1, y) on the second row in step 401 belongs, the row numbers of its data blocks are (m - 2k) mod m and the column numbers are (y + k) mod p.

Step 403: Obtain the data of the data block to be recovered from the data of the other data blocks of the check group to which it belongs and the data of the check block, and write that data into the data block to be recovered. The data of the data block to be recovered can be obtained by XOR-ing the data of the other data blocks of the check group to which the data block belongs with the data of the check block.
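A sketch of steps 401 to 403 for recovering a single data block D(i, j) from the first check row (row m) is given below; it is an illustration under assumptions, with read_block and write_block the same hypothetical helpers as in the earlier sketch, and the group of P(m, w) enumerated directly from the definitions above.

    def recover_data_block(i, j, p, m, read_block, write_block):
        """Recover data block D(i, j) using the check block of its first-row check group."""
        w = (j - (m - i)) % p                        # step 401: column of check block P(m, w)
        acc = read_block(m, w)                       # start from the check block's data
        for k in range(1, m + 1):                    # step 402: the group's other members
            r, c = (m - k) % m, (w + k) % p
            if (r, c) == (i, j) or r == 0:           # skip the lost block and the virtual row
                continue
            acc = bytes(a ^ b for a, b in zip(acc, read_block(r, c)))
        write_block(i, j, acc)                       # step 403: write back the recovered data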
The above embodiment provides a method for recovering a data block to be recovered in a multi-disk fault-tolerant system, which achieves a favorable data block update cost: in a fault-tolerant system of q disks, updating one data block requires only q+1 disk write operations, which improves the write performance of the multi-disk fault-tolerant system.
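The q+1 write cost can be seen by listing the check blocks whose groups contain a given data block. The small sketch below is an illustration (the helper name is my own); it finds them by scanning the groups rather than by the closed-form column formulas above, and for D(1, 0) of the 7-disk example it returns the two check blocks C(5, 3) and C(6, 5) shown as labels 4 and f in FIG. 6.

    def check_blocks_covering(i, j, p, q, m):
        """Positions (row, col) of the q check blocks whose check groups contain D(i, j)."""
        hits = []
        for l in range(1, q + 1):
            for n in range(p):
                members = [((m - k * l) % m, (n + k) % p) for k in range(l, m + l)]
                if (i, j) in members:
                    hits.append((m - 1 + l, n))
        return hits

    # Updating D(i, j) rewrites the block itself plus these q check blocks: q + 1 writes in total.
    print(check_blocks_covering(1, 0, 7, 2, 5))      # -> [(5, 3), (6, 5)]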
FIG. 11 is a schematic flowchart of the process of rebuilding a single failed disk in a dual-disk fault-tolerant system according to an embodiment of the present invention. This embodiment takes a single disk failure in a dual-disk fault-tolerant system as an example to describe the method for recovering a single disk; as shown in FIG. 11, it includes the following steps:

Step 501: Initially set the variable i to 0;

Step 502: Increase the variable i by 1, and recover the data block D(i, j) on the damaged disk using the recovery method for a single damaged data block described in the above embodiments, that is, obtain the data of the check block of the check group to which the damaged data block belongs and the data of the other data blocks in that check group, and XOR the obtained data to obtain the data of the damaged data block;

Step 503: Write the data recovered in step 502 into the data block D(i, j);

Step 504: Determine whether the value of i is less than m, where m is the preset prime number less than p-q, p is the number of disks in the disk array, and q is the number of fault-tolerant disks; execute step 502 when the value of i is less than m, and execute step 505 when the value of i is greater than or equal to m;

Step 505: Recover the check blocks on the damaged disk. In a dual-disk fault-tolerant system each disk includes two check blocks; therefore, in this step the data of each data block of the check group to which the check block P(m, j) belongs is first obtained and XORed, and the result of the XOR calculation is written into the check block P(m, j); then the data of each data block of the check group to which the check block P(m+1, j) belongs is obtained and XORed, and the result of the XOR calculation is written into the check block P(m+1, j).
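Rebuilding a whole failed disk j then simply chains the two earlier sketches: recover the m-1 data blocks row by row (steps 501 to 504), then regenerate the disk's q check blocks from their groups (step 505). The few lines below are an illustration that reuses the hypothetical generate_check_block and recover_data_block sketches given above.

    def rebuild_disk(j, p, q, m, block_size, read_block, write_block):
        """Rebuild every block of a single failed disk j (FIG. 11)."""
        for i in range(1, m):                        # data rows 1 .. m-1
            recover_data_block(i, j, p, m, read_block, write_block)
        for l in range(1, q + 1):                    # the disk's q check blocks
            generate_check_block(l, j, p, m, block_size, read_block, write_block)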
FIG. 12 is a schematic flowchart of a method for recovering multiple disks in a multi-disk fault-tolerant system according to an embodiment of the present invention. As shown in FIG. 12, the method includes the following steps:

Step 601: Determine the starting point of a recovery path;

Taking a dual-disk fault-tolerant system as an example, let the sequence numbers of the two failed disks be a and b, that is, the column numbers of the two failed disks in the disk array are a and b, with a > b. A recovery path is then determined in one of two cases. The first case: when the two failed disks are adjacent, i.e., a = (b+1) mod p, there are two recovery paths: one starts from the data block on disk b recovered by the check block P(m, (a+1) mod p), and the other starts from the data block on disk a recovered by the check block P(m+1, (b-1) mod p). The second case: when the two failed disks are not adjacent, i.e., a ≠ (b+1) mod p, there are four recovery paths: the first starts from the data block on disk a recovered by the check block P(m, (b+1) mod p); the second starts from the data block on disk b recovered by the check block P(m, (a+1) mod p); the third starts from the data block on disk a recovered by the check block P(m+1, (b-1) mod p); and the fourth starts from the data block on disk b recovered by the check block P(m+1, (a-1) mod p).

Step 602: Recover the data blocks or check blocks on the above recovery path;

Specifically, after the starting point of a recovery path has been determined, check blocks on different check rows are used alternately to recover the data blocks. For example, in a dual-disk fault-tolerant system, if a data block D(i, b) on a failed disk b is recovered using a check block on the first check row, the next data block recovered on this recovery path is the data block D(j, a) on the other failed disk a, where the check block of the check group to which D(j, a) and D(i, b) both belong is on the second check row. Similarly, if a data block D(i, b) on a failed disk b is recovered using a check block on the second check row, the next data block recovered on this recovery path is the data block D(j, a) on the other failed disk a, where the check block of the check group to which D(j, a) and D(i, b) both belong is on the first check row. In the case a > b, the value of j can be determined by j = (i - (a - b)) mod m, in which case the check block of the check group to which D(j, a) and D(i, b) belong is on the first check row, or by j = (i - 2·(a - b)) mod m, in which case that check block is on the second check row, where m is any preset prime number less than the difference between the number of disks p in the disk array and the number of fault-tolerant disks q. For the specific recovery process of a single data block, see the embodiment of the data block recovery method in the multi-disk fault-tolerant system shown in FIG. 9; for the recovery of a check block, see the embodiment shown in FIG. 7.
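The alternation of step 602 can be followed in code. The sketch below is my own illustration: it walks one recovery path for the adjacent-failure example discussed later (p = 7, m = 5, failed columns a = 4 and b = 3), starting from the virtual block C(0, 3), applying the j-formulas above while alternating check rows, and stopping when the check block needed for the next hop itself lies on a failed disk, which is then regenerated.

    p, m, a, b = 7, 5, 4, 3                # 7-disk example; failed disk columns, a > b
    d = a - b

    def first_row_check_col(i, j):         # column of the row-m check block covering D(i, j)
        return (j - (m - i)) % p

    def second_row_check_col(i, j):        # column of the row-(m+1) check block covering D(i, j)
        k = next(k for k in range(2, m + 2) if (m - 2 * k) % m == i)
        return (j - k) % p

    row, col, use_second = 0, b, True      # path start: block C(0, b); the first hop uses row m+1
    path = [(row, col)]
    while True:
        chk_col = second_row_check_col(row, col) if use_second else first_row_check_col(row, col)
        if chk_col in (a, b):              # the needed check block is itself on a failed disk:
            path.append((m + 1 if use_second else m, chk_col))   # regenerate it; the path ends
            break
        if use_second:                     # hop from disk b to disk a via the second check row
            row = (row - 2 * d) % m        # j = (i - 2(a-b)) mod m
        else:                              # hop from disk a back to disk b via the first check row
            row = (row + d) % m            # inverse of j = (i - (a-b)) mod m
        col = a if col == b else b         # the next block sits on the other failed disk
        path.append((row, col))
        use_second = not use_second

    print(path)    # -> [(0, 3), (3, 4), (4, 3), (2, 4), (3, 3), (6, 4)], matching FIG. 13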
Step 603: Determine whether what was recovered on the damaged disk in the above step is a data block or a check block; if it is a check block, terminate the recovery path and execute step 604; if it is a data block, execute step 602; Step 604: Determine whether all data blocks on the damaged disks have been recovered; if so, execute step 605; if not, execute step 601 to determine another recovery path;

Step 605: Determine whether all check blocks on the damaged disks have been recovered; if they have all been recovered, end this procedure; otherwise, execute step 606;

Step 606: Recover the check blocks on the damaged disks that have not yet been recovered.
The following specific example illustrates the process of determining a recovery path and recovering the data blocks and check blocks on that path in steps 601 and 602 above, taking as an example the failure of two adjacent disks D3 and D4 (i.e., the disks with column numbers 3 and 4) in a disk array composed of seven disks. As shown in FIG. 13, there are two parallel paths for recovering the data blocks of the failed disks. The starting point of one recovery path is to recover the data block C(0, 3) (f6) on D3 using the check block C(5, 5) (6) of the first check row on disk D5, to the right of the failed disks; then, from the data block C(0, 3) (f6), the check block C(6, 5) (f) of the second check row is used to recover C(3, 4) (f3) of the failed disk D4; from the data block C(3, 4) (f3), the check block C(5, 2) (3) of the first check row is used to recover C(4, 3) (a3) of the failed disk D3; from the data block C(4, 3) (a3), the check block C(6, 0) (a) of the second check row is used to recover C(2, 4) (a2) of the failed disk D4; from the data block C(2, 4) (a2), the check block C(5, 1) (2) of the first check row is used to recover C(3, 3) (e2) of the failed disk D3; and from the check group to which the data block C(3, 3) (e2) belongs, the check block C(6, 4) (e) on the second check row of the failed disk D4 is generated, whereupon this path terminates. This recovery path can be written briefly as:

C(0, 3)f6 → C(3, 4)f3 → C(4, 3)a3 → C(2, 4)a2 → C(3, 3)e2 → C(6, 4)e

The starting point of the other recovery path is the data block C(1, 4) (c1) of the failed disk D4 recovered using the check block C(6, 2) (c) of the second check row of disk D2; then, from the data block C(1, 4) (c1), the check block C(5, 0) (1) of the first check row is used to recover C(2, 3) (g1) of the failed disk D3; from the data block C(2, 3) (g1), the check block C(6, 6) (g) of the second check row is used to recover C(0, 4) (g7) of the failed disk D4; from the data block C(0, 4) (g7), the check block C(5, 6) (7) of the first check row is used to recover C(1, 3) (b7) of the failed disk D3; from the data block C(1, 3) (b7), the check block C(6, 1) (b) of the second check row is used to recover C(4, 4) (b4) of the failed disk D4; and from the check group to which the data block C(4, 4) (b4) belongs, the check block C(5, 3) (4) on the first check row of the failed disk D3 is generated, whereupon this path terminates. This recovery path can be written briefly as:

C(1, 4)c1 → C(2, 3)g1 → C(0, 4)g7 → C(1, 3)b7 → C(4, 4)b4 → C(5, 3)4

In addition, the remaining blocks C(6, 3) (d) and C(5, 4) (5) are check blocks; since all the data blocks have already been recovered, these two check blocks can be computed directly in the way check blocks are generated. At this point, all data blocks and check blocks of disks D3 and D4 have been recovered.

It can be seen from the two recovery paths above that the starting point of a path is obtained using the check block of the first check row to the right of the failed disks or the check block of the second check row to the left, after which the check blocks of the first check row and the second check row are used alternately to recover the data blocks of the failed disks, until what is recovered is a check block; the two paths above can be executed simultaneously, which improves the recovery speed. The method of data recovery for a non-adjacent double-disk failure is the same as for an adjacent double-disk failure, except that there are four parallel recovery paths.
The above embodiments of the present invention are mostly described taking dual-disk fault tolerance as an example. The data block recovery method provided by the embodiments of the present invention can also be used for multi-disk fault-tolerant systems with three or more fault-tolerant disks; in that case, when the recovery paths are determined, the order and manner of alternately recovering data blocks on each recovery path differ.

A disk in the embodiments of the present invention can be regarded as a storage node in a Storage Area Network (hereinafter: SAN); that is, the multi-disk fault-tolerant system and methods of the embodiments of the present invention can be applied to SAN technology, and the methods of encoding and decoding the data are the same as in the above embodiments. In addition, for fault tolerance in a distributed storage system, a single disk in the embodiments of the present invention is treated as a network node of the distributed storage system; the above embodiments can then be applied to the distributed storage system, and the methods of encoding and decoding its data are the same as in the above embodiments.

The multi-disk fault-tolerant system, the method for generating a check block in a multi-disk fault-tolerant system, the method for recovering a data block in a multi-disk fault-tolerant system, and the method for recovering multiple disks in a multi-disk fault-tolerant system provided by the above embodiments of the present invention can generate the disk data check blocks with a small number of XOR calculations, effectively reduce the computational complexity of the multi-disk fault-tolerant system, and achieve an optimal data block update cost: in a fault-tolerant system of q disks, updating one data block requires only q+1 disk write operations, which improves the write performance of the multi-disk fault-tolerant system. In addition, the embodiments of the present invention also achieve the property of load balancing across the disks: whether check blocks are being calculated or data blocks are being recovered, the load on each disk is balanced, which improves the overall performance of the multi-disk fault-tolerant system.

Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention rather than to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

CLAIMS

1. A multi-disk fault-tolerant system, comprising a disk array and a computing module connected through a system bus, wherein:

the disk array is composed of p disks, where p is a natural number greater than or equal to 3, and the number of fault-tolerant disks of the disk array is q, where q is a natural number less than p/2 and not less than 2;

the data in the disk array is arranged as an (m+q)×p matrix M, where m is a prime number less than or equal to p-q; row 0 of the matrix M consists of virtual data blocks whose value is 0, rows 1 to m-1 are data blocks, and rows m to m+q-1 form the check area; the row number of each data block in the check group of a check block C(m-1+l, n) in the check area is (m - k·l) mod m and the column number is (n + k) mod p, where k ranges from l to m-1+l, l is the row number of the check block in the check area, 1 ≤ l ≤ q, and n is the column number corresponding to the check block, 0 ≤ n ≤ p-1; the data in the check block is the exclusive-OR value of the data of all data blocks in the check group to which the check block belongs;

the computing module is configured to perform XOR calculation on the data blocks of a check group to generate the check block of that group, and to recover the data blocks according to the check blocks when a disk is damaged.
2. The multi-disk fault-tolerant system according to claim 1, wherein the computing module comprises:

a first acquiring unit, configured to acquire the data of all data blocks in the check group to which a check block belongs; a first calculating unit, configured to perform XOR calculation on the data of all data blocks in the check group, acquired by the first acquiring unit, to obtain the data of the check block; and

a first output unit, configured to write the data of the check block generated by the first calculating unit into the corresponding check block in the disk array.

3. The multi-disk fault-tolerant system according to claim 1, wherein the computing module comprises:

a second acquiring unit, configured to acquire the data of the data blocks other than the data block to be recovered in the check group to which the data block to be recovered belongs, and the data of the check block;

a second calculating unit, configured to perform XOR calculation on the data of the data blocks and the data of the check block acquired by the second acquiring unit to obtain the data of the data block to be recovered; and

a second output unit, configured to write the data of the data block to be recovered, generated by the second calculating unit, into the data block to be recovered.
4. A method for generating a check block in the multi-disk fault-tolerant system according to claim 1, comprising:

obtaining the data of all data blocks in the check group to which the check block to be generated belongs;

obtaining, from the data of all data blocks in the check group, the check data of the check block to be generated; and

writing the check data obtained above into the corresponding check block in the disk array.

5. The method for generating a check block in the multi-disk fault-tolerant system according to claim 4, wherein obtaining, from the data of all data blocks in the check group, the check data of the check block to be generated comprises:

obtaining the check data C(m-1+l, n) of the check block to be generated from the data of all data blocks in the check group according to the formula C(m-1+l, n) = ⊕ (for k = l to m-1+l) C((m - k·l) mod m, (n + k) mod p).
6. A method for recovering a data block in the multi-disk fault-tolerant system according to claim 1, comprising: obtaining the data of the other data blocks in the check group to which the data block to be recovered belongs;

obtaining the data of the check block in the check group to which the data block to be recovered belongs; performing XOR calculation on the obtained data to obtain the data of the data block to be recovered; and

writing the value of the data block obtained by the calculation into the data block to be recovered.

7. The method for recovering a data block in the multi-disk fault-tolerant system according to claim 6, wherein obtaining the data of the check block in the check group to which the data block to be recovered belongs comprises:

determining the column number w of the check block of the data block to be recovered in row m according to the formula w = (j - (m - i)) mod p, so as to obtain the data of the check block, where i is the row number of the row in which the data block to be recovered is located and j is the column number of the column in which the data block to be recovered is located.

8. The method for recovering a data block in the multi-disk fault-tolerant system according to claim 6, wherein obtaining the data of the check block in the check group to which the data block to be recovered belongs comprises:

determining the column number y of the check block of the data block to be recovered in row m+1 according to the formula y = (j - (a·m - i)/2) mod p, so as to obtain the data of the check block, where i is the row number of the row in which the data block to be recovered is located, j is the column number of the column in which the data block to be recovered is located, and a is 2 when m-i is odd or less than 4, and 1 otherwise.
PCT/CN2010/075678 2009-08-04 2010-08-03 Multi-disk fault-tolerant system and method for generating check blocks and recovering data blocks WO2011015134A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/365,960 US8489916B2 (en) 2009-08-04 2012-02-03 Multi-disk fault-tolerant system, method for generating a check block, and method for recovering a data block

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200910090420.2A 2009-08-04 Multi-disk fault-tolerant system and method for generating check blocks and recovering data blocks
CN200910090420.2 2009-08-04

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/365,960 Continuation US8489916B2 (en) 2009-08-04 2012-02-03 Multi-disk fault-tolerant system, method for generating a check block, and method for recovering a data block

Publications (1)

Publication Number Publication Date
WO2011015134A1 true WO2011015134A1 (zh) 2011-02-10

Family

ID=41521507

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/075678 WO2011015134A1 (zh) 2009-08-04 2010-08-03 多磁盘容错系统及生成校验块、恢复数据块的方法

Country Status (3)

Country Link
US (1) US8489916B2 (zh)
CN (1) CN101625652B (zh)
WO (1) WO2011015134A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104052576A (zh) * 2014-06-07 2014-09-17 华中科技大学 Erasure-code-based data recovery method for cloud storage
CN111258807A (zh) * 2020-01-16 2020-06-09 四川效率源科技有限责任公司 Data recovery method for a missing RAID 6 disk in logical volume management
CN114168087A (zh) * 2022-02-11 2022-03-11 苏州浪潮智能科技有限公司 Check data generation method, apparatus, device, and storage medium

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101625652B (zh) 2009-08-04 2011-06-08 成都市华为赛门铁克科技有限公司 Multi-disk fault-tolerant system and method for generating check blocks and recovering data blocks
CN102609422A (zh) * 2011-01-25 2012-07-25 阿里巴巴集团控股有限公司 类目错放识别方法和装置
CN102542048B (zh) * 2011-12-28 2013-09-11 用友软件股份有限公司 数据匹配处理装置和数据匹配处理方法
CN104881243B (zh) * 2014-05-27 2018-02-09 陈杰 阵列数据保护方法及系统
CN104156276B (zh) * 2014-08-14 2017-06-09 浪潮电子信息产业股份有限公司 一种防两块磁盘损坏的raid方法
CN109690466A (zh) * 2016-07-19 2019-04-26 锐思拓公司 实现分解的存储器盘片的方法和设备
CN109388513B (zh) * 2017-08-09 2020-11-03 华为技术有限公司 数据校验的方法、阵列控制器及硬盘
CN113411398B (zh) * 2021-06-18 2022-02-18 全方位智能科技(南京)有限公司 一种基于大数据的文件清理写入及清理管理系统及方法
US20230171099A1 (en) * 2021-11-27 2023-06-01 Oracle International Corporation Methods, systems, and computer readable media for sharing key identification and public certificate data for access token verification
CN114064347B (zh) * 2022-01-18 2022-04-26 苏州浪潮智能科技有限公司 一种数据存储方法、装置、设备及计算机可读存储介质
CN114442950B (zh) * 2022-01-21 2024-01-23 山东云海国创云计算装备产业创新中心有限公司 一种数据恢复方法、系统、装置及计算机可读存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6138125A (en) * 1998-03-31 2000-10-24 Lsi Logic Corporation Block coding method and system for failure recovery in disk arrays
CN1722096A (zh) * 2004-07-13 2006-01-18 鸿富锦精密工业(深圳)有限公司 多磁盘容错系统及方法
US7356757B2 (en) * 2004-02-06 2008-04-08 Hon Hai Precision Industry Co., Ltd. Fault tolerance system and method for one or two failed disks in a disk array
CN101625652A (zh) * 2009-08-04 2010-01-13 成都市华为赛门铁克科技有限公司 多磁盘容错系统及生成校验块、恢复数据块的方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6871317B1 (en) * 2001-11-13 2005-03-22 Network Appliance, Inc. Technique for efficiently organizing and distributing parity blocks among storage devices of a storage array
US6848022B2 (en) 2002-10-02 2005-01-25 Adaptec, Inc. Disk array fault tolerant method and system using two-dimensional parity
CN100419700C (zh) * 2004-02-11 2008-09-17 鸿富锦精密工业(深圳)有限公司 磁盘容错系统及方法
US7321905B2 (en) * 2004-09-30 2008-01-22 International Business Machines Corporation System and method for efficient data recovery in a storage array utilizing multiple parity slopes
CN101470582B (zh) * 2007-12-26 2012-12-19 无锡江南计算技术研究所 磁盘阵列控制装置及控制方法
CN101251812A (zh) * 2008-02-28 2008-08-27 浪潮电子信息产业股份有限公司 一种应用于集群系统数据容错的方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6138125A (en) * 1998-03-31 2000-10-24 Lsi Logic Corporation Block coding method and system for failure recovery in disk arrays
US7356757B2 (en) * 2004-02-06 2008-04-08 Hon Hai Precision Industry Co., Ltd. Fault tolerance system and method for one or two failed disks in a disk array
CN1722096A (zh) * 2004-07-13 2006-01-18 鸿富锦精密工业(深圳)有限公司 多磁盘容错系统及方法
CN101625652A (zh) * 2009-08-04 2010-01-13 成都市华为赛门铁克科技有限公司 多磁盘容错系统及生成校验块、恢复数据块的方法

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104052576A (zh) * 2014-06-07 2014-09-17 华中科技大学 Erasure-code-based data recovery method for cloud storage
CN104052576B (zh) * 2014-06-07 2017-05-10 华中科技大学 Erasure-code-based data recovery method for cloud storage
CN111258807A (zh) * 2020-01-16 2020-06-09 四川效率源科技有限责任公司 Data recovery method for a missing RAID 6 disk in logical volume management
CN114168087A (zh) * 2022-02-11 2022-03-11 苏州浪潮智能科技有限公司 Check data generation method, apparatus, device, and storage medium
CN114168087B (zh) * 2022-02-11 2022-04-22 苏州浪潮智能科技有限公司 Check data generation method, apparatus, device, and storage medium

Also Published As

Publication number Publication date
US20120260125A1 (en) 2012-10-11
US8489916B2 (en) 2013-07-16
CN101625652A (zh) 2010-01-13
CN101625652B (zh) 2011-06-08

Similar Documents

Publication Publication Date Title
WO2011015134A1 (zh) 多磁盘容错系统及生成校验块、恢复数据块的方法
US8327080B1 (en) Write-back cache protection
JP5102776B2 (ja) ストレージアレイにおける三重故障からの効率的な復旧を可能にする三重パリティ技術
JP5302892B2 (ja) ストレージシステムにおける書き込み処理を最適化するためのシステム、及び方法
Xiang et al. Optimal recovery of single disk failure in RDP code storage systems
JP5298393B2 (ja) 並列リードソロモンraid(rs−raid)アーキテクチャ、デバイス、および方法
JP3742494B2 (ja) 大容量記憶装置
US6970987B1 (en) Method for storing data in a geographically-diverse data-storing system providing cross-site redundancy
US8583984B2 (en) Method and apparatus for increasing data reliability for raid operations
US8327250B1 (en) Data integrity and parity consistency verification
US20140351632A1 (en) Storing data in multiple formats including a dispersed storage format
US7069382B2 (en) Method of RAID 5 write hole prevention
US20060136778A1 (en) Process for generating and reconstructing variable number of parity for byte streams independent of host block size
CN109814807B (zh) 一种数据存储方法及装置
US8484506B2 (en) Redundant array of independent disks level 5 (RAID 5) with a mirroring functionality
US20130179750A1 (en) Semiconductor storage device and method of controlling the same
CN101546249A (zh) 磁盘阵列在线容量扩展方法
US20150089328A1 (en) Flex Erasure Coding of Controllers of Primary Hard Disk Drives Controller
US10067833B2 (en) Storage system
US7870464B2 (en) System and method for recovery of data for a lost sector in a storage system
WO2015055033A1 (zh) 一种元数据的保护方法和装置
CN114816837A (zh) 一种纠删码融合方法、系统、电子设备及存储介质
CN104516679A (zh) 一种raid数据处理方法及装置
JP6260193B2 (ja) ストレージシステム、及びストレージプログラム
US20060123321A1 (en) System and method for reconstructing lost data in a storage system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10806038

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10806038

Country of ref document: EP

Kind code of ref document: A1