CN102184129B

CN102184129B - Fault tolerance method and device for disk arrays

Info

Publication number: CN102184129B
Application number: CN201110106601.7A
Authority: CN
Inventors: 郑辉; 曹庭华
Original assignee: Hangzhou H3C Technologies Co Ltd
Current assignee: New H3C Technologies Co Ltd
Priority date: 2011-04-27
Filing date: 2011-04-27
Publication date: 2014-03-12
Anticipated expiration: 2031-04-27
Also published as: CN102184129A

Abstract

The invention provides a fault tolerance method and device for disk arrays. The method comprises the following steps: when a disk in a disk array fails, adding a hot spare disk to the disk array to replace the failed disk, and rebuilding, stripe by stripe, the disk array to which the hot spare disk has been added; when a rebuild read error occurs on the stripe currently being rebuilt, recording the identifier of the current stripe in a non-volatile random access memory (NVRAM), skipping the current stripe, and continuing the rebuild from the next stripe until the rebuild of the disk array is completed; and, for each stripe identifier recorded in the NVRAM, repairing the rebuild read error of the corresponding stripe by writing, and deleting the stripe identifier from the NVRAM after the repair is completed. The method and device avoid the problems caused by rebuild read errors or service read errors when a disk array is in a degraded state.

Description

Fault tolerance method and device for disk arrays

Technical field

The present invention relates to the field of storage, and in particular to a fault tolerance method and device for disk arrays.

Background technology

A Redundant Array of Independent Disks (RAID), disk array for short, combines a plurality of independent disks into one array and provides good redundancy and higher storage performance than a single disk. In the field of storage, the redundancy of the disk array itself is used to store data, directly or indirectly, on a plurality of independent disks, so that data are not lost when one or more disks fail; data fault tolerance is thereby achieved.

When a disk array loses its redundancy for some reason, for example because a disk in the array fails, the array is in a degraded state. Take as an example a disk array that has lost redundancy and entered the degraded state because of a disk failure. In the prior art, the usual way to restore the redundancy of such an array is to add a hot spare disk and rebuild, that is, to replace the failed disk with the hot spare disk. However, if a rebuild read error occurs on a disk during this rebuild (a rebuild read error being a disk read error caused by rebuild I/O during the rebuild), the rebuild stops. The disk array can then only remain in the degraded state and cannot return to the redundant state. If another disk in the array subsequently fails, the whole disk array fails and its I/O channel is closed, which not only stops the array from providing service but also causes the loss of the data previously stored in it.

In addition, when a service read is performed on a disk array in the degraded state and a service read error occurs (a service read error being a disk read error caused by service I/O during service reads and writes), the disk array fails and its I/O channel is closed. The array then stops providing service, and the previously stored data are lost.

Summary of the invention

The invention provides a fault tolerance method and device for disk arrays, which avoid the problems caused by rebuild read errors or service read errors when a disk array is in the degraded state.

The technical solution provided by the invention comprises:

A fault tolerance method for a disk array, in which, when a disk in the disk array fails, a hot spare disk is added to the disk array to replace the failed disk, and the disk array to which the hot spare disk has been added is rebuilt stripe by stripe; the key point is that the method comprises:

When a rebuild read error occurs on the stripe currently being rebuilt, recording the identifier of the current stripe in a non-volatile memory, skipping the current stripe, and continuing the rebuild from the next stripe until the rebuild of the disk array is completed;

For each stripe identifier recorded in the non-volatile memory, repairing, by writing, the rebuild read error of the stripe corresponding to that stripe identifier, and deleting the stripe identifier from the non-volatile memory after the repair is completed.

A fault tolerance method for a disk array, the method comprising:

While service reads are performed on the stripes of a disk array in the degraded state, when a service read error occurs on the stripe currently being read, recording the identifier of the current stripe in a non-volatile memory, controlling the disk array to remain in the degraded state, and continuing to provide service;

For each stripe identifier recorded in the non-volatile memory, repairing, by writing, the service read error of the stripe corresponding to that stripe identifier, and deleting the stripe identifier from the non-volatile memory after the repair is completed.

A fault tolerance device for a disk array, the device comprising a replacement unit and a rebuild unit; the replacement unit is configured to add a hot spare disk to the disk array when a disk in the disk array fails, so as to replace the failed disk; the rebuild unit is configured to rebuild, stripe by stripe, the disk array to which the hot spare disk has been added; the key point is that the device further comprises:

A recording unit, configured to, when a rebuild read error occurs on the stripe currently being rebuilt by the rebuild unit, record the identifier of the current stripe in a non-volatile memory and trigger the rebuild unit to skip the current stripe and continue the rebuild from the next stripe until the rebuild of the disk array is completed;

A repair unit, configured to, for each stripe identifier recorded in the non-volatile memory, repair, by writing, the rebuild read error of the stripe corresponding to that stripe identifier, and delete the stripe identifier from the non-volatile memory after the repair is completed.

A fault tolerance device for a disk array, comprising a service read processing unit, a recording unit, a control unit and a repair unit, wherein:

The service read processing unit is configured to perform service reads on the stripes of a disk array in the degraded state;

The recording unit is configured to, when a service read error occurs on the stripe currently being read, record the identifier of the current stripe in a non-volatile memory and trigger the control unit to control the disk array to remain in the degraded state and continue to provide service;

The repair unit is configured to, for each stripe identifier recorded in the non-volatile memory, repair, by writing, the service read error of the stripe corresponding to that stripe identifier, and delete the stripe identifier from the non-volatile memory after the repair is completed.

As can be seen from the above technical solutions, in the invention, when a rebuild read error occurs on the current stripe, the rebuild is not stopped. Instead, the stripe on which the rebuild read error occurred is recorded in the non-volatile memory, the current stripe is skipped, and the rebuild continues from the next stripe. For each stripe on which a rebuild read error occurred, the error is repaired by writing, so that the redundancy of the disk array is restored as soon as possible. In this way, even if multiple disks develop faults during the rebuild, the whole disk array does not fail;

Furthermore, in the invention, when a service read error occurs on the current stripe, the whole disk array is not made to fail either. Instead, the identifier of the current stripe is recorded in the non-volatile memory, an error is returned, and the disk array is controlled to continue providing service reads and service writes. This guarantees service continuity on the one hand and avoids the risk of data loss on the other.

Brief description of the drawings

Fig. 1 is a schematic diagram of a disk array provided by an embodiment of the invention;

Fig. 2 is a schematic diagram of the disk array upon a disk failure, provided by an embodiment of the invention;

Fig. 3 is a schematic diagram of the implementation of Embodiment 1 of the invention;

Fig. 4 is a schematic diagram of the implementation of Embodiment 2 of the invention;

Fig. 5 is a structural diagram of a device provided by an embodiment of the invention;

Fig. 6 is a structural diagram of another device provided by an embodiment of the invention.

Detailed description

To make the objects, technical solutions and advantages of the invention clearer, the invention is described below with reference to the drawings and specific embodiments.

In practical applications, a disk array achieves concurrent reads and writes across multiple disks by dividing them into stripes. Fig. 1 is a schematic diagram of the stripes into which a disk array provided by the invention is divided. In Fig. 1, each cylinder represents one disk, and the disks are connected in parallel to form the disk array. Fig. 1 takes as an example a disk array divided into 9 stripes of equal size; it can be seen that each stripe occupies storage space on every disk in the disk array.
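The striping layout described above can be illustrated with a minimal sketch. The names `Extent` and `stripe_extents`, the 0-based stripe numbering, and the assumption that every stripe takes one equally sized unit at the same offset on each disk are all illustrative choices, not details stated in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    disk: int    # index of the member disk (hypothetical numbering)
    offset: int  # byte offset of the stripe's unit on that disk
    length: int  # number of bytes the stripe occupies on that disk


def stripe_extents(stripe_id: int, num_disks: int, unit_size: int) -> List[Extent]:
    """Return the space one stripe occupies on every disk of the array."""
    offset = stripe_id * unit_size  # assumed: stripes are laid out back to back
    return [Extent(disk=d, offset=offset, length=unit_size) for d in range(num_disks)]


# Stripe 1 of a three-disk array with 16 KiB per disk per stripe.
extents = stripe_extents(stripe_id=1, num_disks=3, unit_size=16 * 1024)
for e in extents:
    print(e)
```

Each stripe yields one extent per disk, which matches the observation that every stripe takes up storage space on each disk of the array.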

Based on the above description, the fault tolerance method for disk arrays provided by the invention is described below through two embodiments:

Embodiment 1:

In Embodiment 1, when a disk in the disk array shown in Fig. 1 fails, for example disk 3, a hot spare disk is added to the disk array to replace the failed disk 3, as shown in Fig. 2.

Afterwards, the disk array to which the hot spare disk has been added, that is, the disk array shown in Fig. 2, is rebuilt stripe by stripe.

During the rebuild of the disk array shown in Fig. 2, if a rebuild read error occurs on the stripe currently being rebuilt, the identifier of the current stripe is recorded in the non-volatile memory, the current stripe is skipped, and the rebuild continues from the next stripe until the rebuild of the disk array is completed; see Fig. 3. In Fig. 3, the hatched stripes, namely the stripes numbered 1, 3, 5 and 6, suffered rebuild read errors and were not rebuilt successfully; they are recorded in the non-volatile memory. Compared with the prior art, the invention does not let a stripe with a rebuild read error affect the rebuild of the disk array as a whole, but continues rebuilding from the next stripe until the rebuild of the disk array is completed. For applications in which disk faults are few and such faults have little impact on the service, such as monitoring storage, this minimizes the risk that a disk fault poses to the whole system.
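The skip-and-record rebuild loop described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the non-volatile memory is modelled as a plain Python `set`, and `ReadError`, `rebuild_array` and the `rebuild_stripe` callback are hypothetical names.

```python
class ReadError(Exception):
    """Raised by the hypothetical rebuild_stripe callback on a rebuild read error."""


def rebuild_array(num_stripes, rebuild_stripe, nvram):
    """Rebuild stripe by stripe; record failing stripe IDs in nvram and skip them."""
    for stripe_id in range(num_stripes):
        try:
            rebuild_stripe(stripe_id)
        except ReadError:
            nvram.add(stripe_id)  # remember the stripe for later repair
            continue              # skip it and carry on with the next stripe
    return nvram


# Example mirroring Fig. 3: stripes 1, 3, 5 and 6 hit a rebuild read error.
bad = {1, 3, 5, 6}
rebuilt = []


def fake_rebuild(stripe_id):
    if stripe_id in bad:
        raise ReadError(stripe_id)
    rebuilt.append(stripe_id)


nvram = rebuild_array(9, fake_rebuild, set())
print(sorted(nvram), rebuilt)
```

The loop always reaches the last stripe, so a read error on one stripe never leaves the remaining stripes unrebuilt.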

Note that the invention records the identifier of a stripe on which a rebuild read error occurred in the non-volatile memory mainly for the following two purposes:

First, because a stripe on which a rebuild read error occurred is skipped during the rebuild, the data on the skipped stripe are not redundant. When a read command later needs to be issued to a disk occupied by this stripe: if the target is disk 1 and/or disk 2 in Fig. 2, the read command is issued normally and the data are read according to it; if the target is the hot spare disk in Fig. 2, no read command is issued, and the data to be read from the hot spare disk are instead computed directly from the data on the other disks occupied by the stripe, such as disk 1 and disk 2 in Fig. 2. In other words, the data of a skipped stripe on the disks other than the hot spare disk can be read, whereas its data on the hot spare disk cannot be read and must be obtained by a corresponding calculation, such as an XOR calculation, on the data of the other disks occupied by the stripe. This makes it easy to judge accurately whether a disk is readable. Moreover, by recording the identifier of a stripe with a rebuild read error in the non-volatile memory, the invention makes the stripe the smallest unit of failure, which minimizes the risk that rebuild read errors on some stripes cause data loss and instability across the whole disk array.

Second, it makes it easy to identify the stripes on which rebuild read errors occurred, so that the rebuild read error of each such stripe can be repaired, and the data redundancy of the stripe restored, as soon as possible.
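The read rule in the first point can be sketched as below, assuming XOR is the calculation used (the text names XOR only as one example). The dictionary-based disk model and the names `xor_blocks` and `read_unit` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_unit(stripe_id, disk, disks, nvram, hot_spare):
    """Read one disk's part of a stripe, honouring the skipped-stripe rule."""
    if stripe_id in nvram and disk == hot_spare:
        # No read command goes to the hot spare for a recorded stripe;
        # its data is computed from the other disks of the stripe instead.
        others = [d[stripe_id] for name, d in disks.items() if name != hot_spare]
        return xor_blocks(others)
    return disks[disk][stripe_id]  # normal read command


disks = {
    "disk1": {1: b"\x0f\xf0"},
    "disk2": {1: b"\xff\x00"},
    "spare": {1: b"\x00\x00"},  # stale contents: stripe 1 was skipped
}
nvram = {1}
from_spare = read_unit(1, "spare", disks, nvram, "spare")
from_disk1 = read_unit(1, "disk1", disks, nvram, "spare")
print(from_spare, from_disk1)
```

Reads aimed at disk 1 or disk 2 go straight to the disk, while a read aimed at the hot spare never touches its stale contents.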

In Embodiment 1, the rebuild read error of the stripe corresponding to a stripe identifier in the non-volatile memory can be repaired by writing. The repair operation can be carried out during or after the rebuild, as determined by the actual service conditions. The repair operation is described below in two modes:

Mode 1:

Mode 1 is implemented by writing the whole stripe. Specifically, when data are written to the stripe corresponding to a stripe identifier in the non-volatile memory (stripe 1 in Fig. 3 is taken as the example; the principle is similar for the other stripes, such as stripes 3, 5 and 6 in Fig. 3), if the data exactly fill every disk occupied by stripe 1 (including the hot spare disk), the corresponding data are written to each disk occupied by stripe 1, for example to disk 1, disk 2 and the hot spare disk in Fig. 3. The repair of stripe 1 is then complete, and stripe 1 has regained data redundancy.

Note that the data written to stripe 1 may not completely fill the space that stripe 1 occupies on each disk. For example, stripe 1 occupies 16k of space on each of disk 1, disk 2 and the hot spare disk, but the data to be written are only 4k in size and can only be written to the first 4k of space on disk 1. In that case, the invention uses the data in the first 4k of space on disk 1 and disk 2 to calculate the data to be written to the first 4k of the hot spare disk space occupied by stripe 1, and writes the calculated data to that first 4k. The invention does not consider the repair of stripe 1 complete in this case and still regards stripe 1 as not having regained data redundancy. That is, only when stripe 1 has been written completely full does the invention consider the repair of stripe 1 complete and the data redundancy of stripe 1 restored.
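The Mode 1 rule (only a write that fills the whole stripe counts as a repair) can be sketched as below. The sketch assumes a three-disk layout in which two 16 KiB units carry written data and the hot spare's unit is computed from them; the function name `handle_write` and the set used as non-volatile memory are hypothetical, and the actual disk writes are not modelled.

```python
def handle_write(stripe_id, data, unit_size, num_data_units, nvram):
    """Apply the Mode 1 rule to a write that targets one stripe.

    Returns True if the write filled the whole stripe, in which case a
    recorded stripe counts as repaired and its identifier leaves nvram.
    """
    full_stripe = len(data) == unit_size * num_data_units
    if full_stripe:
        nvram.discard(stripe_id)  # whole stripe rewritten: redundancy restored
    # A partial write would still update the hot spare with computed data
    # (not modelled here), but the stripe stays recorded as unrepaired.
    return full_stripe


nvram = {1, 3, 5, 6}
unit = 16 * 1024
partial = handle_write(1, bytes(4 * 1024), unit, 2, nvram)  # 4 KiB write
still_recorded = 1 in nvram
full = handle_write(1, bytes(2 * unit), unit, 2, nvram)     # fills the stripe
print(partial, still_recorded, full, sorted(nvram))
```

The 4 KiB write leaves stripe 1 in the record, and only the later full-stripe write removes it.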

Mode 2:

Mode 2 is implemented by writing preset data, such as 0, to the stripe corresponding to a stripe identifier in the non-volatile memory (stripe 1 in Fig. 3 is taken as the example; the principle is similar for the other stripes, such as stripes 3, 5 and 6 in Fig. 3). Mode 2 forcibly repairs the rebuild read error of stripe 1 in order to restore its data redundancy. Specifically: the address of stripe 1 is determined from the identifier of stripe 1 recorded in the non-volatile memory, and the importance of the data stored in stripe 1 is analyzed according to the determined address. If the importance of the data is below a set threshold, the rebuild read error of stripe 1 is repaired by the following operation: the preset data are written to fill the disks occupied by stripe 1 other than the hot spare disk, and data calculated from the preset data on those other disks are written to the hot spare disk occupied by stripe 1. The repair of stripe 1 is then complete, and stripe 1 has regained data redundancy. If the importance of the data is greater than or equal to the set threshold, the repair can instead be carried out according to Mode 1 above.
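A minimal sketch of the forced repair in Mode 2 follows. The patent does not say how importance is measured or which calculation produces the hot spare's data, so the numeric `importance` and `threshold` values, the XOR calculation, and the name `force_repair` are all assumptions made for illustration.

```python
from functools import reduce


def force_repair(stripe, hot_spare, importance, threshold, fill=0):
    """Overwrite a low-importance stripe with preset data; return True if repaired."""
    if importance >= threshold:
        return False  # important data: wait for a full-stripe write (Mode 1)
    size = len(stripe[hot_spare])
    others = [name for name in stripe if name != hot_spare]
    for name in others:
        stripe[name] = bytes([fill]) * size  # fill the other disks with preset data
    # The hot spare receives data computed from the preset data (XOR assumed).
    stripe[hot_spare] = bytes(
        reduce(lambda a, b: a ^ b, column)
        for column in zip(*(stripe[name] for name in others))
    )
    return True


stripe1 = {"disk1": b"\x12\x34", "disk2": b"\x56\x78", "spare": b"\xff\xff"}
repaired = force_repair(stripe1, "spare", importance=1, threshold=5)
kept = force_repair({"disk1": b"\x01", "spare": b"\x00"}, "spare", importance=9, threshold=5)
print(repaired, stripe1, kept)
```

With a fill value of 0 every unit of the stripe, including the computed one on the hot spare, ends up zeroed, so the stripe is consistent again at the cost of its former contents.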

The operation of repairing a stripe's rebuild read error is thus implemented by Mode 1 and Mode 2 above. After the repair of a stripe is completed, its stripe identifier is deleted from the non-volatile memory.
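The bookkeeping around the repair (delete an identifier only once its stripe has actually been repaired) can be sketched as a pass over the recorded identifiers. The name `repair_pass` and the `try_repair` callback, which stands in for either repair mode, are hypothetical.

```python
def repair_pass(nvram, try_repair):
    """Attempt a repair for every recorded stripe; delete the IDs that succeed."""
    for stripe_id in sorted(nvram):   # sorted() takes a snapshot, so the set
        if try_repair(stripe_id):     # can be modified safely inside the loop
            nvram.discard(stripe_id)  # delete the identifier only after repair
    return nvram


# Stripes 1 and 5 can be repaired now; stripes 3 and 6 must wait.
repairable = {1, 5}
nvram = repair_pass({1, 3, 5, 6}, lambda stripe_id: stripe_id in repairable)
print(sorted(nvram))
```

Identifiers of stripes that could not be repaired stay recorded, so a later pass can try them again.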

Embodiment 1 has been described above; Embodiment 2 is described below.

Embodiment 2:

Embodiment 2 differs from Embodiment 1: Embodiment 1 mainly deals with rebuild read errors, whereas Embodiment 2 mainly describes the process of performing service reads on a disk array in the degraded state.

The disk array in the degraded state in Embodiment 2 can be any disk array that has lost redundancy, specifically a disk array before or during a rebuild, or a disk array whose rebuild has stopped because of a rebuild read error; the embodiment of the invention imposes no limitation. While service reads are performed on the stripes of the disk array in the degraded state, when a service read error occurs on the stripe currently being read, the identifier of the current stripe is recorded in the non-volatile memory, an error is returned, and the disk array is controlled to continue providing service reads and service writes while remaining in the degraded state; see Fig. 4. In Fig. 4, the hatched stripe 1 suffered a service read error and is recorded in the non-volatile memory. In the existing approach, if a service read error occurs after the disk array has entered the degraded state, the disk array fails and cannot provide service, and the previously stored data are at risk of being lost. In the invention, by contrast, when a service read error occurs on a disk array in the degraded state, the data concerned cannot be read, but the disk array is controlled to continue providing service externally: its read and write channels remain open and its state remains unchanged. This guarantees the continuity of subsequent service and removes the risk of losing previously stored data. The advantage is very obvious for applications in which disk faults are few and a failure to read part of the data has little impact, such as monitoring applications.
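The service-read behaviour described above can be sketched as follows. The class `DegradedArray`, the `read_stripe` callback, the string state and the use of `None` to stand in for the returned error are all hypothetical modelling choices.

```python
class ReadError(Exception):
    """Raised by the hypothetical read_stripe callback on a service read error."""


class DegradedArray:
    def __init__(self, read_stripe):
        self.read_stripe = read_stripe
        self.state = "degraded"
        self.io_open = True  # read and write channels stay open
        self.nvram = set()   # stands in for the non-volatile memory

    def service_read(self, stripe_id):
        try:
            return self.read_stripe(stripe_id)
        except ReadError:
            self.nvram.add(stripe_id)  # record the stripe for later repair
            return None                # report an error; the array stays online


def flaky_read(stripe_id):
    if stripe_id == 1:  # stripe 1 hits a service read error, as in Fig. 4
        raise ReadError(stripe_id)
    return b"data-%d" % stripe_id


array = DegradedArray(flaky_read)
first = array.service_read(1)
second = array.service_read(2)
print(first, second, array.state, array.io_open, sorted(array.nvram))
```

The failed read on stripe 1 changes neither the array state nor the I/O channel, so the following read on stripe 2 is served normally.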

In Embodiment 2, the stripe on which a service read error occurred is recorded in the non-volatile memory mainly so that such stripes are easy to identify and their service read errors can be repaired as soon as possible.

In Embodiment 2, the service read error of a stripe can be repaired by writing. The repair operation is similar to Mode 1 and Mode 2 in Embodiment 1, specifically:

First method:

The first method is implemented by writing the whole stripe. Specifically, when data are written to the stripe corresponding to a stripe identifier in the non-volatile memory (stripe 1 in Fig. 4 is taken as the example) and the data exactly fill the disk space occupied by stripe 1, the repair of stripe 1 is determined to be complete, and stripe 1 has regained data redundancy.

Second method:

The second method is implemented by writing preset data, such as 0, to the stripe corresponding to a stripe identifier in the non-volatile memory (stripe 1 in Fig. 4 is taken as the example). The second method forcibly repairs the service read error of stripe 1 in order to restore its data redundancy. Specifically: the address of stripe 1 is determined from the identifier of stripe 1 recorded in the non-volatile memory, and the importance of the data stored in stripe 1 is analyzed according to the determined address. If the importance of the data is below a set threshold, the service read error of stripe 1 is repaired by the following operation: the preset data are written to fill each disk occupied by stripe 1 (disk 1 and disk 2 in Fig. 4). The repair of stripe 1 is then complete, and stripe 1 has regained data redundancy. If the importance of the data is greater than or equal to the set threshold, the first method is used.

The operation of repairing a stripe's service read error is thus implemented by the first method and the second method above. After the repair of a stripe is completed, its stripe identifier is deleted from the non-volatile memory.

This completes the description of Embodiment 2.

The method provided by the embodiments of the invention has been described above; the device provided by the embodiments of the invention is described below.

Referring to Fig. 5, Fig. 5 is a structural diagram of a device provided by an embodiment of the invention. The device corresponds to Embodiment 1 and comprises a replacement unit and a rebuild unit;

The replacement unit is configured to add a hot spare disk to the disk array when a disk in the disk array fails, so as to replace the failed disk;

The rebuild unit is configured to rebuild, stripe by stripe, the disk array to which the hot spare disk has been added;

The key point is that, as shown in Fig. 5, the device further comprises:

In the invention, the repair unit repairs the rebuild read error of the stripe corresponding to a stripe identifier by writing data to the whole stripe corresponding to that stripe identifier; or, it determines the importance of the data stored in the stripe corresponding to the stripe identifier and, if the importance of the data is below a set threshold, repairs the rebuild read error of the stripe by the following operation: writing preset data to fill the disks occupied by the stripe other than the hot spare disk, and writing to the hot spare disk occupied by the stripe the data calculated from the preset data on those other disks.

Preferably, as shown in Fig. 5, the device further comprises:

A processing unit, configured to: when data need to be read from the hot spare disk occupied by a stripe corresponding to a stripe identifier in the non-volatile memory, issue no read command and instead calculate the data to be read from the hot spare disk from the data on the other disks occupied by the stripe; and, when data need to be read from the disks, other than the hot spare disk, occupied by a stripe corresponding to a stripe identifier in the non-volatile memory, issue a read command to those other disks and read the data according to the read command.

This completes the description of the device structure shown in Fig. 5.

Referring to Fig. 6, Fig. 6 is a structural diagram of another device provided by an embodiment of the invention. The device corresponds to Embodiment 2 and, as shown in Fig. 6, comprises a service read processing unit, a recording unit, a control unit and a repair unit, wherein,

The repair unit repairs the service read error of the stripe corresponding to a stripe identifier by writing data to the whole stripe corresponding to that stripe identifier; or,

It determines the importance of the data stored in the stripe corresponding to the stripe identifier and, if the importance of the data is below a set threshold, repairs the service read error of the stripe corresponding to the stripe identifier by the following operation: writing preset data to the disks occupied by the stripe corresponding to the stripe identifier.

This completes the description of the device shown in Fig. 6.

As can be seen from the above technical solutions, in the invention, when a rebuild read error occurs on the current stripe, the rebuild is not stopped. Instead, the stripe on which the rebuild read error occurred is recorded in the non-volatile memory and the rebuild continues from the next stripe. For each stripe on which a rebuild read error occurred, the error is repaired by writing, so that the redundancy of the disk array is restored as soon as possible. In this way, even if multiple disks develop faults during the rebuild, the whole disk array does not fail;

The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A fault tolerance method for a disk array, in which, when a disk in the disk array fails, a hot spare disk is added to the disk array to replace the failed disk, and the disk array to which the hot spare disk has been added is rebuilt stripe by stripe; characterized in that the method comprises:

2. The method according to claim 1, characterized in that repairing, by writing, the rebuild read error of the stripe corresponding to the stripe identifier comprises:

Repairing the rebuild read error of the stripe corresponding to the stripe identifier by writing data to the whole stripe corresponding to the stripe identifier; or,

Determining the importance of the data stored in the stripe corresponding to the stripe identifier and, if the importance of the data is below a set threshold, repairing the rebuild read error of the stripe by the following operation: writing preset data to fill the disks occupied by the stripe other than the hot spare disk, and writing to the hot spare disk occupied by the stripe the data calculated from the preset data on those other disks.

3. The method according to claim 1, characterized in that the method further comprises:

When data need to be read from the hot spare disk occupied by a stripe corresponding to a stripe identifier in the non-volatile memory, issuing no read command, and calculating the data to be read from the hot spare disk from the data on the other disks occupied by the stripe;

When data need to be read from the disks, other than the hot spare disk, occupied by a stripe corresponding to a stripe identifier in the non-volatile memory, issuing a read command to those other disks and reading the data according to the read command.

4. A fault tolerance method for a disk array, characterized in that the method comprises:

5. The method according to claim 4, characterized in that repairing, by writing, the service read error of the stripe corresponding to the stripe identifier comprises:

Repairing the service read error of the stripe corresponding to the stripe identifier by writing data to the whole stripe corresponding to the stripe identifier; or,

6. A fault tolerance device for a disk array, the device comprising a replacement unit and a rebuild unit; the replacement unit is configured to add a hot spare disk to the disk array when a disk in the disk array fails, so as to replace the failed disk; the rebuild unit is configured to rebuild, stripe by stripe, the disk array to which the hot spare disk has been added; characterized in that the device further comprises:

7. The device according to claim 6, characterized in that the repair unit repairs the rebuild read error of the stripe corresponding to the stripe identifier by writing data to the whole stripe corresponding to the stripe identifier; or, it determines the importance of the data stored in the stripe corresponding to the stripe identifier and, if the importance of the data is below a set threshold, repairs the rebuild read error of the stripe by the following operation: writing preset data to fill the disks occupied by the stripe other than the hot spare disk, and writing to the hot spare disk occupied by the stripe the data calculated from the preset data on those other disks.

8. The device according to claim 6, characterized in that the device further comprises:

9. A fault tolerance device for a disk array, characterized in that the device comprises a service read processing unit, a recording unit, a control unit and a repair unit, wherein,

10. The device according to claim 9, characterized in that the repair unit repairs the service read error of the stripe corresponding to the stripe identifier by writing data to the whole stripe corresponding to the stripe identifier; or,