Embodiment
Embodiments of the invention are described in detail below with reference to the accompanying drawings.
The present invention defines three RAID operational modes: a standard RAID operational mode, a fault-tolerant RAID operational mode and an error-permissive RAID operational mode.

In the standard RAID operational mode, the RAID system operates in accordance with the RAID standard. If a data write error occurs while data are being recorded, the low-quality disk is reported and operation on that disk stops. Neither write errors nor bad disk blocks are tolerated.

In the fault-tolerant RAID operational mode, the RAID management equipment replaces any bad disk block encountered during recording with a good block, so that the data are still written correctly.

The error-permissive RAID operational mode differs from the fault-tolerant mode. Here the RAID management equipment tolerates write errors during recording: part of the data may be lost, and writing continues on the disk that contains the bad blocks. When part of the data cannot be reconstructed, the data on the low-quality disk are still used for data recovery.
The present invention first proposes RAID management equipment that supports the above three operational modes.
Fig. 1 is a schematic block diagram of RAID management equipment 100 according to an exemplary embodiment of the present invention. Referring to Fig. 1, the RAID management equipment 100 comprises:
- a management controller 110;
- an error-permissive RAID mode manager 120;
- a fault-tolerant RAID mode manager 130; and
- a standard RAID mode manager 140.
The management controller 110 operates the RAID disk array according to a preset RAID operational mode. It does so by controlling one of three managers: the error-permissive RAID mode manager 120, the fault-tolerant RAID mode manager 130 or the standard RAID mode manager 140. The RAID operational mode can be set statically in the RAID configuration file. It can also be set dynamically through a user interface provided by the RAID management equipment.
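The following minimal Python sketch shows one way the preset mode might be resolved. It assumes a hypothetical `key = value` configuration file with a `raid_mode` key, plus an optional value chosen dynamically through the user interface. The names `RaidMode`, `parse_config`, `resolve_mode` and the mode strings are illustrative only and do not come from the original.

```python
from enum import Enum
from typing import Optional


class RaidMode(Enum):
    # Hypothetical identifiers for the three operational modes.
    STANDARD = "standard"
    FAULT_TOLERANT = "fault_tolerant"
    ERROR_PERMISSIVE = "error_permissive"


def parse_config(text: str) -> dict:
    """Parse hypothetical 'key = value' lines; '#' starts a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            settings[key] = value
    return settings


def resolve_mode(config_text: str, ui_override: Optional[str] = None) -> RaidMode:
    """Static setting from the config file; a value set through the UI takes precedence."""
    if ui_override is not None:
        return RaidMode(ui_override)
    # Assumption: fall back to the standard mode when the file names no mode.
    return RaidMode(parse_config(config_text).get("raid_mode", "standard"))
```

For example, a file containing `raid_mode = fault_tolerant` selects the fault-tolerant manager unless the administrator overrides it from the interface.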
Under the control of the management controller 110, the standard RAID mode manager 140 operates in the standard RAID operational mode described above, for example at RAID level 0, 1, 5 or 6. Write errors and bad disk blocks are not tolerated. If a data write error occurs while data are being recorded, the manager reports the low-quality disk and stops operating on it. It then reports that the disk has been dropped from the array and stops writing data.
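The standard-mode rule can be sketched as follows. The sketch uses a toy in-memory `Disk` class whose listed bad blocks fail on write; `Disk`, `DiskFailedError` and `standard_write` are hypothetical names. The first write error drops the disk, and every later write to it is refused.

```python
class DiskFailedError(Exception):
    """Raised when the standard mode drops a disk and stops writing."""


class Disk:
    """Toy disk: a dict of blocks plus a set of block numbers that fail on write."""

    def __init__(self, name, bad_blocks=()):
        self.name = name
        self.blocks = {}
        self.bad_blocks = set(bad_blocks)
        self.failed = False

    def write(self, lba, data):
        if lba in self.bad_blocks:
            raise OSError(f"{self.name}: write error at block {lba}")
        self.blocks[lba] = data


def standard_write(disk, lba, data):
    """Standard mode: any write error marks the disk as failed and halts writing."""
    if disk.failed:
        raise DiskFailedError(f"{disk.name} has been dropped from the array")
    try:
        disk.write(lba, data)
    except OSError as exc:
        disk.failed = True  # stop operating on the low-quality disk
        raise DiskFailedError(f"disk dropped: {exc}") from exc
```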
The fault-tolerant RAID mode manager 130 comprises a remap unit 132. The remap unit 132 replaces bad disk blocks detected by the fault-tolerant RAID mode manager 130 with good blocks from a reserved area. If a data write fails because of a bad disk block, the remap unit 132 writes the corresponding data to a good block in the reserved area. This ensures that no data are lost when the fault-tolerant RAID mode manager 130 writes data.

The write can still fail, for example when the good blocks in the reserved area have been used up and no good block is left to replace the bad block. In that case the disk is reported as dropped and data writing stops.
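The remapping behaviour can be sketched as follows. This is a simplified model, not the patented implementation. It assumes that a remap table maps a logical block to a physical block, that spare blocks in the reserved area are always good, and that an exhausted reserved area is reported by raising an exception. The class name `RemapDisk` and its fields are hypothetical.

```python
class RemapDisk:
    """Toy disk with a reserved area of spare blocks."""

    def __init__(self, bad_blocks, spare_blocks):
        self.bad_blocks = set(bad_blocks)  # user-area blocks that fail on write
        self.spares = list(spare_blocks)   # free good blocks in the reserved area
        self.remap = {}                    # logical (bad) block -> spare block
        self.store = {}                    # physical block -> data

    def write(self, lba, data):
        """Fault-tolerant write: redirect to a spare block if the target is bad."""
        target = self.remap.get(lba, lba)
        if target in self.bad_blocks:
            if not self.spares:
                # Reserved area used up: the disk is dropped and writing stops.
                raise RuntimeError("reserved area exhausted; disk dropped")
            target = self.spares.pop(0)
            self.remap[lba] = target
        self.store[target] = data

    def read(self, lba):
        return self.store[self.remap.get(lba, lba)]
```

Once a block is remapped, later writes to it go straight to the spare block, so each bad block consumes only one entry of the reserved area.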
The error-permissive RAID mode manager 120 comprises an error-permissive data processing unit 122. When a data write error occurs because of a bad disk block, the unit 122 continues the write operation and accepts the resulting data loss.

According to another exemplary embodiment of the present invention, the unit 122 also performs disk reconstruction when the user chooses to reconstruct a disk. During reconstruction, a stripe may fail to reconstruct because some data on the other disks still in the original RAID array are erroneous. In that case, the data at the corresponding stripe position on the low-quality disk that was removed from the original array are copied to the disk newly added to the RAID array. Reconstruction then continues instead of stopping.
According to another preferred embodiment of the present invention, the error-permissive RAID mode manager 120 may further comprise a low-quality disk alarm unit 121. The unit 121 tracks bad disk blocks. When the bad block count of a particular disk reaches a predetermined maximum bad block count, it raises an alarm in a predetermined manner and prompts the user or administrator to reconstruct the low-quality disk. The predetermined alarm manner may be, for example, a window in the user management interface or an entry in a log file.

According to another preferred embodiment, the unit 121 may also issue an earlier warning, in a manner similar to the alarm above. This warning is issued when the bad block count equals a predetermined warning bad block count, and tells the user or administrator that the particular disk has accumulated a certain number of bad blocks. Because the warning count is smaller than the maximum count, the two thresholds form a two-stage early-warning mechanism.
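The two-stage mechanism can be sketched as a per-disk counter with two thresholds. The class name `LowQualityDiskAlarm`, the `notify` callback (which stands in for a UI window or log-file writer) and the return values `"warn"`/`"alarm"` are illustrative assumptions. The sketch also assumes the alarm repeats for every bad block beyond the maximum.

```python
from collections import Counter
from typing import Callable, Optional


class LowQualityDiskAlarm:
    """Per-disk bad-block tracking with a warning threshold and a maximum threshold."""

    def __init__(self, warn_count: int, max_count: int,
                 notify: Callable[[str], None] = print):
        if not 0 < warn_count < max_count:
            raise ValueError("the warning count must be positive and below the maximum")
        self.warn_count = warn_count
        self.max_count = max_count
        self.notify = notify  # e.g. a UI pop-up or a log-file writer
        self.counts = Counter()

    def record_bad_block(self, disk: str) -> Optional[str]:
        """Count one more bad block on `disk`; return the alert level raised, if any."""
        self.counts[disk] += 1
        n = self.counts[disk]
        if n == self.warn_count:
            self.notify(f"warning: {disk} has accumulated {n} bad blocks")
            return "warn"
        if n >= self.max_count:
            self.notify(f"alarm: {disk} reached {n} bad blocks; reconstruct it")
            return "alarm"
        return None
```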
According to another preferred embodiment, the error-permissive RAID mode manager 120 may provide a built-in or external protected area for writing system data, which ensures that important data are written correctly. The error-permissive RAID mode manager 120 is described in more detail later with reference to Fig. 4.
Fig. 2 is a schematic flowchart of the operation of the RAID management equipment 100 according to an exemplary embodiment of the present invention.
Referring to Fig. 2, in operation S210 the management controller 110 determines whether the preset RAID operational mode is the error-permissive RAID operational mode. If it is not, the management controller 110 determines in operation S260 whether the preset mode is the fault-tolerant RAID operational mode. If it is not the fault-tolerant mode either, the management controller 110 controls the standard RAID mode manager 140 to operate in the standard RAID operational mode (operation S270).
If the preset RAID operational mode is determined in operation S260 to be the fault-tolerant RAID operational mode, the management controller 110 controls the fault-tolerant RAID mode manager 130 to operate in that mode (operation S280).
If the management controller 110 determines in operation S210 that the preset mode is the error-permissive RAID operational mode, it controls the error-permissive RAID mode manager 120 to operate in that mode, which supports operation with low-quality disks. When an error occurs during data writing, the process is as follows:
- In operation S220, the low-quality disk alarm unit 121 records the bad block count.
- In operation S230, it is determined whether the bad block count has reached the maximum bad block count, that is, the predetermined maximum number of bad blocks allowed.
- If it has, the low-quality disk alarm unit 121 raises an alarm in the predetermined manner (operation S240) and prompts the user or administrator to reconstruct the low-quality disk.
- In operation S250, the error-permissive RAID mode manager 120 performs error-permissive reconstruction. This reconstruction is described in detail with reference to Figs. 5 to 9.
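The Fig. 2 flow can be sketched as two small functions. The first checks the modes in the order of the flowchart. The second handles a write error in the error-permissive branch, where only the maximum count triggers an alarm. The mode strings, the `managers` dictionary and the `alarm`/`reconstruct` callbacks are hypothetical stand-ins for the managers and the reconstruction procedure.

```python
from typing import Callable, Dict


def select_manager(mode: str, managers: Dict[str, object]) -> object:
    """Fig. 2 order: S210 checks error-permissive, S260 fault-tolerant, else standard."""
    if mode == "error_permissive":          # S210
        return managers["error_permissive"]
    if mode == "fault_tolerant":            # S260 -> S280
        return managers["fault_tolerant"]
    return managers["standard"]             # S270


def on_write_error(state: dict, max_bad_blocks: int,
                   alarm: Callable[[int], None],
                   reconstruct: Callable[[], None]) -> bool:
    """Error-permissive branch: S220 count, S230 compare, S240 alarm, S250 rebuild."""
    state["bad_blocks"] += 1                       # S220
    if state["bad_blocks"] >= max_bad_blocks:      # S230
        alarm(state["bad_blocks"])                 # S240
        reconstruct()                              # S250
        return True
    return False
```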
Fig. 3 is a flowchart of data writing in the error-permissive RAID operational mode according to another exemplary embodiment of the present invention.
Referring to Fig. 3, the low-quality disk alarm unit 121 first initializes its bad block count to zero. In operation S310, the error-permissive RAID mode manager 120 performs a data write operation. In operation S320, it is determined whether a write error has occurred. If no write error has occurred, the next data item is written in operation S370.
If a write error is determined in operation S320 to have occurred, the process is as follows:
- The low-quality disk alarm unit 121 increments the bad block count by 1.
- In operation S340, the unit 121 determines whether the bad block count equals the warning bad block count.
- If the two are equal, then in operation S342 the unit 121 warns the user or administrator, in the predetermined manner, that a considerable number of bad blocks have accumulated on the particular disk.
- If they differ, then in operation S350 the unit 121 further determines whether the bad block count has reached the maximum bad block count.
- If it has, the unit 121 raises an alarm in the predetermined manner and prompts the user or administrator to reconstruct the low-quality disk.
- Then, in operation S345, the error-permissive RAID mode manager 120 reconstructs the low-quality disk under the administrator's control.
If the bad block count is determined in operation S350 not to have reached the maximum, the error-permissive RAID mode manager 120 skips this data item in operation S360. It then writes the next data item in operation S370.
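The Fig. 3 loop can be sketched as follows. Several details are assumptions not fixed by the flowchart text:
- bad blocks are modelled by an `is_bad` predicate;
- after the S342 warning the data item is skipped and writing continues;
- reaching the maximum ends the loop so that reconstruction (S345) can take over.

The function name and return shape are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def error_permissive_write(
    blocks: Iterable[Tuple[int, str]],
    is_bad: Callable[[int], bool],
    warn_count: int,
    max_count: int,
    notify: Callable[[str, int], None],
) -> Tuple[List[Tuple[int, str]], List[int], bool]:
    """Fig. 3 write loop. Returns (written, skipped, stopped_for_reconstruction)."""
    bad_count = 0                           # initialised to zero
    written, skipped = [], []
    for lba, data in blocks:                # S310 / S370: write the next data item
        if not is_bad(lba):                 # S320: no write error
            written.append((lba, data))
            continue
        bad_count += 1                      # one more bad block
        if bad_count == warn_count:         # S340 -> S342: early warning
            notify("warning", bad_count)
        elif bad_count >= max_count:        # S350: maximum reached
            notify("alarm", bad_count)
            return written, skipped, True   # S345: hand over to reconstruction
        skipped.append(lba)                 # S360: skip the data, accept the loss
    return written, skipped, False
```

With bad blocks at positions 2, 4 and 6, a warning count of 2 and a maximum of 3, blocks 2 and 4 are skipped (the second skip raises the warning), and block 6 triggers the alarm and stops the loop.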
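The embodiments of Fig. 2 and Fig. 3 therefore differ in how they alert on low-quality disks. Fig. 2 raises an alarm only at the maximum bad block count. Fig. 3 also issues an earlier warning at the warning bad block count.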
Fig. 4 is a logic diagram of the error-permissive RAID mode manager 120 according to an exemplary embodiment of the present invention.
Referring to Fig. 4, the error-permissive RAID mode manager 120 of Fig. 1 comprises the low-quality disk alarm unit 121 and the error-permissive data processing unit 122. According to an exemplary embodiment of the present invention, the error-permissive RAID mode manager 120 may omit the low-quality disk alarm unit 121.
The low-quality disk alarm unit 121 comprises a bad block register 1211, a bad block analyzer 1215 and a low-quality disk alarm 1215.
The bad block register 1211 tracks, records and counts the bad blocks of each disk in the RAID.
The bad block analyzer 1215 analyzes the bad block count of each disk. From this it determines whether an alarm is needed to prompt the user or administrator to reconstruct the disk. According to an exemplary embodiment, the analyzer 1215 checks whether the bad block count has reached the maximum bad block count. If it has, the analyzer 1215 controls the low-quality disk alarm unit 121 to raise an alarm in the predetermined manner.
According to another exemplary embodiment, the bad block analyzer 1215 analyzes the count in two stages:
- It first checks whether the bad block count equals the warning bad block count. If it does, the analyzer 1215 controls the low-quality disk alarm 1215 to warn the user or administrator, in the predetermined manner, that a considerable number of bad blocks have accumulated on the particular disk.
- Otherwise, it further checks whether the bad block count has reached the maximum bad block count. If it has, the analyzer 1215 controls the low-quality disk alarm unit 121 to raise an alarm in the predetermined manner and prompt the user or administrator to reconstruct the low-quality disk.
The error-permissive data processing unit 122 comprises a read/write processing module 1225 and a reconstruction module 1221.
The read/write processing module 1225 performs read and write operations on the RAID disks in the manner defined for the configured RAID level. When a data write error occurs because of a bad disk block, the module 1225 skips the bad block and instructs the bad block register 1211 to track and record it.
When the user chooses to reconstruct a disk, the reconstruction module 1221 performs disk reconstruction for the error-permissive data processing unit 122. During reconstruction, a stripe may fail to reconstruct because some data on the other disks still in the original RAID array are erroneous. In that case, the data at the corresponding stripe position on the low-quality disk that was removed from the original array are copied to the disk newly added to the RAID array. Reconstruction then continues instead of stopping.
According to an exemplary embodiment, the RAID management equipment 100 may be provided with an internal or external protected area. The read/write processing module 1225 then writes system data to this protected area.
Fig. 5 is a flowchart of a RAID disk reconstruction method of the RAID management equipment 100 of Fig. 1. RAID disk reconstruction can be started and carried out through a separate management interface.
Referring to Fig. 5, RAID disk reconstruction begins when the administrator inserts a new disk and starts reconstruction (operation S510).
In operation S520, the RAID management equipment 100 starts reconstructing the data of one stripe. In operation S530, it determines whether the data of that stripe can be recovered by reconstruction. If some data on other disks still in the original RAID array are erroneous and the stripe cannot be reconstructed, the stripe data are determined to be unreconstructable. For example:
- At RAID level 5, the data are unreconstructable if the stripe data still cannot be recovered using the stripe's parity.
- At RAID level 6, the data are unreconstructable only if reconstruction fails with both check blocks.
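The S530 decision can be sketched as a predicate. The sketch assumes that each reconstruction path needs all surviving data blocks of the stripe. At RAID 6 it follows the text: the stripe is tried with check block 1 (P) and then with check block 2 (Q), rather than solving for two missing blocks jointly. The function name, its parameters and the RAID 1 interpretation (the mirror copy counts as the "other data") are hypothetical.

```python
def stripe_reconstructable(level: int, other_data_ok: bool,
                           p_ok: bool = True, q_ok: bool = True) -> bool:
    """S530: can the stripe block of the disk being rebuilt be recovered?

    other_data_ok: every surviving data block was read (RAID 1: the mirror copy)
    p_ok, q_ok: the P and Q check blocks were read (RAID 3/4/5 use P only)
    """
    if level == 0:
        return False                        # no redundancy: always fall back to copying
    if level == 1:
        return other_data_ok                # need a readable mirror copy
    if level in (3, 4, 5):
        return other_data_ok and p_ok       # single parity: data XOR P
    if level == 6:
        return other_data_ok and (p_ok or q_ok)  # P first, then Q
    raise ValueError(f"unsupported RAID level {level}")
```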
If the stripe data are determined to be reconstructable, the RAID management equipment 100 recovers the data in operation S560 and writes the stripe data to the new disk. It then proceeds to operation S550.
If the stripe data are determined in operation S530 to be unreconstructable, the RAID management equipment 100 copies the corresponding stripe data from the low-quality disk to the new disk in operation S540. Specifically, the data at the corresponding stripe position on the low-quality disk removed from the original RAID array are copied to the disk newly added to the RAID array, and reconstruction continues instead of stopping.

Different RAID levels may be handled differently at this step. For example, a RAID level 0 system has no backup or parity-based recovery, so a bad block can only be skipped when a reconstruction error is encountered. At RAID level 5, by contrast, the data of the disk being reconstructed can be copied.
Then, in operation S550, it is determined whether reconstruction of the disk is complete. If it is, the administrator can end reconstruction and remove the low-quality disk in operation S570. If it is not, the process returns to S520 to reconstruct the next stripe.
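The Fig. 5 loop can be sketched with the level-specific work passed in as callables. `reconstruct(i)` rebuilds stripe `i` from the surviving disks and `read_old_disk(i)` reads the same stripe from the removed low-quality disk. Each returns `None` on failure. The names are hypothetical. It is also an assumption that a stripe readable from neither source is left empty and recorded as lost, since the text does not cover that case explicitly.

```python
from typing import Callable, List, Optional


def rebuild_disk(
    stripes: int,
    reconstruct: Callable[[int], Optional[bytes]],
    read_old_disk: Callable[[int], Optional[bytes]],
    new_disk: List[Optional[bytes]],
) -> List[int]:
    """Fig. 5 rebuild loop; returns the stripes whose data could not be recovered."""
    lost = []
    for i in range(stripes):                # S520: next stripe
        data = reconstruct(i)               # S530: reconstructable?
        if data is None:
            data = read_old_disk(i)         # S540: copy from the low-quality disk
        if data is None:
            lost.append(i)                  # neither source readable: accept the loss
        new_disk[i] = data                  # S560 / S540: write to the new disk
    return lost                             # S550 -> S570: the old disk can be removed
```

Because a failed stripe never aborts the loop, one unreadable stripe costs only that stripe's data rather than the whole rebuild.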
Figs. 6 to 9 are flowcharts of disk reconstruction methods for several typical RAID levels according to exemplary embodiments of the present invention. RAID levels 0 to 6 each have their own data processing characteristics. Only the operations that differ between levels are therefore described here:
- the stripe reconstruction operation (S520 in Fig. 5);
- the operation of determining whether the data can be recovered by reconstruction (S530); and
- the processing when the data cannot be reconstructed (S540).
RAID level 0 provides no redundant data storage. Referring to Fig. 6, disk reconstruction at RAID level 0 according to the present invention therefore simply copies the data of the disk being reconstructed to the new disk (S620). When a reconstruction error is encountered, the erroneous data are skipped (S630) without any special handling.
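A minimal sketch of the RAID 0 procedure, assuming a hypothetical `read_old(lba)` callable that raises `OSError` on an unreadable block. Skipped blocks are simply absent from the new disk.

```python
from typing import Callable, Dict, List, Tuple


def rebuild_raid0(read_old: Callable[[int], bytes],
                  block_count: int) -> Tuple[Dict[int, bytes], List[int]]:
    """RAID 0 rebuild: copy every block (S620), skipping unreadable ones (S630)."""
    new_disk: Dict[int, bytes] = {}
    skipped: List[int] = []
    for lba in range(block_count):
        try:
            new_disk[lba] = read_old(lba)   # S620: plain copy, no redundancy to use
        except OSError:
            skipped.append(lba)             # S630: skip without special handling
    return new_disk, skipped
```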
RAID level 1 keeps a copy of all data. Referring to Fig. 7, disk reconstruction according to the present invention first copies the (mirror) data on the good disk to the new disk (S720). If an error is encountered during copying (operation S730), the corresponding data are copied from the disk being reconstructed (S740).
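A sketch of the RAID 1 procedure under the same assumptions as the RAID 0 sketch. The hypothetical `read_mirror` and `read_old` callables raise `OSError` on an unreadable block. The healthy mirror is tried first, with a fallback to the disk being replaced.

```python
from typing import Callable, Dict, List, Tuple


def rebuild_raid1(read_mirror: Callable[[int], bytes],
                  read_old: Callable[[int], bytes],
                  block_count: int) -> Tuple[Dict[int, bytes], List[int]]:
    """RAID 1 rebuild: copy from the mirror (S720); on error (S730) use the old disk (S740)."""
    new_disk: Dict[int, bytes] = {}
    lost: List[int] = []
    for lba in range(block_count):
        for source in (read_mirror, read_old):
            try:
                new_disk[lba] = source(lba)
                break                       # this copy was readable
            except OSError:
                continue                    # try the next source
        else:
            lost.append(lba)                # neither copy readable
    return new_disk, lost
```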
RAID levels 3 to 5 are handled similarly in disk reconstruction. Referring to Fig. 8, the disk reconstruction method according to the present invention first reconstructs the data of one stripe using parity in operation S810. If stripe reconstruction is then determined in operation S830 to have failed, operation S840 copies the data at the corresponding stripe position on the low-quality disk (removed from the original RAID array) to the disk newly added to the array. Reconstruction continues instead of stopping.
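For single-parity levels, the parity block is the byte-wise XOR of the data blocks. The missing block is therefore the XOR of all surviving blocks of the stripe, data and parity together. The sketch below applies that rule and falls back to the old disk's block when any survivor could not be read. `None` marking a read error and the function names are assumptions.

```python
from functools import reduce
from typing import Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe_raid5(survivors: Sequence[Optional[bytes]],
                         old_block: Optional[bytes]) -> Optional[bytes]:
    """Rebuild the replaced disk's block in one RAID 3/4/5 stripe.

    survivors: blocks read from the other disks (data and parity); None = read error
    old_block: the same stripe position read from the removed low-quality disk
    """
    if all(block is not None for block in survivors):
        return xor_blocks(survivors)        # S810: parity reconstruction succeeded
    return old_block                        # S830 failed -> S840: copy from old disk
```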
RAID level 6 writes two check blocks with the data, which improves data redundancy. Referring to Fig. 9, the disk reconstruction method according to the present invention proceeds as follows. In operation S920, the RAID management equipment 100 first reconstructs a stripe using check block 1. In operation S930, it determines whether this reconstruction succeeded. If it did, the process proceeds to operation S970 to determine whether more stripes remain to be reconstructed.
If this reconstruction fails, the RAID management equipment 100 reconstructs the stripe using check block 2 in operation S940. In operation S950, it determines whether the reconstruction of operation S940 succeeded. If it did, the process proceeds to operation S970 to determine whether more stripes remain to be reconstructed.
If the reconstruction of operation S940 also fails, the RAID management equipment 100 copies the data of the disk being reconstructed to the corresponding stripe of the new disk.
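The Fig. 9 order (check block 1, then check block 2, then copy) can be sketched as follows. The sketch assumes the common P/Q scheme: P is the byte-wise XOR of the data blocks, and Q is computed in GF(2^8) with generator 2 and the reduction polynomial 0x11D. The text does not state which check-block encoding is used. Every helper name here is hypothetical, and `None` marks an unreadable block.

```python
from functools import reduce
from typing import List, Optional


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow2(n: int) -> int:
    """Coefficient g^n of data disk n, with generator g = 2."""
    value = 1
    for _ in range(n):
        value = gf_mul(value, 2)
    return value


def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte: a^254, because a^255 = 1 in GF(2^8)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def xor_blocks(blocks: List[bytes]) -> bytes:
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def syndrome_q(data: List[Optional[bytes]]) -> bytes:
    """Q = XOR of g^i * D_i over the data blocks; None entries count as zero blocks."""
    q = bytearray(len(next(block for block in data if block is not None)))
    for i, block in enumerate(data):
        if block is None:
            continue
        coefficient = gf_pow2(i)
        for k, byte in enumerate(block):
            q[k] ^= gf_mul(coefficient, byte)
    return bytes(q)


def rebuild_raid6(data: List[Optional[bytes]], x: int, p: Optional[bytes],
                  q: Optional[bytes], old: Optional[bytes]) -> Optional[bytes]:
    """Rebuild data block x: P first (S920), then Q (S940), else copy the old block."""
    others = [block for i, block in enumerate(data) if i != x]
    if all(block is not None for block in others):
        if p is not None:                   # check block 1: D_x = P XOR other data
            return xor_blocks([p] + others)
        if q is not None:                   # check block 2
            masked = list(data)
            masked[x] = None
            scaled = xor_blocks([q, syndrome_q(masked)])  # equals g^x * D_x
            inverse = gf_inv(gf_pow2(x))
            return bytes(gf_mul(inverse, byte) for byte in scaled)
    return old                              # both failed: copy from the removed disk
```

Like the flowchart, this sketch treats the two check blocks as independent single-block recovery paths. A full RAID 6 implementation can also solve for two missing data blocks at once using P and Q together.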
The RAID management equipment and method according to the present invention can thus manage a RAID differentially and allow operation with faulty disks. When a write error occurs because of a bad disk block, the data to be written are allowed to be lost. As a result, the disk is not dropped from the array as long as the number of bad blocks stays within a certain range. This is highly beneficial for storing non-critical data, such as speeding records in traffic monitoring.
When part of the data cannot be recovered, the RAID management equipment and method according to the present invention accept that partial loss caused by bad disk blocks and can still perform disk reconstruction.
The RAID management equipment and method according to the present invention can also combine the fault-tolerant and error-permissive RAID operational modes. When a write error occurs, the bad disk block is first replaced by remapping its data to a good block in the reserved area. The error-permissive RAID operational mode is entered only after the reserved area has been used up.
In addition, the RAID management equipment and method according to the present invention can use the three modes alternately according to user settings:
- critical data that must not be lost are written in the standard RAID operational mode;
- generally important data are written in the fault-tolerant RAID operational mode; and
- data that may be partially lost are written in the error-permissive RAID operational mode.
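Per-write mode selection can be sketched as a lookup from a data class to a mode. The three `Importance` classes and the mode strings below are hypothetical labels for the user settings described above. The example payloads are illustrative only.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Importance(Enum):
    # Hypothetical data classes configured by the user.
    CRITICAL = 1    # must never be lost
    IMPORTANT = 2   # generally important data
    EXPENDABLE = 3  # partial loss acceptable


MODE_FOR = {
    Importance.CRITICAL: "standard",
    Importance.IMPORTANT: "fault_tolerant",
    Importance.EXPENDABLE: "error_permissive",
}


def route_writes(requests: Iterable[Tuple[Importance, str]]) -> Dict[str, List[str]]:
    """Group (importance, payload) requests by the operational mode that will write them."""
    batches: Dict[str, List[str]] = {mode: [] for mode in MODE_FOR.values()}
    for importance, payload in requests:
        batches[MODE_FOR[importance]].append(payload)
    return batches
```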
The disk reconstruction performed by the RAID management equipment and method according to the present invention is also applicable to RAID disks written in any of the standard, fault-tolerant and error-permissive RAID operational modes, or in a combination of them.
Exemplary embodiments of the present invention have been described for illustrative purposes. Those skilled in the art will appreciate that various modifications, additions and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the claims. The effects of the present invention are not limited to those described above. Other effects not mentioned above will be clearly understood by those skilled in the art from the claims.