Detailed Description of the Embodiments
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The present invention defines three RAID operating modes: a standard RAID operating mode, a fault-tolerant RAID operating mode, and a run-with-error RAID operating mode. In the standard RAID operating mode, the RAID system operates in conformance with the RAID standard: if a data write error occurs while data is being recorded, the system reports the low-quality disk and stops operating on it. Neither data write errors nor bad disk blocks are permitted. In the fault-tolerant RAID operating mode, if a bad disk block is encountered while data is being recorded, the RAID management device replaces the bad block with a good block so that the data is still written correctly. In contrast to the fault-tolerant RAID operating mode, in the run-with-error RAID operating mode the RAID management device allows some data to be written incorrectly or lost while recording, so that a disk with bad blocks can continue to be written; and when part of the data cannot be reconstructed, the data on the low-quality disk is still used for data recovery.
The present invention first proposes a RAID management device that supports the three operating modes described above.
Fig. 1 is a schematic block diagram of a RAID management device 100 according to an exemplary embodiment of the present invention. Referring to Fig. 1, RAID management device 100 includes a management controller 110, a run-with-error RAID mode manager 120, a fault-tolerant RAID mode manager 130, and a standard RAID mode manager 140.
According to a preset RAID operating mode, management controller 110 controls one of run-with-error RAID mode manager 120, fault-tolerant RAID mode manager 130, and standard RAID mode manager 140 to operate the RAID disk array. The RAID operating mode may be set statically in the RAID configuration file, or set dynamically through a user interface provided by the RAID management device.
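The selection made by the management controller can be illustrated with a minimal sketch. The enum values, the `raid_mode` configuration key, and the manager names below are hypothetical and chosen only for illustration; falling back to the standard mode when nothing is configured is an assumption of this sketch, not something the description states.

```python
from enum import Enum


class RaidMode(Enum):
    # Hypothetical identifiers for the three operating modes.
    STANDARD = "standard"
    FAULT_TOLERANT = "fault_tolerant"
    WITH_ERRORS = "with_errors"


def mode_from_config(config: dict) -> RaidMode:
    """Read the preset mode from a configuration mapping.

    Assumption: the key is called "raid_mode" and a missing key
    means the standard mode.
    """
    return RaidMode(config.get("raid_mode", RaidMode.STANDARD.value))


def select_manager(mode: RaidMode) -> str:
    """Return the name of the mode manager that should operate the array."""
    managers = {
        RaidMode.WITH_ERRORS: "mode_manager_120",
        RaidMode.FAULT_TOLERANT: "mode_manager_130",
        RaidMode.STANDARD: "mode_manager_140",
    }
    return managers[mode]
```

A dynamic setting through a user interface would amount to calling `select_manager` again with the newly chosen mode.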
Under the control of management controller 110, standard RAID mode manager 140 operates in the standard RAID operating mode described above (for example RAID level 0, RAID level 1, RAID level 5, or RAID level 6). If a data write error occurs while data is being recorded, it reports the low-quality disk and stops operating on it. Write errors and bad disk blocks are not permitted; if either occurs, a disk drop is reported and data writing stops.
Fault-tolerant RAID mode manager 130 includes a remapping unit 132, which uses good blocks in a reserved area to replace bad disk blocks detected by fault-tolerant RAID mode manager 130. When fault-tolerant RAID mode manager 130 writes data and the write fails because of a bad disk block, remapping unit 132 writes the corresponding data to a good block in the reserved area, so that the data is written and not lost. If the write still fails, for example because the reserved area is used up and no good block remains to replace the bad disk block, a disk drop is reported and data writing stops.
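The remapping behavior can be sketched as follows, under simplifying assumptions: a disk is modeled as a dictionary from block number to data, bad blocks are known in advance as a set, and the reserved area is a list of spare block numbers. The class and function names are hypothetical.

```python
class ReserveExhausted(Exception):
    """Raised when no good block is left in the reserved area."""


class RemapUnit:
    def __init__(self, reserved_blocks):
        self.free = list(reserved_blocks)  # good blocks still available
        self.remap_table = {}              # bad block -> replacement block

    def remap(self, bad_block):
        # Reuse an existing mapping so repeated writes hit the same block.
        if bad_block in self.remap_table:
            return self.remap_table[bad_block]
        if not self.free:
            raise ReserveExhausted(f"no reserved block left for block {bad_block}")
        replacement = self.free.pop(0)
        self.remap_table[bad_block] = replacement
        return replacement


def fault_tolerant_write(disk, bad_blocks, remap_unit, block, data):
    """Write data, redirecting to the reserved area if the block is bad.

    Returns the block number actually written. ReserveExhausted
    propagates to the caller, which would then report a disk drop
    and stop writing.
    """
    target = remap_unit.remap(block) if block in bad_blocks else block
    disk[target] = data
    return target
```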
Run-with-error RAID mode manager 120 includes a run-with-error data processing unit 122. When a data write error occurs because of a bad disk block, run-with-error data processing unit 122 continues the write operation and allows the data to be lost. In addition, according to another exemplary embodiment of the present invention, when the user selects disk reconstruction, run-with-error data processing unit 122 performs the disk reconstruction. During reconstruction, if reconstruction of a stripe fails because some data on the other disks that were not kicked out of the original RAID array is in error, the data at the corresponding stripe position on the low-quality disk that was kicked out of the original RAID array is copied to the disk newly added to the RAID array for reconstruction, and reconstruction continues rather than stopping.
According to another preferred embodiment of the present invention, run-with-error RAID mode manager 120 may further include a low-quality disk alarm unit 121. Low-quality disk alarm unit 121 tracks the bad blocks of each disk and, when the bad block count of a particular disk reaches a predetermined maximum bad block number, raises an alarm in a predetermined manner to prompt the user or administrator to reconstruct the low-quality disk. The predetermined alarm may take the form of a window on the user management interface or an entry in a log file. In addition, according to another preferred embodiment of the present invention, when the bad block count equals a predetermined warning bad block number, low-quality disk alarm unit 121 may also notify the user or administrator, in a manner similar to the alarm described above, that the particular disk has accumulated a certain number of bad blocks. The warning bad block number is smaller than the maximum bad block number, so a two-level early warning mechanism is established.
According to another preferred embodiment of the present invention, run-with-error RAID mode manager 120 may be provided with a built-in or external protected area for writing system data, to ensure that important data is written correctly. Run-with-error RAID mode manager 120 is described in more detail later with reference to Fig. 4.
Fig. 2 is a schematic flowchart of the operation of RAID management device 100 according to an exemplary embodiment of the present invention.
Referring to Fig. 2, at operation S210, management controller 110 determines whether the preset RAID operating mode is the run-with-error RAID operating mode. If it is not, then at operation S260 management controller 110 determines whether the preset RAID operating mode is the fault-tolerant RAID operating mode. If it is not the fault-tolerant RAID operating mode either, management controller 110 controls standard RAID mode manager 140 to run in the standard RAID operating mode (operation S270).
If it is determined at operation S260 that the preset RAID operating mode is the fault-tolerant RAID operating mode, management controller 110 controls fault-tolerant RAID mode manager 130 to run in the fault-tolerant RAID operating mode (operation S280).
If management controller 110 determines at operation S210 that the preset RAID operating mode is the run-with-error RAID operating mode, it controls run-with-error RAID mode manager 120 to run in that mode, which supports operation with a low-quality disk. If an error occurs while data is being written, low-quality disk alarm unit 121 records the bad block count at operation S220. At operation S230, it is determined whether the bad block count has reached the maximum bad block number, that is, the predetermined highest number of bad blocks allowed. If it has, low-quality disk alarm unit 121 raises an alarm in the predetermined manner (operation S240) to prompt the user or administrator to reconstruct the low-quality disk. At operation S250, run-with-error RAID mode manager 120 performs run-with-error reconstruction, which is described in detail with reference to Fig. 5 to Fig. 9.
Fig. 3 is a flowchart of data write processing in the run-with-error RAID operating mode according to another exemplary embodiment of the present invention.
Referring to Fig. 3, low-quality disk alarm unit 121 initializes its bad block count to zero. At operation S310, run-with-error RAID mode manager 120 performs a data write operation. At operation S320, it is determined whether a data write error has occurred. If not, the next data write is performed at operation S370.
If it is determined at operation S320 that a data write error has occurred, low-quality disk alarm unit 121 increments the bad block count by 1. At operation S340, low-quality disk alarm unit 121 determines whether the bad block count equals the warning bad block number. If the two are equal, then at operation S342 low-quality disk alarm unit 121 warns the user or administrator, in a predetermined manner, that a considerable number of bad blocks have accumulated on the particular disk. Otherwise, at operation S350, low-quality disk alarm unit 121 further determines whether the bad block count has reached the maximum bad block number. If it has, low-quality disk alarm unit 121 raises an alarm in the predetermined manner to prompt the user or administrator to reconstruct the low-quality disk. Then, at S345, run-with-error RAID mode manager 120 performs low-quality disk reconstruction under the administrator's control.
Conversely, if it is determined at operation S350 that the bad block count has not reached the maximum bad block number, then at operation S360 run-with-error RAID mode manager 120 skips the data, and at operation S370 the next data write is performed.
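The write flow of Fig. 3 can be approximated by the sketch below. It makes several assumptions that the description leaves open: blocks are identified by integers, bad blocks are given as a set, the failed data is skipped and writing continues after a warning, and writing stops once the maximum is reached so that reconstruction can take over. The function and parameter names are hypothetical.

```python
def write_with_errors(blocks, bad_blocks, warn_count, max_count):
    """Write blocks in order while tolerating bad blocks.

    Returns (written, skipped, events), where events lists the
    warnings and alarms raised along the way.
    """
    bad_block_count = 0  # initialized to zero before the first write
    written, skipped, events = [], [], []
    for block in blocks:
        if block not in bad_blocks:
            written.append(block)  # no write error: go on to next data
            continue
        bad_block_count += 1
        if bad_block_count == warn_count:
            # First level: warn that bad blocks are accumulating.
            events.append(("warning", bad_block_count))
        elif bad_block_count >= max_count:
            # Second level: alarm and hand over to disk reconstruction.
            events.append(("alarm", bad_block_count))
            break
        skipped.append(block)  # data for this block is lost
    return written, skipped, events
```

With `warn_count=2` and `max_count=3`, writing blocks 0 to 9 where blocks 1, 3, 5, and 7 are bad yields one warning at the second bad block and an alarm at the third.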
As can be seen, the embodiments shown in Fig. 2 and Fig. 3 differ in how the low-quality disk alarm is raised.
Fig. 4 is a logical block diagram of run-with-error RAID mode manager 120 according to an exemplary embodiment of the present invention.
Referring to Fig. 4, run-with-error RAID mode manager 120 of Fig. 1 includes low-quality disk alarm unit 121 and run-with-error data processing unit 122. According to an exemplary embodiment of the present invention, run-with-error RAID mode manager 120 may omit low-quality disk alarm unit 121.
Low-quality disk alarm unit 121 includes a bad block recorder 1211, a bad block analyzer 1215, and a low-quality disk alarm 1215.
Bad block recorder 1211 tracks, records, and counts the bad blocks of each disk in the RAID.
Bad block analyzer 1215 analyzes the bad block count of each disk and determines whether an alarm is needed to prompt the user or administrator to perform disk reconstruction. According to an exemplary embodiment of the present invention, bad block analyzer 1215 determines during analysis whether the bad block count has reached the maximum bad block number. If it has, bad block analyzer 1215 controls low-quality disk alarm unit 121 to raise an alarm in the predetermined manner.
According to another exemplary embodiment of the present invention, bad block analyzer 1215 determines during analysis whether the bad block count equals the warning bad block number. If the two are equal, bad block analyzer 1215 controls low-quality disk alarm 1215 to warn the user or administrator, in a predetermined manner, that a considerable number of bad blocks have accumulated on the particular disk. Otherwise, bad block analyzer 1215 further determines whether the bad block count has reached the maximum bad block number. If it has, bad block analyzer 1215 controls low-quality disk alarm unit 121 to raise an alarm in the predetermined manner, prompting the user or administrator to reconstruct the low-quality disk.
Run-with-error data processing unit 122 includes a read-write processing module 1225 and a reconstruction module 1221.
Read-write processing module 1225 performs read and write operations on the RAID disks in a manner consistent with the RAID level. When a data write error occurs because of a bad disk block, read-write processing module 1225 skips the bad block and notifies bad block recorder 1211 to track and record it.
When the user selects disk reconstruction, reconstruction module 1221 of run-with-error data processing unit 122 performs the disk reconstruction. During reconstruction, if reconstruction of a stripe fails because some data on the other disks that were not kicked out of the original RAID array is in error, the data at the corresponding stripe position on the low-quality disk that was kicked out of the original RAID array is copied to the disk newly added to the RAID array for reconstruction, and reconstruction continues rather than stopping.
According to an exemplary embodiment of the present invention, RAID management device 100 may be provided with an internal or external protected area, so that read-write processing module 1225 writes system data to the protected area.
Fig. 5 is a flowchart of a RAID disk reconstruction method of RAID management device 100 of Fig. 1. The RAID disk reconstruction may be started and carried out through a separate management interface.
Referring to Fig. 5, RAID disk reconstruction begins when the administrator inserts a new disk and starts reconstruction (operation S510).
At operation S520, RAID management device 100 starts reconstructing the data of one stripe. At S530, it is determined whether the data of the stripe can be reconstructed. If reconstruction of the stripe fails because some data on the other disks that were not kicked out of the original RAID array is in error, it is determined that the stripe data cannot be reconstructed. For example, at RAID level 5, if the stripe data cannot be recovered using the stripe parity, it is determined that the data cannot be reconstructed. As another example, at RAID level 6, it is determined that the data cannot be reconstructed only when reconstruction fails with both of the two parities.
If it is determined that the stripe data can be reconstructed, then at operation S560 RAID management device 100 recovers the data and writes the stripe data to the new disk, and then proceeds to operation S550.
If it is determined at operation S530 that the stripe data cannot be reconstructed, then at operation S540 RAID management device 100 copies the corresponding stripe data from the low-quality disk to the new disk. Specifically, during reconstruction, the data at the corresponding stripe position on the low-quality disk that was kicked out of the original RAID array is copied to the disk newly added to the RAID array for reconstruction, and reconstruction continues rather than stopping. Systems of different RAID levels may be handled differently here. For example, in a RAID level 0 system, which has no backup or parity-based recovery, a bad block can only be skipped when a reconstruction failure is encountered; in a RAID level 5 system, the data of the disk being reconstructed can be copied.
Then, at operation S550, it is determined whether reconstruction of the disk is complete. If it is, then at operation S570 the administrator can end the reconstruction processing and remove the low-quality disk. If not, the process returns to operation S520 to reconstruct the next stripe.
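The stripe-by-stripe loop of Fig. 5 can be sketched for a single-parity (RAID level 5 style) layout. The model is deliberately simplified and rests on assumptions: each disk is a list with one stripe unit per stripe, an unreadable unit is represented by `None`, and a missing unit is recovered as the XOR of the units on all surviving disks. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized stripe units."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_disk(surviving, low_quality):
    """Rebuild the replaced disk stripe by stripe.

    surviving:   disks still in the array (data and parity units).
    low_quality: the disk that was kicked out of the array.
    Returns (new_disk, fallback_stripes), where fallback_stripes lists
    the stripes that had to be copied from the low-quality disk.
    """
    new_disk, fallback_stripes = [], []
    for stripe in range(len(low_quality)):
        units = [disk[stripe] for disk in surviving]
        if all(unit is not None for unit in units):
            # Reconstruction possible: XOR of all surviving units.
            new_disk.append(reduce(xor_bytes, units))
        else:
            # Reconstruction failed: copy the corresponding stripe unit
            # from the low-quality disk and keep going.
            new_disk.append(low_quality[stripe])
            fallback_stripes.append(stripe)
    return new_disk, fallback_stripes
```

In this model the copied unit may itself be unreadable (`None`); the loop still continues, which mirrors the point that reconstruction is not stopped by a failed stripe.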
Fig. 6 to Fig. 9 are flowcharts of disk reconstruction methods for several typical RAID levels according to exemplary embodiments of the present invention. Because RAID level 0 through RAID level 6 each process data in their own way, the description below covers, for each RAID level, only the operations corresponding to the stripe reconstruction operation (S520), the operation of determining whether the data can be reconstructed (S530), and the handling when it cannot (S540) in Fig. 5.
Because RAID level 0 provides no redundant data storage, disk reconstruction for RAID level 0 according to the present invention, referring to Fig. 6, simply copies the data of the disk being reconstructed to the new disk (S620). When a reconstruction failure is encountered, the erroneous data is skipped (S630) and no special handling is performed.
RAID level 1 provides a copy of all data. In disk reconstruction according to the present invention, referring to Fig. 7, the data of the good disk (the copy) is first copied to the new disk (S720). If it is determined at operation S730 that an error was encountered during copying, the corresponding data of the disk being reconstructed is copied instead (S740).
Systems of RAID levels 3 to 5 are similar in disk reconstruction. According to the disk reconstruction method of the present invention, referring to Fig. 8, the data of one stripe is first reconstructed using parity at operation S810. Then, if it is determined at operation S830 that reconstruction of the stripe failed, at operation S840 the data at the corresponding stripe position on the low-quality disk that was kicked out of the original RAID array is copied to the disk newly added to the RAID array for reconstruction, and reconstruction continues rather than stopping.
RAID level 6 uses dual parity when writing data, which improves data redundancy. Therefore, according to the disk reconstruction method of the present invention, referring to Fig. 9, at operation S920 RAID management device 100 first reconstructs a stripe using parity 1. At operation S930, it is determined whether this reconstruction succeeded. If it did, the process proceeds to operation S970, where it is determined whether to continue reconstructing further stripes.
If this reconstruction failed, then at operation S940 RAID management device 100 reconstructs the stripe using parity 2. Then, at operation S950, it is determined whether the reconstruction at operation S940 succeeded. If it did, the process proceeds to operation S970, where it is determined whether to continue reconstructing further stripes.
If the reconstruction at operation S940 failed, RAID management device 100 copies the data of the disk being reconstructed to the corresponding stripe of the new disk.
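The staged fallback of Fig. 9 can be expressed as a short sketch. The actual parity arithmetic of RAID level 6 is not shown; instead, the two reconstruction attempts are passed in as hypothetical callables that return the recovered stripe data, or `None` when the attempt fails. All names are illustrative assumptions.

```python
def rebuild_stripe_raid6(stripe, rebuild_with_parity1, rebuild_with_parity2, low_quality):
    """Rebuild one stripe, trying each parity before falling back to a copy.

    Returns (data, source), where source records which path succeeded.
    """
    data = rebuild_with_parity1(stripe)   # first attempt (S920, S930)
    if data is not None:
        return data, "parity1"
    data = rebuild_with_parity2(stripe)   # second attempt (S940, S950)
    if data is not None:
        return data, "parity2"
    # Both attempts failed: copy from the disk being reconstructed.
    return low_quality[stripe], "copied"
```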
Thus, the RAID management device and method according to the present invention can manage a RAID in graded modes and allow it to operate with a low-quality disk. When a write error occurs because of a bad disk block, the data to be written is allowed to be lost, which ensures that the disk is not dropped as long as the number of bad blocks stays within a certain range. This technical solution is highly beneficial for storing non-critical data, such as speeding records in traffic monitoring.
Moreover, when part of the data cannot be recovered because of bad disk blocks, the RAID management device and method according to the present invention allow that data to be lost and can still perform disk reconstruction.
The RAID management device and method according to the present invention can also combine the fault-tolerant RAID operating mode with the run-with-error RAID operating mode. That is, when a write error occurs, remapping is used first, replacing the bad disk block with a good block from the reserved area so that the data is written; only after the reserved area is used up does the device enter the run-with-error RAID operating mode.
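This combined behavior can be sketched in a few lines, assuming a disk modeled as a dictionary, a set of known bad blocks, and a list of spare block numbers standing in for the reserved area. The function name and return values are hypothetical.

```python
def combined_write(block, data, disk, bad_blocks, reserve, lost):
    """Write one block: remap while spares remain, otherwise accept the loss."""
    if block not in bad_blocks:
        disk[block] = data
        return "written"
    if reserve:
        # Reserved area still has a good block: remap and write there.
        disk[reserve.pop(0)] = data
        return "remapped"
    # Reserved area used up: record the loss and carry on writing.
    lost.append(block)
    return "lost"
```

With a single spare block and two bad blocks, the first bad block is remapped and the second is recorded as lost while writing continues.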
In addition, in the RAID management device and method according to the present invention, the three modes can be used alternately according to user settings. Critical data that must not be lost is written in the standard RAID operating mode; generally important data is written in the fault-tolerant RAID operating mode; and data for which partial loss is acceptable is written in the run-with-error RAID operating mode.
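One way to picture such a user setting is a lookup from data class to operating mode. The class names and mode identifiers below are hypothetical, and treating an unknown class as critical (the strictest mode) is an assumption of this sketch.

```python
# Hypothetical data classes mapped to the three operating modes.
MODE_BY_DATA_CLASS = {
    "critical": "standard",         # loss not allowed
    "important": "fault_tolerant",  # bad blocks remapped
    "loss_tolerant": "with_errors", # partial loss accepted
}


def mode_for(data_class: str) -> str:
    """Pick the operating mode for a write based on its data class."""
    # Assumption: unknown classes fall back to the strictest mode.
    return MODE_BY_DATA_CLASS.get(data_class, "standard")
```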
It can also be seen that, for disk reconstruction, the RAID management device and method according to the present invention are applicable to reconstructing RAID disks whose data was written in any one of, or any combination of, the standard, fault-tolerant, and run-with-error RAID operating modes.
Although exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions, and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the claims. The effects of the present invention are not limited to those described above, and other effects not mentioned above will be clearly understood by those skilled in the art from the claims.