Background Art
A write hole is a data-inconsistency problem in a redundant array of independent disks (RAID). Taking RAID5 as an example, the hard disks store data blocks and a check block, and storage redundancy and consistency are expressed as P = D1 ∧ D2 ∧ D3 ∧ D4, where ∧ denotes XOR, D1, D2, D3, and D4 are data blocks, and P is the check block. Suppose a new write request is to update D1 and D2 and carries the data ND1 and ND2. If the new data is successfully written to the stripe, the new data blocks are ND1, ND2, D3, and D4. The new check block NP can be calculated in either large-write mode or small-write mode. In large-write mode, NP = ND1 ∧ ND2 ∧ D3 ∧ D4, which requires reading D3 and D4. In small-write mode, NP = P ∧ D1 ∧ D2 ∧ ND1 ∧ ND2, which requires reading D1, D2, and P. When the stripe data is consistent, either mode may be used; for performance, the mode that reads less data is chosen.
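As an illustration of the two parity calculations described above, the following minimal sketch computes NP both ways and compares the results. The helper name `xor_blocks` and the two-byte block values are assumptions chosen for the example; they do not come from the specification.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Old stripe: four data blocks and the check block P = D1 ^ D2 ^ D3 ^ D4.
d1, d2, d3, d4 = b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"
p = xor_blocks(d1, d2, d3, d4)

# New data carried by the write request for D1 and D2 (arbitrary values).
nd1, nd2 = b"\x7e\x7f", b"\x00\xff"

# Large-write mode: read the untouched blocks D3 and D4.
np_large = xor_blocks(nd1, nd2, d3, d4)

# Small-write mode: read the old D1, D2 and the old check block P.
np_small = xor_blocks(p, d1, d2, nd1, nd2)

# On a consistent stripe both modes yield the same new check block.
same_result = np_large == np_small
```

Both modes agree only because the old stripe satisfied P = D1 ∧ D2 ∧ D3 ∧ D4 to begin with; the small-write result depends on the old P being correct.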
If a stripe write request does not complete, for example because the controller fails while data is being written, the contents of the stripe units involved in that write, on both the data disks and the check disk, are indeterminate. Suppose a stripe write carries ND1 and ND2 and does not complete. The contents of xD1, xD2, and xP then cannot be determined: each may be old, new, or a mixture of old and new. D3 and D4 are unchanged. The stripe data is then in an inconsistent state, that is, xP != xD1 ∧ xD2 ∧ D3 ∧ D4. While the stripe is in this state, calculations for subsequent stripe writes will produce errors. This stripe inconsistency is called the write hole problem.
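The inconsistency can be reproduced with a small sketch. It assumes one particular interruption point (ND1 reached the disk, while ND2 and NP did not); the helper `xor_blocks` and the block values are hypothetical and serve only to make the effect visible.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Consistent stripe before the write.
d1, d2, d3, d4 = b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"
p = xor_blocks(d1, d2, d3, d4)

# The write request carries ND1 and ND2, but the controller fails after
# ND1 is on disk and before ND2 and NP are written (assumed scenario).
nd1 = b"\x7e\x7f"
xd1, xd2, xp = nd1, d2, p

# The stripe no longer satisfies xP == xD1 ^ xD2 ^ D3 ^ D4.
consistent = xp == xor_blocks(xd1, xd2, d3, d4)

# If the disk holding D3 later fails, rebuilding D3 from the remaining
# blocks and the stale check block yields the wrong contents.
rebuilt_d3 = xor_blocks(xp, xd1, xd2, d4)
```

In this scenario the damage reaches D3, a block the interrupted write never touched, which is why the stripe must be brought back to a consistent state before it is used again.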
To solve the write hole problem, the stripe must be rebuilt after an abnormal write so that the stripe data is restored either to its state before the write or to its state after a correct write. Restoring the pre-write state requires that D1, D2, and P be saved before the write; restoring the post-write state requires that ND1, ND2, and NP be saved. Whichever state is restored, an additional device is needed to hold the saved data, which increases cost, and the algorithm for rebuilding the stripe after an abnormal write is relatively complex.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of the method of the first embodiment of the present invention. The method includes:
Step 11: When new data is written to a file unit of the RAID, the new data is saved in a first file unit that is different from a second file unit, where the second file unit holds the old data, the new data is used to update the old data, and the RAID is managed in file mode.
Step 12: After the corresponding file in the RAID is closed, or after the stripe containing the first file unit has been fully written, the old data is deleted.
In the embodiments of the present invention, to reduce cost, file-level RAID is used in place of the existing block-level RAID to solve the write hole problem.
By using file-level RAID, this embodiment can save new data in a file unit different from the one holding the old data, and can solve the write hole problem by relying on the inherent attributes of files. No additional storage device is needed, which reduces cost.
Fig. 2 compares the block-level RAID hierarchy with the file-level RAID hierarchy in the embodiments of the present invention. Referring to Fig. 2, the block-level RAID hierarchy comprises the application layer, the file system, block-level RAID, and the hard disks; the file-level RAID hierarchy comprises the application, the file system, file-level RAID, the single-disk file system, and the hard disks. Compared with block-level RAID, file-level RAID organizes the RAID in file mode on each hard disk (which carries a single-disk file system) and uses the inherent attributes of files to solve the write hole problem. For example, one inherent attribute is that a file can comprise multiple file units, so new data and old data can be saved in different file units.
Fig. 3 is a schematic flowchart of the method of the second embodiment of the present invention. The method includes:
Step 31: Stripe data coming from different files is written to the hard disks in turn.
For example, Fig. 4 is a schematic diagram of RAID5 or RAID6 in the embodiments of the present invention. Referring to Fig. 4, data from file 0 is written to position D0 on hard disk 0, data from file 1 is written to position D1 on hard disk 1, and data from file 2 is written to position D2 on hard disk 2. In RAID5 or RAID6, redundancy is provided by check blocks; for example, P0 and Q0 are check blocks.
Fig. 5 is a schematic diagram of RAID1 in the embodiments of the present invention. Referring to Fig. 5, redundancy is provided by mirroring; for example, hard disk 0 and hard disk 1 hold identical data. Because the embodiments manage the RAID in file mode, data can be stored on the hard disks in units of file units, and the attributes of file units can be used in processing.
Step 32: When new data is to be written to a file unit, the new data is written to a file unit different from the one holding the old data; that is, the new data does not overwrite the old data.
For example, referring to Fig. 4, when D0 is updated, the new data is written to another file unit on hard disk 0 rather than over D0 in file unit 0.
Specifically, Fig. 6 is a schematic diagram of writing new data in the embodiments of the present invention. Referring to Fig. 6, suppose the old data has been written to file unit 0 and file unit 1; when new data is written, it is written to file unit 1'.
Step 33: When the file is closed, or when the stripe containing the file unit has been fully written, the old data is deleted.
For RAID1, the stripe containing the file unit means the stripe formed by the mirror pair.
Whether a stripe has been fully written can be determined as follows:
Fig. 7 is a schematic diagram of writing a file unit in the embodiments of the present invention. Referring to Fig. 7, before the file unit is written, a magic number (for example, the time at which the data starts to be written to the file unit) is first written to extended attribute A of the file unit. A magic number is a value ordinarily used to mark a file format or protocol. After the data write completes, the same magic number is written to extended attribute B of the file unit. The file unit is considered complete only if the two magic numbers are equal.
That is, the write flow comprises:
1. Write extended attribute A of the file unit.
2. Write the content of the file unit.
3. Write extended attribute B of the file unit.
For example, to write the data "1234567890ABCDEFG" to a file unit, the current time "20110131085731" is written first, then the data is written, and after the write completes "20110131085731" is written again. If, on the next read, the magic numbers before and after the data are found to be identical, the data write completed: because the values written first and last are the same, the content written between them should also be complete. Otherwise, data loss has occurred.
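The three-step write flow and the completeness check can be sketched as follows. This is a minimal in-memory model rather than a real file system: the `FileUnit` class, the functions `write_unit` and `is_complete`, and the `crash_after_step` parameter are hypothetical names introduced for illustration, and extended attributes A and B are modelled as plain fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FileUnit:
    """In-memory stand-in for a file unit with two extended attributes."""
    attr_a: Optional[str] = None   # magic number written before the data
    content: bytes = b""
    attr_b: Optional[str] = None   # same magic number written after the data

def write_unit(unit: FileUnit, data: bytes, magic: str,
               crash_after_step: Optional[int] = None) -> None:
    """Follow the three-step flow; optionally stop early to mimic a failure."""
    unit.attr_a = magic            # step 1: write extended attribute A
    if crash_after_step == 1:
        return
    unit.content = data            # step 2: write the file unit content
    if crash_after_step == 2:
        return
    unit.attr_b = magic            # step 3: write extended attribute B

def is_complete(unit: FileUnit) -> bool:
    """A file unit counts as complete only if both magic numbers match."""
    return unit.attr_a is not None and unit.attr_a == unit.attr_b

# A write that finishes, and one interrupted after the content was written.
good, torn = FileUnit(), FileUnit()
write_unit(good, b"1234567890ABCDEFG", "20110131085731")
write_unit(torn, b"1234567890ABCDEFG", "20110131085731", crash_after_step=2)
```

Here `is_complete(good)` is true and `is_complete(torn)` is false. The check relies on the writes reaching stable storage in the order A, content, B; the sketch does not model write reordering by caches.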
Step 34: When a stripe of the file-level RAID is read or written and old and new file units belonging to that stripe are found to coexist, a write hole may have occurred in the stripe, and the stripe needs recovery processing.
Specifically, when old and new file units coexist, subsequent calculations may produce errors, and the stripe containing those file units may have the write hole problem. To solve it, the stripe is recovered: the data of the stripe is restored either to its state before the data write operation or to its state after the data write operation completed correctly.
Data is written in a journaling manner: before a file unit is modified, it is first renamed. For example, if the file unit's file name is ABC.data, it is renamed to ABC.data.bak, and the newly written data is then written to a file named ABC.data. When the file containing the file unit is closed, or the stripe containing the file unit has been fully written, or the user explicitly requests that the contents of the cache be synchronized to the hard disk (for example, because the system is about to power down), the old data is deleted, that is, ABC.data.bak is deleted. If at some moment (generally the first access to a file unit of a file after the system powers on) old and new file units are found to coexist on the hard disk that stores the data, that is, both ABC.data.bak and ABC.data exist, a write hole may have occurred in the stripe.
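The rename-then-write flow and the coexistence check can be sketched with ordinary files standing in for file units. The function names `begin_update`, `commit_update`, and `may_have_write_hole` are hypothetical, and the sketch assumes a single file unit in a temporary directory rather than a real RAID layout.

```python
import os
import tempfile

def begin_update(path: str, new_data: bytes) -> None:
    """Rename the old file unit to <name>.bak, then write the new data."""
    if os.path.exists(path):
        os.rename(path, path + ".bak")
    with open(path, "wb") as f:
        f.write(new_data)

def commit_update(path: str) -> None:
    """Delete the old data once the file is closed or the stripe is written."""
    bak = path + ".bak"
    if os.path.exists(bak):
        os.remove(bak)

def may_have_write_hole(path: str) -> bool:
    """Old and new file units coexisting signals a possible write hole."""
    return os.path.exists(path) and os.path.exists(path + ".bak")

with tempfile.TemporaryDirectory() as d:
    unit = os.path.join(d, "ABC.data")
    with open(unit, "wb") as f:
        f.write(b"old data")
    begin_update(unit, b"new data")
    pending = may_have_write_hole(unit)    # both files exist at this point
    commit_update(unit)
    clean = not may_have_write_hole(unit)  # only ABC.data remains
```

If the system stops between `begin_update` and `commit_update`, both ABC.data and ABC.data.bak are found at the next access, which is the condition that triggers recovery.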
Specifically, for RAID1, the file unit with the most recent modification time and with extended attribute A equal to extended attribute B is selected to recover the mirror pair. Recovering the mirror pair means overwriting the mirror file unit with the content of the selected file unit, so that the data of the mirrored file units is consistent again.
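The RAID1 selection rule can be sketched as follows, using an in-memory model. The `Unit` class and the function `recover_mirror` are hypothetical names; modification times are plain numbers, and the behaviour when no file unit is complete (raising an error) is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Unit:
    """In-memory stand-in for one file unit of a mirror pair."""
    content: bytes
    mtime: float                 # modification time
    attr_a: Optional[str]        # magic number written before the data
    attr_b: Optional[str]        # magic number written after the data

def recover_mirror(units: List[Unit]) -> Unit:
    """Pick the newest complete unit and copy it over the other units."""
    complete = [u for u in units
                if u.attr_a is not None and u.attr_a == u.attr_b]
    if not complete:
        # Assumed behaviour: the text does not say what happens here.
        raise ValueError("no complete file unit to recover from")
    source = max(complete, key=lambda u: u.mtime)
    for u in units:
        if u is not source:
            u.content, u.mtime = source.content, source.mtime
            u.attr_a, u.attr_b = source.attr_a, source.attr_b
    return source

# The newer unit is torn (A != B), so the older complete unit wins.
torn = Unit(b"new-partial", mtime=2.0, attr_a="m2", attr_b=None)
intact = Unit(b"old", mtime=1.0, attr_a="m1", attr_b="m1")
chosen = recover_mirror([torn, intact])
```

After the call, both units of the pair hold the content of `intact`. Had the newer unit been complete, it would have been chosen instead.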
For RAID5 or RAID6, if the newest complete file units satisfy the conditions for large-write mode, large-write mode is used to restore stripe consistency, that is, to recover the original data; if they do not, small-write mode is used to restore stripe consistency, that is, to recover the original data. In other words, when the new data is correct, large-write mode is used to restore stripe consistency, and when the old data is correct, small-write mode is used.
After stripe consistency has been restored successfully, the file units that are no longer needed are deleted.
Fig. 8 is a schematic structural diagram of the apparatus of the third embodiment of the present invention, which comprises a writing module 81 and a deleting module 82. The writing module 81 is configured to, when new data is written to a file unit of the RAID, save the new data in a first file unit that is different from a second file unit, where the second file unit holds the old data, the new data is used to update the old data, and the RAID is managed in file mode. The deleting module 82 is configured to delete the old data after the corresponding file in the RAID is closed or after the stripe containing the first file unit has been fully written.
The deleting module 82 determines that the stripe containing the first file unit has been fully written as follows: if a first magic number and a second magic number recorded in the extended attributes of the file unit of the RAID are identical, the stripe containing the first file unit has been fully written. The first magic number is an attribute value written when data starts to be written to the file unit of the RAID, and the second magic number is a value identical to the first magic number that is written to the extended attributes when the data write has completed.
For example, to write the data "1234567890ABCDEFG" to a file unit, the current time "20110131085731" is written first, then the data is written, and after the write completes "20110131085731" is written again. If, on the next read, the magic numbers before and after the data are found to be identical, the data write completed: because the values written first and last are the same, the content written between them should also be complete. Otherwise, data loss has occurred.
This embodiment may further comprise a recovery module, configured to, if a stripe of the RAID that is being read or written contains file units holding both new data and old data, perform recovery processing on that stripe, restoring the data of the stripe either to its state before the data write operation or to its state after the data write operation completed. The recovery processing comprises: if RAID1 is used, selecting the file unit whose modification time is the most recent and whose first magic number is identical to its second magic number, and using it to recover the mirror pair; if RAID5 or RAID6 is used, recovering in large-write mode if the newest complete file units satisfy the conditions for large-write mode, and otherwise recovering in small-write mode.
By using file-level RAID, this embodiment can save new data in a file unit different from the one holding the old data, and can solve the write hole problem by relying on the inherent attributes of files. No additional storage device is needed, which reduces cost.
It can be understood that related features of the foregoing method and apparatus may be cross-referenced. In addition, "first", "second", and the like in the foregoing embodiments are used to distinguish the embodiments and do not indicate that one embodiment is better than another.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.