CN102023820B - Method for constructing double-disk fault-tolerant vertical packet parallel access disk array - Google Patents
- Publication number
- CN102023820B (application CN201010575625A)
- Authority
- CN
- China
- Prior art keywords
- data
- block
- disk
- band
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Signal Processing For Digital Recording And Reproducing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method for constructing SVE-RAID 6, a double-disk fault-tolerant, vertically grouped, parallel-access disk array. The method is suited to continuous data storage and belongs to the technical field of redundant arrays of independent disks (RAID). SVE-RAID 6 is designed around the characteristics of continuous data storage and mainly comprises: the division and layout of the array's stripes, data blocks, check blocks and data sub-blocks; a scheduling policy for switching disk working states; an optimized procedure for data writing; and a procedure for handling failed disks. SVE-RAID 6 not only provides a strong redundant data protection mechanism, it also markedly reduces power consumption and prolongs disk service life.
Description
Technical field
The present invention relates to a method for constructing a redundant array of independent disks (RAID), and in particular to a method for constructing a double-disk fault-tolerant, vertically grouped, parallel-access disk array. It is suited to continuous data storage and belongs to the field of RAID technology.
Background art
In modern storage, many data storage schemes have been designed to improve both the reliability of stored data and the input/output performance of storage systems. These schemes are usually some type of redundant array of independent disks (RAID). Using dedicated hardware or software, a RAID combines multiple physical storage devices, such as disks, into a single logical storage device.
Terms commonly used with RAID are explained below:
Stripe (rendered "band" elsewhere in this text): the set of position-related blocks on the different disks of a disk array; it is the unit by which the strips on different disks are organized.
Striping: dividing a run of continuous data into data blocks of equal size and writing each block to a different disk of the array.
Disk mirroring: copying source data onto one or more additional disks.
Error correction: using some operation, such as XOR, to generate and store redundant data, from which data that is lost or corrupted on a disk can be regenerated.
Commonly used RAID levels include RAID 0, RAID 1, RAID 5, RAID 6 and RAID 10. RAID 0 has no redundancy, and RAID 1 simply mirrors disks. The other three levels are each built from multiple disks and write data to the disks in stripes; in RAID 5 and RAID 6 the check data is distributed over the disks of the array. Each RAID 5 stripe contains 1 check block, so the array tolerates the failure of any one disk, whose data is rebuilt using the check blocks on the other disks. Each RAID 6 stripe contains 2 check blocks, so the array tolerates the failure of any two disks, whose data is restored using the check blocks on the other disks. RAID 10 first mirrors the disks and then stripes across them, so it contains no check blocks; after a disk fails, its data is restored from the corresponding mirror disk. In the ideal case RAID 10 tolerates the failure of up to 50% of its disks at different positions; the worst case is the simultaneous failure of both disks of one mirrored pair.
A P+Q RAID 6 XORs the data blocks in a stripe to generate the stripe's check data P, and uses Galois field (GF) multiplication and addition (XOR) to generate the stripe's check data Q. Suppose the data blocks in a stripe are $D_0$ and $D_1$; check blocks P and Q are then generated by formula 1 and formula 2 respectively:
$$P = D_0 \oplus D_1 \qquad (1)$$
$$Q = (A_0 * D_0) \oplus (A_1 * D_1) \qquad (2)$$
where $\oplus$ denotes XOR, $*$ denotes Galois field multiplication, and $A_0$, $A_1$ are distinct Galois field multiplication coefficients.
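As an illustration of formulas 1 and 2, the following minimal Python sketch computes P and Q byte by byte for two data blocks. The field GF(2^8) with reducing polynomial 0x11D and the coefficients A_0 = 1, A_1 = 2 are assumptions chosen for the example, and the names `gf_mul` and `make_pq` are hypothetical; the text above does not fix any of them.

```python
def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8); the polynomial 0x11D is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly            # reduce the product back into 8 bits
        b >>= 1
    return result


def make_pq(d0: bytes, d1: bytes, a0: int = 1, a1: int = 2):
    """Return (P, Q) for two equal-length data blocks, following formulas 1 and 2."""
    p = bytes(x ^ y for x, y in zip(d0, d1))
    q = bytes(gf_mul(a0, x) ^ gf_mul(a1, y) for x, y in zip(d0, d1))
    return p, q
```

Working byte by byte keeps the sketch short; a real array would apply the same arithmetic to whole blocks, typically with lookup tables or hardware support.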
When any two disks in the array fail, the data can be recovered using formula 1 and formula 2, as follows:
1. If the disks holding check blocks P and Q fail, P and Q are simply recomputed from the data blocks in the stripe using formula 1 and formula 2.
2. If the disks holding check block Q and any one data block fail, say $D_1$, first recover $D_1$ using check block P. From formula 1:
$$D_1 = P \oplus D_0$$
After $D_1$ is recovered, check block Q is regenerated with formula 2.
3. If the disks holding check block P and any one data block fail, say $D_1$, first recover $D_1$ using check block Q. From formula 2:
$$A_1 * D_1 = Q \oplus (A_0 * D_0)$$
so $D_1$ is recovered as:
$$D_1 = \bigl(Q \oplus (A_0 * D_0)\bigr) / A_1$$
where $/$ denotes Galois field division. After $D_1$ is successfully recovered, check block P is regenerated with formula 1.
4. If the disks holding any two data blocks fail, say $D_0$ and $D_1$, then from formula 1:
$$D_1 = P \oplus D_0$$
Substituting $D_1$ into formula 2 gives:
$$D_0 = \bigl(Q \oplus (A_1 * P)\bigr) / (A_0 \oplus A_1)$$
After $D_0$ is recovered, $D_1$ is recovered with formula 1.
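Case 4, the loss of both data blocks, can be sketched as follows. This is only an illustration under assumptions: GF(2^8) with polynomial 0x11D, coefficients A_0 = 1 and A_1 = 2, and the hypothetical helper names `gf_mul`, `gf_inv` and `recover_d0_d1`. Division is carried out as multiplication by the inverse.

```python
def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8); the polynomial 0x11D is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse by exhaustive search over the 255 non-zero elements."""
    for x in range(1, 256):
        if gf_mul(a, x) == 1:
            return x
    raise ZeroDivisionError("0 has no inverse in GF(2^8)")


def recover_d0_d1(p: bytes, q: bytes, a0: int = 1, a1: int = 2):
    """Rebuild D0 and D1 from P and Q when both data blocks are lost."""
    inv = gf_inv(a0 ^ a1)        # requires a0 != a1, i.e. distinct coefficients
    # D0 = (Q xor A1*P) / (A0 xor A1)
    d0 = bytes(gf_mul(inv, qb ^ gf_mul(a1, pb)) for pb, qb in zip(p, q))
    # D1 = P xor D0
    d1 = bytes(pb ^ x for pb, x in zip(p, d0))
    return d0, d1
```

The requirement that the coefficients be distinct is what makes `a0 ^ a1` non-zero and therefore invertible.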
In a P+Q RAID 6, the data blocks and the check blocks P and Q of a stripe are stored on different storage devices of the array. Because the data blocks are fairly large, most I/O operations fall within a data block on a single disk, and the check data is also distributed, which greatly increases the chance that multiple I/O operations can proceed concurrently. Moreover, because many disks share redundant data amounting to the capacity of two disks, the redundancy required per unit of valid data is small, and this advantage grows with the number of disks in the array. For example, in a RAID 6 of 10 disks the redundant data occupies the capacity of 2 disks, i.e. 1/5 of all stored data. Because RAID 6 tolerates two disk failures and has good overall performance, it is widely used in fields that place high demands on both data security and performance.
A disk has three working states: stopped, running and ready.
In the stopped state, the spindle stops rotating and the head stops seeking.
In the running state, the disk is performing read or write operations: the spindle rotates and the head seeks.
In the ready state, the disk is about to be read or written: the spindle rotates but the head does not seek.
To improve data access speed and satisfy concurrent I/O demands, conventional disk arrays usually schedule all disks to work in parallel for high performance. In continuous data storage, however, in applications such as video surveillance, virtual tape libraries (VTL) and continuous data protection (CDP), a continuous data stream is written to disk sequentially. Storage is sequential, i.e. the addresses of the stored data are contiguous and obey the principle of locality of storage access, so there is no concurrent I/O demand. Conventional disk arrays, including P+Q RAID 6, nevertheless spread continuous data over different disks and keep all member disks working, and the high I/O throughput and I/O concurrency that these parallel disks provide cannot be fully exploited by continuous storage applications. With all member disks working at once, a great deal of energy is wasted, and keeping the disks running continuously shortens their service life.
As disk technology has developed, the sustained read/write bandwidth of a single disk has increased significantly. For continuous data storage, if the transfer bandwidth of single disks can be fully used while a redundant data protection mechanism is retained, then only part of the disks in the array need to work to complete the storage task, according to the needs of the application, while the other, idle disks are scheduled into the stopped state. When the current storage area of the working disks is used up (each disk is divided into several large storage areas), those disks are scheduled into the stopped state and previously idle disks enter the working state. Stopping the spindle and head motors not only saves a large amount of energy but also, because the disks need not run continuously, prolongs their service life.
Table 1 lists the energy consumption of a disk in its different working states.
Table 1. Energy consumption of a disk in different working states
Disk working state | Read/write operation | Idle state | Stopped state |
Average power consumption | 11.5 W | 8.5 W | About 0 W |
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and to propose a method for constructing an energy-saving disk array suited to continuous data storage, while meeting the required storage performance.
The invention provides a method for constructing a double-disk fault-tolerant, vertically grouped, parallel-access disk array, hereinafter abbreviated as disk array SVE-RAID 6.
The object of the invention is achieved through the following technical solution:
A method for constructing disk array SVE-RAID 6 comprises the following steps:
Step one: divide and lay out the stripes, data blocks, check blocks and data sub-blocks of the disk array. Specifically: the disk array comprises N disks, where N ≥ 4 and N is a positive integer. The N disks are arranged as a disk matrix of 1 row × N columns. All disks are divided into N stripes, which are numbered consecutively. Each stripe has N storage blocks, of which 1 is check block P, 1 is check block Q, and the remaining N-2 are data blocks. Check block P is obtained by XORing the N-2 data blocks of the same stripe; check block Q is obtained from the N-2 data blocks of the same stripe by Galois field multiplication and addition. P and Q are generated as follows:
$$P = D_1 \oplus D_2 \oplus \cdots \oplus D_{N-2} \qquad (3)$$
$$Q = (A_1 * D_1) \oplus (A_2 * D_2) \oplus \cdots \oplus (A_{N-2} * D_{N-2}) \qquad (4)$$
where $D_1, D_2, \ldots, D_{N-2}$ are the N-2 data blocks of the same stripe, $\oplus$ denotes XOR, $A_1, A_2, \ldots, A_{N-2}$ are distinct Galois field multiplication coefficients, and $*$ denotes Galois field multiplication.
Let X(i, j) denote a storage block in disk array SVE-RAID 6, where i is the number of the stripe containing the block and j is the number of the disk containing it, 1 ≤ i, j ≤ N; storage block X(i, j) is located on disk j. The check block P of stripe i is denoted P(i) and is related to the storage blocks X(i, j) as follows:
$$P(i) = \begin{cases} X(i,\, N-i), & 1 \le i \le N-1 \\ X(i,\, N), & i = N \end{cases} \qquad (5)$$
The check block Q of stripe i is denoted Q(i) and is related to the storage blocks X(i, j) as follows:
$$Q(i) = X(i,\, N+1-i) \qquad (6)$$
The other storage blocks are data blocks, denoted D(u, v), where u is the number of the stripe containing the data block, 1 ≤ u ≤ N, and v is the number of the data block within its stripe (check blocks P and Q are skipped), numbered from 1, 1 ≤ v ≤ N-2. The correspondence between data block D(u, v) and the storage blocks satisfies formula 7.
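The check-block placement (Q(i) on disk N+1-i, and P(i) on disk N-i, or on disk N for the last stripe) can be sketched in a few lines of Python. The data-block mapping in `data_disk` is an assumption: the equation for formula 7 is not reproduced in this text, so the sketch simply numbers the data blocks of a stripe in increasing disk order, skipping the disks that hold P and Q. The function names are hypothetical.

```python
def parity_disks(i: int, n: int):
    """Return (disk of P(i), disk of Q(i)) for stripe i, all numbers 1-based."""
    p_disk = n - i if i < n else n
    q_disk = n + 1 - i
    return p_disk, q_disk


def data_disk(u: int, v: int, n: int) -> int:
    """Disk holding data block D(u, v).

    ASSUMPTION: data blocks are numbered in increasing disk order,
    skipping the two disks that hold the stripe's check blocks.
    """
    p_disk, q_disk = parity_disks(u, n)
    disks = [j for j in range(1, n + 1) if j not in (p_disk, q_disk)]
    return disks[v - 1]
```

For example, with n = 6, `parity_disks(1, 6)` places the check blocks of stripe 1 on disks 5 and 6, and `parity_disks(5, 6)` places those of stripe 5 on disks 1 and 2.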
To increase the continuous storage bandwidth, and to offer different continuous storage bandwidths for different storage-rate requirements, the N-2 data blocks of each stripe are divided into p groups (p ≥ 2, p a positive integer), each containing q data blocks (q ≥ 1, q a positive integer; the value of q is chosen according to the continuous bandwidth the actual storage task requires), such that p × q = N-2.
Each data block is divided into K equal-sized data sub-blocks, where K ≥ 2 and K is a positive integer. All data sub-blocks are then numbered group by group, and sub-blocks with the same group number are numbered stripe by stripe. That is, in group 1 of stripe 1: the 1st data sub-block of the 1st data block is numbered 1, the 1st data sub-block of the 2nd data block is numbered 2, ..., and the 1st data sub-block of the q-th data block is numbered q; then the 2nd data sub-block of the 1st data block is numbered q+1, the 2nd data sub-block of the 2nd data block is numbered q+2, ..., and the 2nd data sub-block of the q-th data block is numbered 2 × q; and so on, until the K-th data sub-block of the 1st data block is numbered (K-1) × q+1, the K-th data sub-block of the 2nd data block is numbered (K-1) × q+2, ..., and the K-th data sub-block of the q-th data block is numbered K × q;
Then, in group 1 of stripe 2: the 1st data sub-block of the 1st data block is numbered K × q+1, the 1st data sub-block of the 2nd data block is numbered K × q+2, ..., and the 1st data sub-block of the q-th data block is numbered (K+1) × q; then the 2nd data sub-block of the 1st data block is numbered (K+1) × q+1, the 2nd data sub-block of the 2nd data block is numbered (K+1) × q+2, ..., and the 2nd data sub-block of the q-th data block is numbered (K+2) × q; and so on, until the K-th data sub-block of the 1st data block is numbered 2 × K × q-q+1, the K-th data sub-block of the 2nd data block is numbered 2 × K × q-q+2, ..., and the K-th data sub-block of the q-th data block is numbered 2 × K × q;
Likewise, in group 1 of stripe i (1 ≤ i ≤ N): the 1st data sub-block of the 1st data block is numbered (i-1) × K × q+1, the 1st data sub-block of the 2nd data block is numbered (i-1) × K × q+2, ..., and the 1st data sub-block of the q-th data block is numbered (i-1) × K × q+q; then the 2nd data sub-block of the 1st data block is numbered (i-1) × K × q+q+1, the 2nd data sub-block of the 2nd data block is numbered (i-1) × K × q+q+2, ..., and the 2nd data sub-block of the q-th data block is numbered (i-1) × K × q+q × 2; and so on, until the K-th data sub-block of the 1st data block is numbered i × K × q-q+1, the K-th data sub-block of the 2nd data block is numbered i × K × q-q+2, ..., and the K-th data sub-block of the q-th data block is numbered i × K × q;
Following the same numbering principle, the data sub-blocks of group 2 of each stripe are numbered next, then those of group 3 of each stripe, ..., and finally those of group p of each stripe. Data sub-blocks with adjacent numbers have adjacent logical addresses.
Groups with adjacent stripe numbers and the same group number have adjacent logical addresses; a group in stripe N and the group in stripe 1 with the adjacent group number also have adjacent logical addresses. Groups with adjacent logical addresses are called adjacent groups.
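The enumeration above collapses into a single closed form, sketched below in Python. The expression is inferred from the listed cases rather than quoted from the source, and the offset used for groups 2 to p assumes that numbering simply continues after the N stripes of the previous group; the function and parameter names are hypothetical.

```python
def sub_block_number(group: int, stripe: int, block: int, sub: int,
                     n: int, q: int, k: int) -> int:
    """Number of the `sub`-th data sub-block of the `block`-th data block
    in the given group and stripe (all indices 1-based).

    n: number of stripes, q: data blocks per group, k: sub-blocks per data block.
    """
    return ((group - 1) * n * k * q      # all earlier groups, every stripe
            + (stripe - 1) * k * q       # earlier stripes of this group
            + (sub - 1) * q              # earlier sub-block rows in this group
            + block)                     # position within the row
```

Within one group of one stripe, consecutive numbers therefore rotate across the q disks of the group, which is what allows those q disks to be accessed in parallel during sequential access.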
Each check block P and check block Q is likewise divided into K equal-sized check sub-blocks. Each check sub-block of P is computed by XOR, and each check sub-block of Q by Galois field multiplication and addition, from the N-2 data sub-blocks that have the same offset within the stripe. These N-2 data sub-blocks together with the check sub-blocks generated from them form a verification bar.
With this layout, the disks holding the q data blocks of a group can be accessed in parallel, and the storage task is concentrated in one group, so that the remaining disks, which temporarily have no task, can be scheduled into the stopped state, saving energy and reducing disk wear.
Step two: schedule the disks to switch working states according to the following policy:
A semaphore is set for each disk, denoted Sem_i (1 ≤ i ≤ N), with initial value 0.
Step 1: from the logical address of the accessed data (denoted LBA), compute the stripe u and the group r to which the data block holding the data belongs. These are obtained from formula 8, in which the group size is the sum of the sizes of the data blocks in a group, MOD denotes the modulo operation, and 1 is added because stripe numbers and group numbers both start from 1.
Then use formula 7 to determine the disks holding the q data blocks D(u, r × q-q+1), D(u, r × q-q+2), ..., D(u, r × q-q+q) of group r in stripe u; use formula 5 to determine that the disk holding check block P(u) of this stripe is N-u (or N when u = N); and use formula 6 to determine that the disk holding check block Q(u) of this stripe is N+1-u. Each of these disks is handled as follows: if the disk is in the stopped state, it is scheduled into the ready state.
The stripe number u and the group number r record the stripe and group of the current access operation.
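The address lookup of step 1 can be sketched as follows. Because the equation for formula 8 is not reproduced in this text, the arithmetic below is derived from the numbering order (group by group, and stripe by stripe within a group) and should be read as an assumption, as should the zero-based LBA expressed in the same unit as the group size. The name `locate` is hypothetical.

```python
def locate(lba: int, group_size: int, n: int, p: int):
    """Map a logical address to (stripe u, group r), both 1-based.

    ASSUMPTIONS: lba is zero-based and uses the same unit as group_size;
    logical addresses run through group 1 of stripes 1..n, then group 2, etc.
    """
    g = lba // group_size        # index of the group-sized chunk holding lba
    u = g % n + 1                # stripe number cycles through 1..n
    r = g // n + 1               # group number advances after n stripes
    if r > p:
        raise ValueError("address beyond the capacity of the array")
    return u, r
```

With n = 6 stripes, p = 2 groups and a group size of 100 units, address 599 still falls in stripe 6 of group 1, while address 600 is the first address of stripe 1 of group 2.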
Step 2: when the disks holding the q data blocks of group r in stripe u and the disks holding check blocks P(u) and Q(u) of stripe u are in the ready state, parallel access to these q+2 disks begins, and the accessed disks change from the ready state to the running state.
Step 3: if this access ends within group r of stripe u, then once it ends, the disks holding the q data blocks of group r in stripe u and the disks holding check blocks P(u) and Q(u) change from the running state to the ready state, and this scheduling exits. Otherwise, at time t before the access to group r of stripe u ends (t being the time a disk needs to go from the stopped state to the ready state), determine the stripe number (denoted I) and the group number (denoted R) of the adjacent group, and handle each of the disks holding the q data blocks of group R in stripe I and the check blocks P(I) and Q(I) as follows: if the disk is in the stopped state, schedule it into the ready state; if the disk is in the running state, add 1 to its semaphore. After the access to group r of stripe u ends, handle each of the disks holding the q data blocks of group r in stripe u and the check blocks P(u) and Q(u) as follows: subtract 1 from the disk's semaphore; if the semaphore is less than 0, change the disk from the running state to the stopped state and reset its semaphore to 0; otherwise, change the disk from the running state to the ready state.
Step 4: update the recorded stripe number u of the current access to I and the recorded group number r to R, and repeat steps 2 to 4.
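The semaphore bookkeeping of step 3 can be sketched as two small functions. This is a simplified model under assumptions: disk states are plain strings, timing (the lead time t) is left to the caller, and the names `prepare_next` and `finish_current` are hypothetical.

```python
STOPPED, READY, RUNNING = "stopped", "ready", "running"


def prepare_next(state: dict, sem: dict, next_disks) -> None:
    """Time t before the current group finishes: wake or mark the adjacent group's disks."""
    for d in next_disks:
        if state[d] == STOPPED:
            state[d] = READY      # spin up ahead of time
        elif state[d] == RUNNING:
            sem[d] += 1           # disk is still needed after the current access


def finish_current(state: dict, sem: dict, current_disks) -> None:
    """After the current group finishes: stop unclaimed disks, keep the rest ready."""
    for d in current_disks:
        sem[d] -= 1
        if sem[d] < 0:
            state[d] = STOPPED    # no upcoming access claimed this disk
            sem[d] = 0
        else:
            state[d] = READY      # claimed by the adjacent group
```

A disk that the adjacent group also uses has its semaphore raised to 1 before the current access ends, so the decrement leaves it at 0 and the disk stays ready; a disk that is not reused drops to -1 and is stopped.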
Under this policy, during continuous read/write operations only the disks holding one group of data blocks and the disks holding the 2 check blocks (q+2 disks) are in the running state for the vast majority of the time; the remaining disks, which temporarily have no task, are scheduled into the stopped state, saving energy.
Preferably, during continuous read operations the disks holding check blocks P and Q stop working.
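A rough feel for the saving can be obtained from the per-state power figures of Table 1. The sketch below is only back-of-the-envelope arithmetic under assumptions: the ready state is taken to draw the idle figure of 8.5 W, the stopped state exactly 0 W, and the 6-disk, q = 2 configuration is an example. The names `POWER_W` and `array_power` are hypothetical.

```python
# Average power per disk state in watts, taken from Table 1.
# ASSUMPTION: "ready" is charged at the idle figure, "stopped" at exactly 0 W.
POWER_W = {"running": 11.5, "ready": 8.5, "stopped": 0.0}


def array_power(states) -> float:
    """Total average power of an array given the state of each disk."""
    return sum(POWER_W[s] for s in states)


# Example: 6 disks, q = 2, so q + 2 = 4 disks run during continuous writing.
grouped = array_power(["running"] * 4 + ["stopped"] * 2)
all_on = array_power(["running"] * 6)
```

In this example the grouped array draws 46 W against 69 W with every disk running; the gap widens as N grows relative to q+2.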
Each write operation updates only the q disks holding one group of data blocks and the data on the disks holding the check blocks of the same stripe, so a read-modify-write is performed. That is, when q new data sub-blocks $D_1, D_2, \ldots, D_q$ are written in parallel to group r (1 ≤ r ≤ p), the new check data P and Q of the verification bar to which the new data belongs must be generated with formulas 9 and 10:
$$P = P' \oplus \bigoplus_{i=1}^{q} \left( D_i \oplus D'_i \right) \qquad (9)$$
$$Q = Q' \oplus \bigoplus_{i=1}^{q} \left( A_{r \times q-q+i} * D_i \oplus A_{r \times q-q+i} * D'_i \right) \qquad (10)$$
where $D'_i$ is the old data at the position on disk where the new data $D_i$ is written; $P'$ and $Q'$ are the old check data of the verification bar containing $D'_i$, i.e. the old check data at the positions where the new check data P and Q are written; and $A_{r \times q-q+i}$ is the Galois field multiplication coefficient of $D_i$, the subscript r × q-q+i being the number of the data $D_i$ of group r among all data sub-blocks of its verification bar.
To generate the new check data P and Q, apart from the XOR and Galois field multiplication (which take very little time in a hardware implementation and can be ignored), the old data $D'_i$ (1 ≤ i ≤ q) and the old check data P' and Q' must first be read out in parallel. For a single disk, every write operation therefore consists of two operations: reading the old data and then writing the new data. To reduce the number of times the head switches between reading and writing and to make full use of disk bandwidth, the method further comprises an optimized procedure for write operations, as follows:
Set up one new-data buffer, which caches the new data that the application writes to the disk array; two new-check-data buffers, which cache the two kinds of newly generated check data; q old-data read-ahead areas, which hold the old data read ahead from the q disks of the array; and two old-check-data read-ahead areas, which hold the two kinds of old check data read ahead from the array. All read-ahead areas are of equal size and much larger than the new-data buffer.
Before a write operation, the old data sub-block corresponding to each new data sub-block, and the two old check data sub-blocks of the same verification bar, have already been read ahead into the corresponding read-ahead areas.
The new check data P is generated by the following steps:
Step 2.1: while the new-data buffer is not empty, take q data sub-blocks from it at a time, split them into the data sub-blocks $D_1, D_2, \ldots, D_q$ belonging to the q disks of group r, write them in parallel to the q disks, and at the same time send them to the XOR unit;
Step 2.2: send the corresponding old data sub-blocks $D'_1, D'_2, \ldots, D'_q$ in the q old-data read-ahead areas to the XOR unit and XOR them with the data sub-blocks sent in step 2.1, generating data sub-blocks $X_1, X_2, \ldots, X_q$, where $X_i = D_i \oplus D'_i$, 1 ≤ i ≤ q;
Step 2.3: send data sub-blocks $X_1, X_2, \ldots, X_q$ to the XOR unit, together with the corresponding old check data sub-block P' from old-check-data read-ahead area 1; the XOR generates the new check data sub-block P, $P = P' \oplus X_1 \oplus X_2 \oplus \cdots \oplus X_q$, which is held temporarily in new-check-data buffer 1;
Step 2.4: update the old check data sub-block P' in the corresponding verification bar to the new check data sub-block P.
The new check data Q is generated by the following steps:
Step 3.1: take q data sub-blocks from the new-data buffer, split them into the data sub-blocks $D_1, D_2, \ldots, D_q$ belonging to the q disks of group r, multiply each (Galois field multiplication) by its coefficient $A_{r \times q-q+i}$, and send the products $A_{r \times q-q+i} * D_i$ to the XOR unit, 1 ≤ i ≤ q;
Step 3.2: multiply each of the corresponding old data sub-blocks $D'_1, D'_2, \ldots, D'_q$ in the q old-data read-ahead areas (Galois field multiplication) by its coefficient $A_{r \times q-q+i}$, send the products $A_{r \times q-q+i} * D'_i$ to the XOR unit, and XOR them with the products $A_{r \times q-q+i} * D_i$ sent in step 3.1, generating the difference data sub-blocks $A_{r \times q-q+i} * D_i \oplus A_{r \times q-q+i} * D'_i$, 1 ≤ i ≤ q;
Step 3.3: send the q difference data sub-blocks to the XOR unit, together with the corresponding old check data sub-block Q' from old-check-data read-ahead area 2; XORing Q' with the q difference data sub-blocks generates the new check data sub-block Q, which is held temporarily in new-check-data buffer 2;
Step 3.4: update the old check data sub-block Q' in the corresponding verification bar to the new check data sub-block Q.
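The read-modify-write update of both check sub-blocks can be sketched on single bytes as follows. The sketch is illustrative only: it assumes GF(2^8) with polynomial 0x11D, uses the fact that multiplication distributes over XOR so that A * D ⊕ A * D' equals A * (D ⊕ D'), and the names `gf_mul` and `update_parity` are hypothetical.

```python
def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8); the polynomial 0x11D is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def update_parity(old_p: int, old_q: int, old_data, new_data, coeffs):
    """Return the new (P, Q) bytes after overwriting one group's data bytes.

    old_data, new_data and coeffs list, for the q disks of the group,
    the old byte D'_i, the new byte D_i and the coefficient of that disk.
    """
    p, q = old_p, old_q
    for d_old, d_new, a in zip(old_data, new_data, coeffs):
        x = d_old ^ d_new        # X_i = D_i xor D'_i
        p ^= x                   # P  = P' xor X_1 xor ... xor X_q
        q ^= gf_mul(a, x)        # Q  = Q' xor A_i * X_i, summed over the group
    return p, q
```

Only the q disks of the written group and the two check disks are touched; the data blocks of the other groups in the stripe are never read, which is what lets their disks remain stopped.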
If the starting logical address LBA of the written data is not divisible by the size of q data sub-blocks, leaving a remainder of k data sub-blocks (k a positive integer), or if the new-data buffer holds only k data sub-blocks, fewer than q, then only these k data sub-blocks are taken and written to the corresponding disks, and the new check data is generated by the method above from the corresponding old data and old check data of the same verification bar; the remaining q-k disks of the group have no write task.
Because the old data sub-blocks and old check data sub-blocks can be read directly from the read-ahead areas, the disks can perform write operations continuously; only when the data sub-blocks in a read-ahead area are used up is a single read-ahead performed to refill it. Read-ahead turns many scattered read operations into one continuous read operation and reduces the number of head seeks. The read-ahead areas can be made sufficiently large according to actual conditions.
From the above analysis, when the read-ahead areas are sufficiently large, the continuous write bandwidth of disk array SVE-RAID 6 can be estimated as approximately $q \cdot BW_{Disk} / 2$ and the continuous read bandwidth as approximately $q \cdot BR_{Disk}$, where $BW_{Disk}$ is the continuous write bandwidth of a single disk, $BR_{Disk}$ is the continuous read bandwidth of a single disk, and q is the number of disks in a group.
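The bandwidth estimate is simple enough to express as a helper for sizing q. The sketch below only restates the two approximations above; the function name and the example figures are hypothetical, not measurements.

```python
def estimated_bandwidth(q: int, bw_disk: float, br_disk: float):
    """Approximate (continuous write, continuous read) bandwidth of the array.

    Writes are halved because every write is preceded by a read of the old
    data; reads use the q data disks of a group in parallel.
    """
    return q * bw_disk / 2, q * br_disk


# Hypothetical example: q = 2 disks per group, single-disk rates in MB/s.
write_bw, read_bw = estimated_bandwidth(2, 80.0, 100.0)
```

For a task that needs a given sustained write rate, q can be chosen as the smallest value for which q × BW_Disk / 2 meets that rate, subject to p × q = N-2.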
The method further comprises a procedure for reading and writing data when a disk has failed, as follows:
When a disk fails and normal reading and writing are no longer possible, all disks in disk array SVE-RAID 6 are activated into the ready state, and control of the array's working state by the disk working-state control method described above is suspended.
Handling is discussed stripe by stripe, in terms of failed storage blocks within a stripe. A disk failure appears as the failure of all storage blocks of the array that are located on the failed disk, so a disk failure can be handled by handling the storage block failure in each stripe of the array separately.
When storage blocks in a stripe fail, there are three cases:
Case 1: only check blocks have failed. Data in the data blocks can still be read and written, but the stripe is now in a single-redundancy state (1 check block failed) or a no-redundancy state (2 check blocks failed). After the disk holding the failed check block is replaced, the check block is regenerated.
Case 2: only data blocks (1 or 2) have failed. Read and write operations are handled separately:
Read operation: the requested data on the failed data block is generated from the other data blocks of the stripe together with check block P or Q (1 data block failed), or together with P and Q (2 data blocks failed). After the disk holding the failed data block is replaced, the failed data block is recovered in the same way.
Write operation: there are 2 ways of handling it:
Way 1: write the data directly onto P or Q of the stripe (1 data block failed), or onto P and Q (2 data blocks failed), and record which check blocks were written. After the disk holding the failed data block is replaced, perform data restoration and check data regeneration. Data restoration moves the data blocks temporarily stored in check blocks back to their original positions; check data regeneration regenerates the check blocks that were recorded as written.
Way 2: use a spare disk to hold temporarily the data to be written to the failed disk; after the failed disk is replaced, move the data blocks held on the spare disk back to their original positions.
Case 3: 1 data block and 1 check block have failed. The failed data block is handled as when only 1 data block has failed, except that the stripe is now in a single-redundancy data protection state. After the failed data block is recovered, the failed check block is handled as when only 1 check block has failed.
After the array returns to the normal state, the disk working-state control method is enabled again to control the working state of disk array SVE-RAID 6.
Beneficial effects
Compared with the prior art, the method for constructing disk array SVE-RAID 6 of the invention has the following advantages:
1. Significantly reduced energy consumption. The data layout provides a dual redundant data protection mechanism while adopting a grouping strategy. For the vast majority of the time only one group of disks is working, and it works for a sufficiently long time, so most disks in the array, having no task, remain in the stopped state, saving energy.
2. Continuous storage bandwidth selectable according to task requirements. With the grouping strategy, q can be set according to the task and the q disks of a group accessed in parallel; data buffering and read-ahead reduce the impact of read-modify-write on write bandwidth, so a relatively large continuous read/write bandwidth can be provided.
3. Substantially longer service life. Disks in SVE-RAID 6 that temporarily have no task are scheduled into the stopped state, which stops the high-speed rotation of the spindle, and the interval between disk starts and stops is sufficiently long, so the probability of disk failure is reduced and the service life of the array is substantially prolonged.
Description of drawings
Fig. 1 is a schematic diagram of the numbering of disks, stripes, data blocks and check blocks in disk array SVE-RAID 6 in an embodiment of the invention;
Fig. 2 is a schematic diagram of the grouping in disk array SVE-RAID 6 in an embodiment of the invention;
Fig. 3 is a schematic diagram of the numbering of data sub-blocks and check sub-blocks in disk array SVE-RAID 6 in an embodiment of the invention;
Fig. 4 is a schematic diagram of the disk working-state transitions in an embodiment of the invention;
Fig. 5 is a schematic diagram of the optimization for generating new check data P during a write operation in an embodiment of the invention;
Fig. 6 is a schematic diagram of the optimization for generating new check data Q during a write operation in an embodiment of the invention;
Embodiment
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment.
In this embodiment, the SVE-RAID 6 disk array consists of 6 Seagate Cheetah 10K disks, whose parameters are listed in Table 2. The 6 disks are divided into 6 stripes, numbered 1 to 6, as shown in Fig. 1. Each stripe has 6 storage blocks, of which 2 are check blocks and the other 4 are data blocks. The check blocks P and Q of each stripe are: the 1st and 6th storage blocks of stripe 6; the 1st and 2nd storage blocks of stripe 5; the 2nd and 3rd storage blocks of stripe 4; the 3rd and 4th storage blocks of stripe 3; the 4th and 5th storage blocks of stripe 2; and the 5th and 6th storage blocks of stripe 1. All other storage blocks are data blocks. The 4 data blocks of a stripe are XORed to generate the check block P of that stripe, and are combined by Galois field multiplication and XOR to generate the check block Q of that stripe. Here $\oplus$ denotes XOR, $*$ denotes Galois field multiplication, and $A_0$, $A_1$, $A_2$, $A_3$ are mutually distinct Galois field multiplication coefficients; for the Galois field $GF(2^8)$ one may take $A_0 = 1$, $A_1 = 2$, $A_2 = 4$, $A_3 = 8$. The check blocks of the other stripes are computed in the same way.
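The check-block computation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the reducing polynomial 0x11D for $GF(2^8)$ is an assumption (the text does not name one), and the helper names `gf_mul` and `make_check_blocks` are hypothetical.

```python
# Sketch: P/Q check-block generation for one stripe with 4 data blocks.
# Assumption: GF(2^8) is built with the polynomial 0x11D; the patent
# text does not specify which polynomial is used.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) (assumed polynomial 0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # reduce when the product leaves 8 bits
            a ^= 0x11D
        b >>= 1
    return result

COEFFS = (1, 2, 4, 8)          # A_0..A_3 as chosen in the embodiment

def make_check_blocks(data_blocks):
    """Return (P, Q) for equally sized data blocks given as bytes."""
    size = len(data_blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    for coeff, block in zip(COEFFS, data_blocks):
        for k, byte in enumerate(block):
            p[k] ^= byte                   # P: plain XOR
            q[k] ^= gf_mul(coeff, byte)    # Q: XOR of weighted bytes
    return bytes(p), bytes(q)

# Tiny 2-byte "blocks" stand in for the real 50 GB data blocks.
blocks = [bytes([0x01, 0xFF]), bytes([0x02, 0x00]),
          bytes([0x04, 0x0F]), bytes([0x08, 0xF0])]
P, Q = make_check_blocks(blocks)
```

Because the four coefficients are distinct, P and Q give two independent equations per byte, which is what allows any two lost blocks of a stripe to be rebuilt.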
Table 2. Parameters of the Seagate Cheetah 10K disk
To improve sustained storage bandwidth, the 4 data blocks of each stripe are divided into 2 groups of 2 data blocks each; the grouping is shown in Fig. 2.
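The layout and grouping of this embodiment can be sketched in a few lines. The check-block positions follow the list given for Fig. 1; the rule that data blocks are counted in increasing disk order while skipping P and Q is an assumption, and `stripe_layout` is a hypothetical helper name.

```python
# Sketch: which disks hold P, Q and each data group of stripe u.
# Assumption: data blocks are numbered in increasing disk order,
# skipping the two check blocks of the stripe.

def stripe_layout(u: int, n: int = 6, q: int = 2):
    """Return (p_disk, q_disk, groups) for stripe u (all 1-based)."""
    p_disk = n if u == n else n - u      # P sits on disk N-u, or N for stripe N
    q_disk = n + 1 - u                   # Q sits on disk N+1-u
    data_disks = [d for d in range(1, n + 1) if d not in (p_disk, q_disk)]
    groups = [data_disks[g:g + q] for g in range(0, len(data_disks), q)]
    return p_disk, q_disk, groups

layout_3 = stripe_layout(3)   # stripe 3
layout_6 = stripe_layout(6)   # stripe 6
```

With these assumptions, group 1 of stripe 3 lands on disks 1 and 2 with check blocks on disks 3 and 4, and group 1 of stripe 6 lands on disks 2 and 3 with check blocks on disks 1 and 6, which matches the scheduling examples of this embodiment.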
Each data sub-block is 4 KB, so each data block is divided into K = 13,107,200 data sub-blocks. All data sub-blocks are then numbered group by group, and data sub-blocks with the same group number are numbered in stripe order, as shown in Fig. 3. That is, in group 1 of stripe 1, the 1st data sub-block of the 1st data block is numbered 1; the 1st data sub-block of the 2nd data block is numbered 2; the 2nd data sub-block of the 1st data block is numbered 3; the 2nd data sub-block of the 2nd data block is numbered 4; and so on, until the K-th data sub-block of the 1st data block is numbered 2 × K - 1 and the K-th data sub-block of the 2nd data block is numbered 2 × K;
Then, in group 1 of stripe 2, the 1st data sub-block of the 1st data block is numbered 2 × K + 1; the 1st data sub-block of the 2nd data block is numbered 2 × K + 2; the 2nd data sub-block of the 1st data block is numbered 2 × K + 3; the 2nd data sub-block of the 2nd data block is numbered 2 × K + 4; and so on, until the K-th data sub-block of the 1st data block is numbered 4 × K - 1 and the K-th data sub-block of the 2nd data block is numbered 4 × K;
Continuing in the same way, in group 1 of stripe 6, the 1st data sub-block of the 1st data block is numbered 10 × K + 1; the 1st data sub-block of the 2nd data block is numbered 10 × K + 2; the 2nd data sub-block of the 1st data block is numbered 10 × K + 3; the 2nd data sub-block of the 2nd data block is numbered 10 × K + 4; and so on, until the K-th data sub-block of the 1st data block is numbered 12 × K - 1 and the K-th data sub-block of the 2nd data block is numbered 12 × K;
The data sub-blocks of group 2 of each stripe are then numbered by the same rule. Data sub-blocks with adjacent numbers have adjacent logical addresses. Groups with the same group number in stripes with adjacent numbers have adjacent logical addresses, and groups with adjacent group numbers in stripe 6 and stripe 1 have adjacent logical addresses.
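The numbering rule above reduces to a closed form. The following sketch is one way to write it; the function name `sub_block_number` and its parameter names are hypothetical, and the formula is derived from the enumerated examples rather than quoted from the patent.

```python
# Sketch: closed-form number of a data sub-block.
#   r = group number, i = stripe number, b = data block within the group,
#   k = sub-block offset within the data block (all 1-based).

K = 13_107_200   # sub-blocks per data block: 50 GB / 4 KB

def sub_block_number(r, i, b, k, n=6, q=2, big_k=K):
    """Number groups first, then stripes, then offsets, then blocks."""
    return (r - 1) * n * big_k * q + (i - 1) * big_k * q + (k - 1) * q + b

first = sub_block_number(1, 1, 1, 1)          # group 1, stripe 1
last_g1 = sub_block_number(1, 6, 2, K)        # end of group 1, stripe 6
g2_example = sub_block_number(2, 1, 1, 2)     # group 2, stripe 1, offset 2
```

Here `last_g1` is 12 × K, and `g2_example` is 12 × K + 3, so group 2 continues directly after the last sub-block of group 1.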
Each check block is likewise divided into K equal-sized check sub-blocks. The check sub-blocks P(1,1), …, P(1,K), …, P(6,1), …, P(6,K) are each obtained by XORing the 4 data sub-blocks located at the same offset in the data blocks of their stripe. The check sub-blocks Q(1,1), …, Q(1,K), …, Q(6,1), …, Q(6,K) are each obtained by multiplying those 4 data sub-blocks by the Galois field coefficients $A_0$, $A_1$, $A_2$, $A_3$ respectively and then XORing the products. For example, data sub-blocks 3, 4, 12 × K + 3, 12 × K + 4 and the check sub-blocks P(1,2), Q(1,2) generated from them form one check strip, as shown in Fig. 3.
A semaphore Sem i, 1 ≤ i ≤ 6, is set for each disk, with initial value 0. Examples of disk working-state scheduling during normal sequential data access are as follows:
Example 1: an access whose start and end addresses both lie in group 1 of stripe 3.
From the logical address of the accessed data, the start address is found to lie in group 1 of stripe 3; the 2 data blocks of group 1 of stripe 3 are on disks 1 and 2, and the two check blocks of this stripe are on disks 3 and 4. Any of disks 1, 2, 3 and 4 that is in the stopped state is scheduled to the ready state, as shown by ① in Fig. 4. When access to the data on disks 1, 2, 3 and 4 begins, these disks change from the ready state to the running state, as shown by ② in Fig. 4. Because this access ends in group 1 of stripe 3, after it ends disks 1, 2, 3 and 4 change from the running state to the ready state, as shown by ③ in Fig. 4, and this scheduling round exits.
Example 2: an access whose start address lies in group 1 of stripe 6 and whose end address lies in group 2 of stripe 1.
From the logical address of the accessed data, the start address is found to lie in group 1 of stripe 6; the 2 data blocks of group 1 of stripe 6 are on disks 2 and 3, and the two check blocks of this stripe are on disks 1 and 6. Any of disks 2, 3, 1 and 6 that is in the stopped state is scheduled to the ready state. When access to the data on disks 2, 3, 1 and 6 begins, these disks change from the ready state to the running state. Because this access does not end in group 1 of stripe 6, at time t before the access to group 1 of stripe 6 ends, the working states of the disks of the adjacent group, group 2 of stripe 1, are checked: its 2 data blocks are on disks 3 and 4, and the two check blocks of that stripe are on disks 5 and 6. Disks 3 and 6 are in the running state, so their semaphores are incremented by 1, giving Sem 3 = Sem 6 = 1; disks 4 and 5 are in the stopped state, so they are changed to the ready state. After the access to group 1 of stripe 6 ends, the semaphores Sem 1, Sem 2, Sem 3 and Sem 6 are decremented by 1, giving Sem 1 = Sem 2 = -1 and Sem 3 = Sem 6 = 0. Disks 1 and 2 therefore change from the running state to the stopped state, as shown by ④ in Fig. 4, and Sem 1 = Sem 2 = 0 is restored; disks 3 and 6 change from the running state to the ready state.
When access to the data on disks 3, 4, 5 and 6 begins, these disks change from the ready state to the running state. Because this access ends in group 2 of stripe 1, after it ends disks 3, 4, 5 and 6 change from the running state to the ready state, and this scheduling round exits.
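Example 2 can be replayed with a small state machine. The sketch below is an illustration under simplifying assumptions: the spin-up lead time t is not modelled, and the class `Scheduler` with its method names is hypothetical.

```python
# Sketch: semaphore-based disk state scheduling, replaying Example 2.
# Assumption: timing (the lead time t for spin-up) is ignored; only the
# order of the state transitions is modelled.

STOPPED, READY, RUNNING = "stopped", "ready", "running"

class Scheduler:
    def __init__(self, n=6):
        self.state = {d: STOPPED for d in range(1, n + 1)}
        self.sem = {d: 0 for d in range(1, n + 1)}

    def prepare(self, disks):
        """Wake stopped disks; mark running disks as needed by the next group."""
        for d in disks:
            if self.state[d] == STOPPED:
                self.state[d] = READY
            elif self.state[d] == RUNNING:
                self.sem[d] += 1

    def start(self, disks):
        for d in disks:
            self.state[d] = RUNNING

    def finish(self, disks, last=False):
        """End access to a group; `last` marks the final group of the access."""
        for d in disks:
            if last:
                self.state[d] = READY
                continue
            self.sem[d] -= 1
            if self.sem[d] < 0:          # not needed by the next group
                self.state[d] = STOPPED
                self.sem[d] = 0
            else:                        # shared with the next group
                self.state[d] = READY

s = Scheduler()
first_group = [2, 3, 1, 6]    # stripe 6, group 1: data 2,3; check 1,6
next_group = [3, 4, 5, 6]     # stripe 1, group 2: data 3,4; check 5,6
s.prepare(first_group)
s.start(first_group)
s.prepare(next_group)                         # Sem 3 = Sem 6 = 1
sem_before = dict(s.sem)
s.finish(first_group)                         # disks 1, 2 stop
state_between = dict(s.state)
s.start(next_group)
s.finish(next_group, last=True)
```

After the hand-over, `state_between` shows disks 1 and 2 stopped and disks 3 to 6 ready, as in the example.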
During normal sequential data access, for the vast majority of the time only 4 disks of this array are working, so its power consumption is about 4/6 of that of a RAID 6 array with the same number of disks.
With the array built from Seagate Cheetah 10K disks (parameters in Table 2) of 300 GB each, the minimum interval between two starts of a single disk is the time needed to access 300 GB / 6 × 2 = 100 GB of data, so the effect of the start interval and start count on disk life is negligible.
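The two figures above follow from simple arithmetic, spelled out below. The constant names are hypothetical, and the power ratio assumes that stopped disks draw negligible power compared with working ones.

```python
# Sketch: where the 4/6 power ratio and the 100 GB interval come from.
# Assumption: a stopped disk consumes negligible power.

N_DISKS = 6          # disks in the array
Q_PER_GROUP = 2      # data disks accessed in parallel
DISK_GB = 300        # capacity of one disk

active_disks = Q_PER_GROUP + 2               # q data disks plus P and Q disks
power_ratio = active_disks / N_DISKS         # relative to all 6 disks spinning
data_block_gb = DISK_GB / N_DISKS            # one storage block per stripe
group_gb = data_block_gb * Q_PER_GROUP       # data accessed before a group switch
```

Here `active_disks` is 4 and `group_gb` is 100, matching the values stated in the text.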
The following buffers are set up: one new-data buffer, which caches the new data that the application writes to the disk array; two new check data buffers, which cache the two kinds of newly generated check data respectively; 2 old-data pre-read areas, which hold the old data pre-read from the 2 disks of the array respectively; and 2 old check data pre-read areas, which hold the two kinds of old check data pre-read from the array respectively. All pre-read areas are the same size, 10 times that of the new-data buffer.
Before the write operation, the old data sub-blocks at the positions where the new data sub-blocks will be written, together with the two kinds of old check sub-blocks in the same check strip as those old data sub-blocks, are pre-read into the corresponding pre-read areas.
The following steps generate the new check data P:
Step 2.1: while the new-data buffer is not empty, take 2 data sub-blocks from it each time and split them into the data sub-blocks $D_1$, $D_2$ belonging to the 2 data blocks of a group (taking group 2 as an example); write them to the 2 disks in parallel and at the same time send them to the XOR unit, as shown by ① in Fig. 5;
Step 2.2: the corresponding old data sub-blocks $D'_1$, $D'_2$ in the 2 old-data pre-read areas are sent to the XOR unit and XORed with the data sub-blocks $D_1$, $D_2$, generating the data sub-blocks $X_1 = D_1 \oplus D'_1$ and $X_2 = D_2 \oplus D'_2$, as shown by ② in Fig. 5;
Step 2.3: the data sub-blocks $X_1$, $X_2$ are sent to the XOR unit (③ in Fig. 5), the corresponding old check sub-block $P'$ in the old check data pre-read area is sent to the XOR unit (④ in Fig. 5), and the XOR generates the new check sub-block $P = X_1 \oplus X_2 \oplus P'$, which is temporarily stored in new check data buffer 1 (⑤ in Fig. 5);
Step 2.4: the old check sub-block $P'$ in the corresponding check strip on disk is updated to the new check sub-block $P$.
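Steps 2.1 to 2.4 amount to an incremental parity update that touches only the written sub-blocks and the old P. A minimal sketch, with hypothetical helper names `xor_bytes` and `update_p` and 2-byte stand-ins for the 4 KB sub-blocks:

```python
# Sketch: incremental update of P from old data, new data and old P.
# Only the 2 sub-blocks of the written group and P' are needed; the
# other group's disks stay untouched.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def update_p(new_subs, old_subs, old_p):
    """P = X_1 xor ... xor X_q xor P', with X_i = D_i xor D'_i."""
    p = old_p
    for d_new, d_old in zip(new_subs, old_subs):
        x_i = xor_bytes(d_new, d_old)     # step 2.2
        p = xor_bytes(p, x_i)             # step 2.3
    return p

# One check strip: 4 old data sub-blocks and their old P.
old = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
old_p = b"\x00\x00"
for blk in old:
    old_p = xor_bytes(old_p, blk)

# Overwrite the two sub-blocks of group 2 (positions 3 and 4).
new3, new4 = b"\xAA\xBB", b"\xCC\xDD"
new_p = update_p([new3, new4], [old[2], old[3]], old_p)
```

The result equals the P that would be obtained by XORing all four current data sub-blocks, which is why the disks of the other group can remain stopped during the write.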
The following steps generate the new check data Q:
Step 3.1: take 2 data sub-blocks from the new-data buffer and split them into the data sub-blocks $D_1$, $D_2$ belonging to the 2 data blocks of a group (taking group 2 as an example); multiply them (Galois field multiplication) by the corresponding coefficients $A_{2 \times 2 - 2 + 1}$, i.e. $A_3$, and $A_{2 \times 2 - 2 + 2}$, i.e. $A_4$, respectively, and send the products $A_3 * D_1$, $A_4 * D_2$ to the XOR unit;
Step 3.2: the corresponding old data sub-blocks $D'_1$, $D'_2$ in the 2 old-data pre-read areas are multiplied (Galois field multiplication) by the corresponding coefficients $A_3$, $A_4$ respectively; the products $A_3 * D'_1$, $A_4 * D'_2$ are sent to the XOR unit and XORed with the products $A_3 * D_1$, $A_4 * D_2$, generating the data sub-blocks $(A_3 * D_1) \oplus (A_3 * D'_1)$ and $(A_4 * D_2) \oplus (A_4 * D'_2)$;
Step 3.3: these two data sub-blocks are sent to the XOR unit; the corresponding old check sub-block $Q'$ in old check data pre-read area 2 is sent to the XOR unit; XORing $Q'$ with the two data sub-blocks generates the new check sub-block $Q$, which is temporarily stored in new check data buffer 2;
Step 3.4: the old check sub-block $Q'$ in the corresponding check strip is updated to the new check sub-block $Q$.
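Steps 3.1 to 3.4 can be sketched in the same incremental style as the P update. Several points are assumptions: the polynomial 0x11D for $GF(2^8)$, the coefficient values 4 and 8 for the two data blocks of group 2 (taken from the embodiment's list 1, 2, 4, 8), and the helper names `gf_mul`, `full_q` and `update_q`.

```python
# Sketch: incremental update of Q from old data, new data and old Q.
# Assumptions: GF(2^8) with polynomial 0x11D; group 2 uses the
# coefficients 4 and 8 from the embodiment's list (1, 2, 4, 8).

def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def full_q(blocks, coeffs):
    """Q computed from scratch, used here only as a reference."""
    q = bytearray(len(blocks[0]))
    for a, block in zip(coeffs, blocks):
        for k, byte in enumerate(block):
            q[k] ^= gf_mul(a, byte)
    return bytes(q)

def update_q(new_subs, old_subs, coeffs, old_q):
    """Q = Q' xor sum_i (A_i*D_i xor A_i*D'_i) over the written sub-blocks."""
    q = bytearray(old_q)
    for a, d_new, d_old in zip(coeffs, new_subs, old_subs):
        for k in range(len(q)):
            q[k] ^= gf_mul(a, d_new[k]) ^ gf_mul(a, d_old[k])
    return bytes(q)

all_coeffs = (1, 2, 4, 8)
old = [b"\x10\x9C", b"\x20\x7E", b"\x30\xF1", b"\x40\x05"]
old_q = full_q(old, all_coeffs)
new3, new4 = b"\xAB\xCD", b"\xEF\x01"
new_q = update_q([new3, new4], [old[2], old[3]], (4, 8), old_q)
```

Because Galois field multiplication distributes over XOR, the incremental result matches a full recomputation over the four current sub-blocks.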
The disk used in this embodiment is the Seagate Cheetah 10K, whose single-disk sustained read bandwidth is 85 MB/s and sustained write bandwidth is 84 MB/s. The sustained write bandwidth of this disk array is therefore about 84 MB/s and its sustained read bandwidth about 170 MB/s, which is enough to write 42 high-definition video streams of 2 MB/s each simultaneously, or to read 85 such streams simultaneously.
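The stream counts follow directly from the stated bandwidths. A short check, with hypothetical constant names:

```python
# Sketch: number of 2 MB/s video streams the stated bandwidths support.
ARRAY_WRITE_MBPS = 84     # sustained write bandwidth of the array (from the text)
ARRAY_READ_MBPS = 170     # sustained read bandwidth of the array (from the text)
STREAM_MBPS = 2           # one high-definition video stream

write_streams = ARRAY_WRITE_MBPS // STREAM_MBPS
read_streams = ARRAY_READ_MBPS // STREAM_MBPS
```

This gives 42 concurrent write streams and 85 concurrent read streams.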
When a disk fails and normal data reads and writes can no longer be carried out, all disks in the array are activated into the ready state and use of the disk working-state control method is suspended.
When any two disks in the array fail at the same time, the handling is the same in every case; the simultaneous failure of disk 1 and disk 2 is taken as an example.
Suppose disk 1 and disk 2 in the array fail at the same time. This manifests as the failure of the 1st and 2nd storage blocks of every stripe in the array, and the storage block failure types of the stripes are as follows:
On stripes 1, 2 and 3, 2 data blocks fail at the same time, and they are handled as follows:
Read operation: the data to be read from the failed data blocks are generated by computation from the other data blocks of the same stripe and the check blocks P and Q of that stripe; after the failed disks are replaced, the failed data blocks are recovered by the same method.
Write operation: the two pieces of data destined for the failed data blocks are written onto the P and Q of the stripe respectively, and the check blocks written in this way are recorded; after the disks are replaced, data restoration and check data regeneration are carried out. Data restoration moves the data blocks temporarily held in the check blocks back to their original positions; check data regeneration regenerates the check blocks that were recorded as written.
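The read path for two failed data blocks can be sketched as the standard two-unknown solve over $GF(2^8)$. This is an illustrative sketch, not the patent's stated procedure: the polynomial 0x11D and the helper names `gf_mul`, `gf_inv`, `check_blocks` and `recover_two` are assumptions.

```python
# Sketch: rebuilding two failed data blocks of a stripe from the
# surviving data blocks plus P and Q.
# Assumption: GF(2^8) with polynomial 0x11D.

def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^254, since a^255 = 1."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def check_blocks(data, coeffs):
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for a, block in zip(coeffs, data):
        for k in range(size):
            p[k] ^= block[k]
            q[k] ^= gf_mul(a, block[k])
    return bytes(p), bytes(q)

def recover_two(data, coeffs, p, q, x, y):
    """Rebuild data[x] and data[y]; entries at x and y are ignored."""
    size = len(p)
    p_rest, q_rest = bytearray(p), bytearray(q)
    for i, block in enumerate(data):
        if i in (x, y):
            continue
        for k in range(size):
            p_rest[k] ^= block[k]
            q_rest[k] ^= gf_mul(coeffs[i], block[k])
    # Now p_rest = Dx ^ Dy and q_rest = Ax*Dx ^ Ay*Dy, so
    # (Ax ^ Ay) * Dx = q_rest ^ Ay * p_rest.
    inv = gf_inv(coeffs[x] ^ coeffs[y])
    dx = bytes(gf_mul(inv, q_rest[k] ^ gf_mul(coeffs[y], p_rest[k]))
               for k in range(size))
    dy = bytes(a ^ b for a, b in zip(dx, p_rest))
    return dx, dy

coeffs = (1, 2, 4, 8)
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"]
p, q = check_blocks(data, coeffs)
damaged = [None, None, data[2], data[3]]    # blocks on disks 1 and 2 lost
d0, d1 = recover_two(damaged, coeffs, p, q, 0, 1)
```

The solve works because the coefficients are mutually distinct, so $A_x \oplus A_y$ is never zero and always has an inverse.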
On stripes 4 and 6, 1 data block and 1 check block fail at the same time, and they are handled as follows:
Read operation: the data to be read from the failed data block are generated by computation from the other data blocks of the stripe and the remaining check block; after the disk holding the failed data block is replaced, the failed data block is recovered by the same method.
Write operation: the data to be written are written onto the remaining check block of the stripe, and the check block written in this way is recorded; after the disk holding the failed data block is replaced, the data block temporarily held in the check block is moved back to its original position and the check block recorded as written is regenerated.
After the failed data block has been recovered and the disk holding the failed check block has been replaced, the failed check block is regenerated.
On stripe 5, 2 check blocks fail at the same time, and they are handled as follows:
Data on the data blocks of this stripe can still be read and written, but the stripe is then in a state without redundant check data; after the disks are replaced, the data on the failed check blocks are regenerated.
When any single disk in the array fails, the handling is the same in every case; the failure of disk 3 is taken as an example. When disk 3 in the array fails, this manifests as the failure of the 3rd storage block of every stripe in the array, and the storage block failure types of the stripes are as follows:
On stripes 1, 2, 5 and 6, 1 data block fails, and it is handled as follows:
Read operation: the data to be read from the failed data block are generated by computation from the other data blocks of the same stripe and the check block P of that stripe; after the failed disk is replaced, the failed data block is recovered by the same method.
Write operation: the data destined for the failed data block are written onto the P of the stripe, and the check block written in this way is recorded; after the disk is replaced, the data block temporarily held in check block P is moved back to its original position and the check block recorded as written is regenerated.
On stripes 3 and 4, 1 check block fails, and it is handled as follows:
Data on the data blocks of the stripe continue to be read and written; after the disk is replaced, the data on the check block are regenerated.
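The per-stripe failure types listed in the two examples can be derived mechanically from the check-block positions. A minimal sketch, using the check-block positions of this embodiment and a hypothetical helper name `classify_failures`:

```python
# Sketch: for a set of failed disks, count the failed data blocks and
# failed check blocks on every stripe.
# P of stripe u is on disk N-u (disk N for stripe N); Q is on disk N+1-u.

def classify_failures(failed_disks, n=6):
    """Map stripe number -> (failed data blocks, failed check blocks)."""
    result = {}
    for u in range(1, n + 1):
        p_disk = n if u == n else n - u
        q_disk = n + 1 - u
        failed_checks = sum(1 for d in failed_disks if d in (p_disk, q_disk))
        result[u] = (len(failed_disks) - failed_checks, failed_checks)
    return result

double_failure = classify_failures({1, 2})   # disks 1 and 2 fail together
single_failure = classify_failures({3})      # disk 3 fails alone
```

For disks 1 and 2 this yields two failed data blocks on stripes 1 to 3, one data block and one check block on stripes 4 and 6, and two check blocks on stripe 5; for disk 3 it yields one data block on stripes 1, 2, 5 and 6 and one check block on stripes 3 and 4.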
After the array returns to the normal state, the disk working-state control method is enabled again to control the working state of the SVE-RAID 6 disk array.
The above is only a preferred implementation of the present invention. It should be understood that those skilled in the art can make further improvements, or make equivalent replacements of some of the technical features, without departing from the principle of the invention, and that such improvements and replacements shall also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A method for constructing a double-disk fault-tolerant vertical-type grouped parallel-access disk array, characterized in that it comprises the following steps:
One. Divide and lay out the stripes, data blocks, check blocks and data sub-blocks of the disk array, specifically: for a disk array containing N disks, where N ≥ 4 and N is a positive integer, the N disks are arranged as a disk matrix of 1 row × N columns; all disks are divided into N stripes and the stripes are numbered sequentially; each stripe has N storage blocks, of which 1 is the P check block, 1 is the Q check block, and the other N-2 are data blocks; check block P is obtained by XORing the N-2 data blocks of the same stripe; check block Q is obtained by Galois field multiplication and addition over the N-2 data blocks of the same stripe; P and Q are generated by the following formulas:
$P = D_0 \oplus D_1 \oplus \dots \oplus D_{N-2}$ (1)
$Q = (A_0 * D_0) \oplus (A_1 * D_1) \oplus \dots \oplus (A_{N-2} * D_{N-2})$ (2)
where $D_1, D_2, \dots, D_{N-2}$ are the N-2 data blocks of the same stripe, $\oplus$ denotes XOR, $A_1, A_2, \dots, A_{N-2}$ are mutually distinct Galois field multiplication coefficients, and $*$ denotes Galois field multiplication;
Let X(i, j) denote a storage block in the double-disk fault-tolerant vertical-type grouped parallel-access disk array SVE-RAID 6, where i is the number of the stripe containing the storage block and j is the number of the disk containing it, i.e. storage block X(i, j) is located on disk j, with 1 ≤ i, j ≤ N. The check block P in stripe i, denoted P(i), is related to the storage blocks X(i, j) as follows:
$P(i) = X(i, N-i)$ for $1 \le i \le N-1$, and $P(i) = X(i, N)$ for $i = N$ (3)
The check block Q in stripe i, denoted Q(i), is related to the storage blocks X(i, j) as follows:
$Q(i) = X(i, N+1-i)$ (4)
The other storage blocks are data blocks, denoted D(u, v), where u is the number of the stripe containing the data block, 1 ≤ u ≤ N, and v is the sequence number of the data block within its stripe when the check blocks P and Q are ignored, counting from 1, 1 ≤ v ≤ N-2; the correspondence between data block D(u, v) and the storage blocks satisfies formula 5:
To improve sustained storage bandwidth, and to obtain different sustained storage bandwidths that meet different storage-rate requirements, the N-2 data blocks of each stripe are divided into p groups, where p ≥ 2 and p is a positive integer; each group contains q data blocks, where q ≥ 1 and q is a positive integer; the value of q is determined by the sustained bandwidth required by the actual storage task, and the following relation is satisfied: p × q = N-2;
Each data block is divided into K equal-sized data sub-blocks, where K ≥ 2 and K is a positive integer. All data sub-blocks are then numbered group by group, and data sub-blocks with the same group number are numbered in stripe order, that is: in group 1 of stripe 1, the 1st data sub-block of the 1st data block is numbered 1; the 1st data sub-block of the 2nd data block is numbered 2; the 1st data sub-block of the q-th data block is numbered q; then the 2nd data sub-block of the 1st data block is numbered q+1; the 2nd data sub-block of the 2nd data block is numbered q+2; the 2nd data sub-block of the q-th data block is numbered 2 × q; and so on, until the K-th data sub-block of the 1st data block is numbered (K-1) × q+1; the K-th data sub-block of the 2nd data block is numbered (K-1) × q+2; and the K-th data sub-block of the q-th data block is numbered K × q;
Then, in group 1 of stripe 2, the 1st data sub-block of the 1st data block is numbered K × q+1; the 1st data sub-block of the 2nd data block is numbered K × q+2; the 1st data sub-block of the q-th data block is numbered (K+1) × q; then the 2nd data sub-block of the 1st data block is numbered (K+1) × q+1; the 2nd data sub-block of the 2nd data block is numbered (K+1) × q+2; the 2nd data sub-block of the q-th data block is numbered (K+2) × q; and so on, until the K-th data sub-block of the 1st data block is numbered 2 × K × q-q+1; the K-th data sub-block of the 2nd data block is numbered 2 × K × q-q+2; and the K-th data sub-block of the q-th data block is numbered 2 × K × q;
Likewise, in group 1 of stripe i, 1 ≤ i ≤ N, the 1st data sub-block of the 1st data block is numbered (i-1) × K × q+1; the 1st data sub-block of the 2nd data block is numbered (i-1) × K × q+2; the 1st data sub-block of the q-th data block is numbered (i-1) × K × q+q; then the 2nd data sub-block of the 1st data block is numbered (i-1) × K × q+q+1; the 2nd data sub-block of the 2nd data block is numbered (i-1) × K × q+q+2; the 2nd data sub-block of the q-th data block is numbered (i-1) × K × q+q × 2; and so on, until the K-th data sub-block of the 1st data block is numbered i × K × q-q+1; the K-th data sub-block of the 2nd data block is numbered i × K × q-q+2; and the K-th data sub-block of the q-th data block is numbered i × K × q;
Continuing by the same numbering rule, the data sub-blocks of group 2 of each stripe are numbered, then those of group 3 of each stripe, …, then those of group p of each stripe; data sub-blocks with adjacent numbers have adjacent logical addresses;
Groups with the same group number in stripes with adjacent numbers have adjacent logical addresses; groups with adjacent group numbers in stripe N and stripe 1 have adjacent logical addresses; groups with adjacent logical addresses are called adjacent groups;
Each check block P and each check block Q is likewise divided into K equal-sized check sub-blocks; the check sub-blocks in check blocks P and Q are obtained from the N-2 data sub-blocks located at the same offset in their stripe, by XOR and by Galois field multiplication and addition respectively; these N-2 data sub-blocks and the check sub-blocks generated from them constitute one check strip;
Two. Schedule the disks to change working state according to the following strategy:
A semaphore, denoted Sem i, 1 ≤ i ≤ N, is set for each disk, and the initial value of semaphore Sem i is set to 0;
Step 1: from the logical address of the accessed data, denoted LBA, compute the stripe u and the group r to which the data block containing the accessed data belongs; they are obtained from formula 6:
where the group size is the sum of the sizes of the data blocks in a group, MOD denotes the modulo operation, and the added 1 reflects that both stripe numbers and group numbers start from 1;
Then formula 5 is used to determine the disks holding the q data blocks D(u, r × q-q+1), D(u, r × q-q+2), …, D(u, r × q-q+q) of group r in stripe u; formula 3 is used to determine the disk holding check block P(u) of this stripe, which is N-u or N; and formula 4 is used to determine the disk holding check block Q(u) of this stripe, which is N+1-u. Each of these disks is handled as follows: if the disk is in the stopped state, it is scheduled to the ready state;
The stripe number u and the group number r record the stripe number and group number of the current access operation;
Step 2: when the disks holding the q data blocks of group r in stripe u and the disks holding check blocks P(u) and Q(u) of stripe u are all in the ready state, parallel access to these q+2 disks begins, and the accessed disks change from the ready state to the running state;
Step 3: if this access ends in group r of stripe u, then after the access ends, the disks holding the q data blocks of group r in stripe u and the disks holding check blocks P(u) and Q(u) of stripe u change from the running state to the ready state, and this scheduling round exits; otherwise, at time t before the access to group r of stripe u ends, where t is the time a disk needs to change from the stopped state to the ready state, the stripe number and group number of the adjacent group are determined, the stripe number being denoted I and the group number R; the disks holding the q data blocks of group R in stripe I and the disks holding check blocks P(I) and Q(I) of stripe I are each handled as follows: if the disk is in the stopped state, it is scheduled to the ready state; if the disk is in the running state, its semaphore is incremented by 1; after the access to group r of stripe u ends, the disks holding the q data blocks of group r in stripe u and the disks holding check blocks P(u) and Q(u) of stripe u are each handled as follows: the semaphore of the disk is decremented by 1; if the semaphore is then less than 0, the disk changes from the running state to the stopped state and its semaphore is reset to 0; otherwise, the disk changes from the running state to the ready state;
Step 4: the recorded stripe number u of the current access is updated to I, the recorded group number r of the current access is updated to R, and steps 2 to 4 are repeated.
2. The method for constructing a double-disk fault-tolerant vertical-type grouped parallel-access disk array according to claim 1, characterized in that:
When continuous read operations are performed, the disks holding check blocks stop working.
3. The method for constructing a double-disk fault-tolerant vertical-type grouped parallel-access disk array according to claim 1, characterized in that:
The method further comprises an optimized processing method for write operations, specifically:
The following buffers are set up: one new-data buffer, which caches the new data that the application writes to the disk array; two new check data buffers, which cache the two kinds of newly generated check data respectively; q old-data pre-read areas, which hold the old data pre-read from the q disks of the array respectively; and two old check data pre-read areas, which hold the two kinds of old check data pre-read from the array respectively; before the write operation, the old data sub-block corresponding to each new data sub-block and the two kinds of old check sub-blocks of the same check strip have already been pre-read into the corresponding pre-read areas;
The following steps generate the new check data P:
Step 2.1: while the new-data buffer is not empty, take q data sub-blocks from it each time and split them into the data sub-blocks $D_1, D_2, \dots, D_q$ belonging to the q disks of group r; write them to the q disks in parallel and at the same time send them to the XOR unit;
Step 2.2: the corresponding old data sub-blocks $D'_1, D'_2, \dots, D'_q$ in the q old-data pre-read areas are sent to the XOR unit and XORed with the data sub-blocks sent in step 2.1, generating the data sub-blocks $X_1, X_2, \dots, X_q$, where $X_i = D_i \oplus D'_i$, 1 ≤ i ≤ q;
Step 2.3: the data sub-blocks $X_1, X_2, \dots, X_q$ are sent to the XOR unit, the corresponding old check sub-block $P'$ in old check data pre-read area 1 is sent to the XOR unit, and the XOR generates the new check sub-block $P$, $P = X_1 \oplus X_2 \oplus \dots \oplus X_q \oplus P'$, which is temporarily stored in new check data buffer 1;
Step 2.4: the old check sub-block $P'$ in the corresponding check strip is updated to the new check sub-block $P$;
The following steps generate the new check data Q:
Step 3.1: take q data sub-blocks from the new-data buffer and split them into the data sub-blocks $D_1, D_2, \dots, D_q$ belonging to the q disks of group r; multiply each (Galois field multiplication) by its corresponding coefficient $A_{r \times q - q + i}$, and send the products $A_{r \times q - q + i} * D_i$ to the XOR unit, 1 ≤ i ≤ q;
Step 3.2: the corresponding old data sub-blocks $D'_i$ in the q old-data pre-read areas are each multiplied (Galois field multiplication) by the corresponding coefficient $A_{r \times q - q + i}$; the products $A_{r \times q - q + i} * D'_i$ are sent to the XOR unit and XORed with the products $A_{r \times q - q + i} * D_i$ sent in step 3.1, generating the difference data sub-blocks $(A_{r \times q - q + i} * D_i) \oplus (A_{r \times q - q + i} * D'_i)$, 1 ≤ i ≤ q;
Step 3.3: the q difference data sub-blocks are sent to the XOR unit; the corresponding old check sub-block $Q'$ in old check data pre-read area 2 is sent to the XOR unit; XORing $Q'$ with the q difference data sub-blocks generates the new check sub-block $Q$, which is temporarily stored in new check data buffer 2;
Step 3.4: the old check sub-block $Q'$ in the corresponding check strip is updated to the new check sub-block $Q$;
If the starting logical address LBA of the written data is not divisible by the size of q data sub-blocks and leaves a remainder of k data sub-blocks, where k is a positive integer, or if the new-data buffer holds only k data sub-blocks, fewer than q, then only these k data sub-blocks are written to the corresponding disks, and the new check data are generated by the above method from the corresponding old data and old check data of the same check strip; the other q-k disks of the group have no write task;
Because the old data sub-blocks and old check sub-blocks can be read directly from the pre-read areas, the disks can carry out write operations continuously; only when the data sub-blocks in the pre-read areas have been used up is one pre-read performed to refill the pre-read areas; pre-reading thus converts many scattered read operations into one continuous read operation and reduces the number of head seeks.
4. The method for constructing a double-disk fault-tolerant vertical-type grouped parallel-access disk array according to claim 3, characterized in that:
The old-data pre-read areas and the old check data pre-read areas are equal in size and much larger than the new-data buffer.
5. The method for constructing a double-disk fault-tolerant vertical-type grouped parallel-access disk array according to claim 3, characterized in that:
The new check data buffers and the new-data buffer are equal in size.
6. The method for constructing a double-disk fault-tolerant vertical-type grouped parallel-access disk array according to claim 1, characterized in that:
The method further comprises a method for handling data reads and writes on failed disks, specifically:
Break down as disk, in the time of can't carrying out the normal data read-write, activate all disks entering ready states among two disk tolerance vertical-type grouping concurrent access disk array SVE-RAID 6 this moment, suspend and use said disk duty control method;
With the band is unit; Disposal route when storage block breaks down in the band is discussed, because disk failure shows as disk array and is distributed in all the storage block faults on this failed disk; To the processing of disk failure, can be through handling the realization of the storage block fault on each band in the array respectively;
When the storage block in band breaks down, can be divided into following 3 kinds of situation and handle:
Case 1: if only check blocks fail, reads and writes of the data blocks continue, but the stripe is then in a single redundant data check state (1 check block failed) or a no redundant data check state (2 check blocks failed); after the disk holding the failed check block is replaced, the check block is regenerated;
Case 2: if only 1 or 2 data blocks fail, read and write operations are handled separately:
Read operation: the data being read from the failed data block is generated by computation over the other data blocks of the same stripe together with check block P or Q of that stripe when 1 data block has failed, or together with both P and Q when 2 data blocks have failed; after the disk holding the failed data block is replaced, the failed data block is recovered by the same method;
Write operation: there are 2 processing modes:
Processing mode 1: the write data is written directly onto P or Q of the stripe when 1 data block has failed, or onto P and Q when 2 data blocks have failed, and the check blocks written in this way are recorded; after the disk holding the failed data block is replaced, data restoration and check data regeneration are carried out; data restoration moves the data blocks held in the check blocks back to their original locations, and check data regeneration regenerates the recorded check blocks that were overwritten;
Processing mode 2: a spare disk temporarily holds the data written to the failed disk; after the failed disk is replaced, the data blocks held on the spare disk are moved back to their original locations;
Case 3: if 1 data block and 1 check block fail, the failed data block is handled in the same way as when only 1 data block fails, except that the stripe is then in a single redundant data protection state; after the failed data block is recovered, the failed check block is handled in the same way as when only 1 check block fails;
After the array returns to the normal state, the disk working state control method is enabled again to control the working state of the double-disk fault-tolerant vertical packet parallel access disk array SVE-RAID 6.
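The three cases of claim 6 and the single-failure read path can be sketched in Python as follows. This is a minimal sketch under stated assumptions: P is taken to be plain XOR parity over the data blocks of a stripe (the usual RAID 6 convention, not spelled out in this claim), recovery through Q or through P and Q together is omitted, and the names `classify_stripe_fault` and `rebuild_from_p` are hypothetical.

```python
def classify_stripe_fault(failed_data: int, failed_check: int) -> str:
    """Map the failed block counts of one stripe to the cases of claim 6."""
    if failed_data < 0 or failed_check < 0:
        raise ValueError("block counts must be non-negative")
    if failed_check > 2 or failed_data + failed_check > 2:
        raise ValueError("more than two failed blocks cannot be tolerated")
    if failed_data == 0 and failed_check == 0:
        return "normal"
    if failed_data == 0:
        return "case 1"  # only check blocks failed
    if failed_check == 0:
        return "case 2"  # only 1 or 2 data blocks failed
    return "case 3"      # 1 data block and 1 check block failed


def rebuild_from_p(surviving_blocks, p_block: bytes) -> bytes:
    """Regenerate one failed data block from P and the surviving data blocks.

    Assumes P is the XOR of all data blocks in the stripe.
    """
    rebuilt = bytearray(p_block)
    for block in surviving_blocks:
        for j, byte in enumerate(block):
            rebuilt[j] ^= byte
    return bytes(rebuilt)
```

For example, with three data blocks in a stripe and the second one lost, passing the first and third blocks together with P to `rebuild_from_p` yields the content of the lost block; the same computation serves both a degraded read and the rebuild after the disk is replaced.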
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010105756252A CN102023820B (en) | 2010-12-01 | 2010-12-01 | Method for constructing double-disk fault-tolerant vertical packet parallel access disk array |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102023820A (en) | 2011-04-20 |
CN102023820B (en) | 2012-08-22 |
Family
ID=43865153
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010105756252A Active CN102023820B (en) | 2010-12-01 | 2010-12-01 | Method for constructing double-disk fault-tolerant vertical packet parallel access disk array |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102023820B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103034559B (en) * | 2012-12-18 | 2016-06-08 | 无锡众志和达数据计算股份有限公司 | PQ inspection module and the method for inspection based on RDMA architecture design |
US9235471B2 (en) * | 2013-04-29 | 2016-01-12 | Netapp, Inc. | Background initialization for protection information enabled storage volumes |
CN105260261B (en) * | 2015-11-19 | 2018-06-15 | 四川神琥科技有限公司 | Mail recovery method |
KR102572357B1 (en) * | 2016-02-03 | 2023-08-29 | 삼성전자주식회사 | Raid-6 data storage device and data processing system having the same |
CN110659152B (en) * | 2018-06-28 | 2021-04-09 | 华为技术有限公司 | Data processing method and equipment |
CN109189331B (en) * | 2018-08-09 | 2021-08-17 | 河北工业大学 | A dynamic data layout method |
CN109491613B (en) * | 2018-11-13 | 2021-11-02 | 深圳龙岗智能视听研究院 | Continuous data protection storage system and storage method using same |
CN111857552A (en) * | 2019-04-30 | 2020-10-30 | 伊姆西Ip控股有限责任公司 | Storage management method, electronic device and computer program product |
CN111258807B (en) * | 2020-01-16 | 2023-05-09 | 四川效率源科技有限责任公司 | Data recovery method for raid6 missing disk in logical volume management |
CN113297000B (en) * | 2021-05-20 | 2022-12-23 | 山东云海国创云计算装备产业创新中心有限公司 | RAID (redundant array of independent disks) coding circuit and coding method |
CN116909488B (en) * | 2023-09-11 | 2024-01-26 | 苏州浪潮智能科技有限公司 | Method, system, medium, equipment and disk array card for accessing disk array |
CN117149098B (en) * | 2023-10-31 | 2024-02-06 | 苏州元脑智能科技有限公司 | Stripe unit distribution method and device, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1692335A (en) * | 2002-11-01 | 2005-11-02 | 国际商业机器公司 | Method and means for tolerating multiple dependent or arbitrary double disk failures in a disk array |
CN101055511A (en) * | 2007-05-16 | 2007-10-17 | 华为技术有限公司 | Memory array system and its data operation method |
EP1936486A2 (en) * | 2006-11-28 | 2008-06-25 | Hitachi, Ltd. | Storage system comprising power saving function |
CN101349979A (en) * | 2008-09-05 | 2009-01-21 | 清华大学 | Double-head user data update method for large-scale fault-tolerant disk array storage system |
Also Published As
Publication number | Publication date |
---|---|
CN102023820A (en) | 2011-04-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102023820B (en) | Method for constructing double-disk fault-tolerant vertical packet parallel access disk array | |
CN102023819B (en) | Method for constructing double-disk fault tolerance horizontal grouping and parallel access disk array | |
CN102033716B (en) | Method for constructing energy-saving type disc array with double discs for fault tolerance | |
CN101976175B (en) | Method for constructing disk array by horizontal grouping parallel concentrated verification | |
CN101976174B (en) | Method for constructing energy-saving disk array of vertical configuration distribution check | |
CN101976177B (en) | Method for constructing vertical grouped disk array capable of being subject to parallel centralized check | |
CN101976178B (en) | Method for constructing vertically-arranged and centrally-inspected energy-saving disk arrays | |
CN101976176B (en) | Method for constructing disk array of horizontal grouping parallel distribution check | |
CN103049222B (en) | RAID5 write IO optimization processing method | |
US7904647B2 (en) | System for optimizing the performance and reliability of a storage controller cache offload circuit | |
CN104035830A (en) | Method and device for recovering data | |
CN101622606A (en) | Apparatus, system, and method for solid-state memory as a cache for high-capacity, non-volatile memory | |
CN102147713B (en) | Method and device for managing network storage system | |
CN101470640B (en) | Method and system for providing deferred maintenance on storage subsystems | |
CN101715575A (en) | Apparatus, system, and method for managing data using a data pipeline | |
CN102662607A (en) | RAID6 level mixed disk array, and method for accelerating performance and improving reliability | |
CN105930097A (en) | Distributed verification redundant array of independent disks capable of eliminating local parallel read-modify-write operation | |
WO2016137402A1 (en) | Data stripping, allocation and reconstruction | |
CN101976179B (en) | Construction method of vertical-type grouping parallel-distributed and checked disk array | |
CN102177496A (en) | System and method for transferring data between different RAID data storage types for current data and replay data | |
CN102164165B (en) | Management method and device for network storage system | |
CN101625586A (en) | Method, equipment and computer for managing energy conservation of storage device | |
CN105930099B (en) | Double-disk fault-tolerant disk array capable of eliminating local parallel small write operations | |
JP3669103B2 (en) | Storage device and storage device subsystem | |
CN105930098B (en) | Centralized-check disk array capable of eliminating local parallel small write operations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |