CN102023820B - Method for constructing double-disk fault-tolerant vertical packet parallel access disk array - Google Patents

Method for constructing double-disk fault-tolerant vertical packet parallel access disk array

Info

Publication number
CN102023820B
CN102023820B CN2010105756252A CN201010575625A
Authority
CN
China
Prior art keywords
data
block
disk
band
sub
Prior art date
Legal status
Active
Application number
CN2010105756252A
Other languages
Chinese (zh)
Other versions
CN102023820A (en
Inventor
孙志卓
谭毓安
周泽湘
王道邦
李艳国
章珉
武志民
范利东
Current Assignee
BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Original Assignee
BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Priority date
Filing date
Publication date
Application filed by BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Priority to CN2010105756252A
Publication of CN102023820A
Application granted
Publication of CN102023820B

Abstract

The invention relates to a method for constructing a double-disk fault-tolerant, vertically grouped, parallel-access disk array (SVE-RAID 6) suited to continuous data storage, and belongs to the technical field of redundant arrays of independent disks. The array is designed around the characteristics of continuous data storage through the following steps: division and layout of the stripes, data blocks, check blocks and data sub-blocks of the disk array; a scheduling policy for switching disk working states; an optimized processing method for data writing; and a method for handling failed disks. The resulting SVE-RAID 6 array retains a strong redundant data protection mechanism while markedly reducing power consumption and extending disk service life.

Description

Method for constructing a double-disk fault-tolerant vertically grouped parallel-access disk array
Technical field
The present invention relates to a method for constructing a redundant array of independent disks (RAID), and in particular to a method for constructing a double-disk fault-tolerant vertically grouped parallel-access disk array. It is suited to continuous data storage and belongs to the RAID technical field.
Background technology
In the modern storage field, to improve the reliability of stored data and the input/output performance of storage systems, a variety of data storage schemes have been devised, typically the various types of RAID (Redundant Arrays of Independent Disks). Using dedicated hardware or software, a RAID combines multiple physical storage devices, such as disks, into a single unified logical storage device.
Commonly used technical terms in RAID are explained below:
Stripe: the set of positionally related blocks on the different disks of a disk array; it is the unit by which blocks on different disks are organized.
Striping: dividing a segment of continuous data into blocks of equal size and writing each block to a different disk of the array.
Disk mirroring: copying source data onto one or more additional disks.
Error correction: using an operation such as XOR to generate and store redundant data, from which lost or corrupted data on a disk can be regenerated.
Commonly used RAID levels include RAID 0, RAID 1, RAID 5, RAID 6 and RAID 10. RAID 0 has no redundancy, and RAID 1 simply mirrors disks. The other three levels are each composed of multiple disks: data are written to the disks of the array in stripes, and check data are stored across the disks of the array. Each stripe of RAID 5 contains one check block, so the array tolerates the failure of any single disk, whose data are rebuilt from the blocks on the other disks. Each stripe of RAID 6 contains two check blocks, so the array tolerates the failure of any two disks, whose data are rebuilt from the blocks on the other disks. RAID 10 first mirrors the disks and then stripes them, so it contains no parity blocks; after a disk fails, its data are restored from its mirror disk. In the best case it tolerates the failure of 50% of the disks provided they are in different positions; in the worst case the simultaneous loss of both disks of one mirrored pair is fatal.
RAID 6 of the P+Q type XORs the data blocks in a stripe to generate the check data P of that stripe, and uses Galois-field (GF) multiplication and addition (XOR) to generate the check data Q of that stripe. Assuming the data blocks in a stripe are D_0 and D_1, the check blocks P and Q are generated by formulas 1 and 2:

P = D_0 ⊕ D_1    (1)
Q = (A_0 * D_0) ⊕ (A_1 * D_1)    (2)

where ⊕ denotes XOR, * denotes Galois-field multiplication, and A_0, A_1 are distinct Galois-field multiplication coefficients.
When any two disks in the array fail, the data can be recovered through formulas 1 and 2, as follows:

1. If the disks holding check blocks P and Q fail, P and Q are simply recomputed from the data blocks of the stripe according to formulas 1 and 2.

2. If the disk holding check block Q and the disk holding one data block fail, let that data block be D_1. First recover D_1 from check block P; from formula 1:
D_1 = P ⊕ D_0   (using the XOR properties D ⊕ D = 0 and D ⊕ 0 = D)
After D_1 is recovered, regenerate check block Q with formula 2.

3. If the disk holding check block P and the disk holding one data block fail, let that data block be D_1. First recover D_1 from check block Q; from formula 2:
A_1 * D_1 = (A_0 * D_0) ⊕ Q
so D_1 is recovered as:
D_1 = [(A_0 * D_0) ⊕ Q] / A_1
where / is Galois-field division. After D_1 is recovered, regenerate check block P with formula 1.

4. If the disks holding two data blocks fail, let these data blocks be D_0 and D_1. From formula 1, D_1 = P ⊕ D_0; substituting into formula 2 gives:
Q = (A_0 * D_0) ⊕ [A_1 * (P ⊕ D_0)] = [(A_0 ⊕ A_1) * D_0] ⊕ (A_1 * P)
(* distributes over ⊕). This expression involves D_0 but not D_1, so D_0 is solved as:
(A_0 ⊕ A_1) * D_0 = (A_1 * P) ⊕ Q
D_0 = [(A_1 * P) ⊕ Q] / (A_0 ⊕ A_1)
After D_0 is recovered, recover D_1 with formula 1.
In a P+Q RAID 6, the data blocks and the check blocks P and Q of a stripe are distributed over different storage devices of the array. Because the data blocks are large, most I/O operations fall entirely within the data block of one disk, and because the check data are also distributed, the possibility of concurrent I/O on multiple disks is greatly increased. Moreover, since many disks jointly share redundant data equal to two disks in size, the amount of redundant data required per unit of valid data is small, an advantage that becomes more pronounced as the number of disks in the RAID 6 grows. For a RAID 6 composed of 10 disks, for example, the redundant data occupy two disks, i.e. 1/5 of the total storage. Because RAID 6 tolerates two disk failures and has good overall performance, it is widely used in fields with high requirements on both data security and performance.
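To make the P+Q recovery procedure above concrete, the following minimal Python sketch, which is an illustration rather than part of the patent (the GF(2^8) reduction polynomial 0x11D, the sample byte values and the helper names are assumptions), implements Galois-field multiplication and division and replays case 4, the loss of both data blocks.

```python
# Minimal GF(2^8) arithmetic and P+Q double-erasure recovery (illustrative sketch).
# The reduction polynomial 0x11D and all function names are assumptions, not from the patent.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) (carry-less multiply, reduce by x^8+x^4+x^3+x^2+1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r

def gf_inv(a: int) -> int:
    """Multiplicative inverse of a nonzero byte in GF(2^8), computed as a^254."""
    r, p = 1, 254
    while p:
        if p & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        p >>= 1
    return r

def gf_div(a: int, b: int) -> int:
    return gf_mul(a, gf_inv(b))

# Two data bytes D0, D1 with coefficients A0, A1 (formulas 1 and 2).
A0, A1 = 1, 2
D0, D1 = 0x37, 0xC2
P = D0 ^ D1                                   # formula 1
Q = gf_mul(A0, D0) ^ gf_mul(A1, D1)           # formula 2

# Case 4: both data blocks lost, recover them from P and Q.
D0_rec = gf_div(gf_mul(A1, P) ^ Q, A0 ^ A1)   # D0 = [(A1*P) XOR Q] / (A0 XOR A1)
D1_rec = P ^ D0_rec                            # D1 = P XOR D0
assert (D0_rec, D1_rec) == (D0, D1)
```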
Disk working states comprise three states: stopped, running, and ready.
In the stopped state, the disk spindle stops rotating and the heads stop seeking;
In the running state, the disk is performing read or write operations, with the spindle rotating and the heads seeking;
In the ready state, the disk is about to be read or written: the spindle is rotating but the heads are not seeking.
To increase data access speed and satisfy concurrent I/O demands, common disk arrays usually schedule all disks to work concurrently for high performance. In the field of continuous data storage, however, in applications such as video surveillance, virtual tape libraries (VTL) and continuous data protection (CDP), a continuous data stream is written to the disks in order. The storage is sequential, i.e. the addresses of the stored data are contiguous and satisfy the locality principle of storage-space access, so there is no demand for concurrent I/O. Common disk arrays, including P+Q RAID 6, spread continuous data across different disks to increase access speed and satisfy concurrent I/O, so the member disks of the RAID stay in the working state all the time; the high I/O throughput and I/O concurrency provided by these concurrently working disks cannot be fully exploited by continuous-data-storage applications. In such applications, keeping all member disks working simultaneously not only wastes a great deal of energy but also keeps the disks continuously running, shortening their service life.
With the development of disk technology, the sequential read/write bandwidth of a single disk has improved markedly. For continuous data storage, if the transfer bandwidth of individual disks can be fully utilized while a redundant data protection mechanism is retained, then, according to the needs of the actual application, only part of the disks in the array need work to complete the storage task, while the other idle disks are scheduled into the stopped state. When the current storage area of the working disks (each disk is divided into several large storage areas) is exhausted, those disks are scheduled into the stopped state and the previously idle disks enter the working state. Because the disk spindles and head motors stop working, a great deal of energy is saved, and because the disks need not keep running continuously, their service life is extended.
The power consumption of a disk in different working states is shown in Table 1.

Table 1. Disk power consumption in different working states

  Disk working state:  Read/write operation   Idle state   Stopped state
  Average power:       11.5 W                 8.5 W        about 0 W
Summary of the invention
The object of the present invention is to overcome the shortcomings of the prior art and, on the premise of meeting a set storage-performance target, to propose a method for constructing an energy-saving disk array suited to continuous data storage.
The invention provides a method for constructing a double-disk fault-tolerant vertically grouped parallel-access disk array, hereinafter abbreviated as disk array SVE-RAID 6.
The object of the invention is achieved through the following technical solution:
A method for constructing a disk array SVE-RAID 6 comprises the following steps:
1. Divide and lay out the stripes, data blocks, check blocks and data sub-blocks of the disk array. Concretely: for a disk array containing N disks, where N ≥ 4 and N is a positive integer, the N disks are organized as a 1-row × N-column disk matrix. All disks are divided into stripes; there are N stripes, numbered sequentially, and each stripe contains N storage blocks, of which one is the P check block, one is the Q check block, and the other N−2 are data blocks. Check block P is obtained by XORing the N−2 data blocks of the stripe; check block Q is obtained by Galois-field multiplication and addition over the N−2 data blocks of the stripe. P and Q are generated as follows:

P = D_1 ⊕ D_2 ⊕ … ⊕ D_{N−2}    (3)
Q = (A_1 * D_1) ⊕ (A_2 * D_2) ⊕ … ⊕ (A_{N−2} * D_{N−2})    (4)

where D_1, D_2, …, D_{N−2} are the N−2 data blocks of the stripe, ⊕ denotes XOR, A_1, A_2, …, A_{N−2} are distinct Galois-field multiplication coefficients, and * denotes Galois-field multiplication.
Let X(i, j) denote a storage block of the disk array SVE-RAID 6, where i is the number of the stripe containing the block and j is the number of the disk it belongs to; storage block X(i, j) is located on disk j, with 1 ≤ i, j ≤ N. The check block P in stripe i is denoted P(i) and relates to the storage blocks X(i, j) as follows:

P(i) = X(i, N−i)  for i ≠ N;  P(N) = X(N, N)    (5)

The check block Q in stripe i is denoted Q(i) and relates to the storage blocks X(i, j) as follows:

Q(i) = X(i, N+1−i)    (6)

The remaining storage blocks are data blocks, denoted D(u, v), where u is the number of the stripe containing the data block, 1 ≤ u ≤ N, and v is the ordinal number of the data block within its stripe counting data blocks only (check blocks P and Q are skipped), starting from 1, 1 ≤ v ≤ N−2. The correspondence between data block D(u, v) and the storage blocks satisfies formula 7:

D(u, v) = X(u, v)    if u + v < N
D(u, v) = X(u, v+2)  if u + v ≥ N and u ≠ N
D(u, v) = X(N, v+1)  if u = N    (7)
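To make the layout rules of formulas 5 to 7 concrete, the following Python sketch, illustrative only and with function names chosen here, maps check blocks and data blocks to their (stripe, disk) positions; for N = 6 it reproduces the layout used in the embodiment described later.

```python
# Layout of P, Q and data blocks on an N-disk SVE-RAID 6 (formulas 5-7).
# Function names and the printed layout are illustrative only.

def p_disk(i: int, N: int) -> int:
    """Disk holding check block P(i): X(i, N-i) for i != N, X(N, N) for i = N."""
    return N - i if i != N else N

def q_disk(i: int, N: int) -> int:
    """Disk holding check block Q(i): X(i, N+1-i)."""
    return N + 1 - i

def data_disk(u: int, v: int, N: int) -> int:
    """Disk holding data block D(u, v), skipping the two check columns (formula 7)."""
    if u == N:
        return v + 1
    return v if u + v < N else v + 2

N = 6
for i in range(1, N + 1):
    row = ["D"] * N
    row[p_disk(i, N) - 1] = "P"
    row[q_disk(i, N) - 1] = "Q"
    print(f"stripe {i}: {' '.join(row)}")
# stripe 1: D D D D P Q
# stripe 2: D D D P Q D
# ...
# stripe 6: Q D D D D P
```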
To increase the continuous storage bandwidth and to obtain different continuous storage bandwidths that meet the demands of different storage rates, the N−2 data blocks of each stripe are divided into p groups (p ≥ 2, p a positive integer), each group containing q data blocks (q ≥ 1, q a positive integer; the value of q is determined by the continuous-bandwidth demand of the actual storage task), with p × q = N−2.
Each data block is divided into K equal-sized data sub-blocks, where K ≥ 2 and K is a positive integer. All data sub-blocks are then numbered group by group, and sub-blocks with the same group number are numbered stripe by stripe. That is: in group 1 of the first stripe, the 1st data sub-block of the 1st data block is numbered 1; the 1st data sub-block of the 2nd data block is numbered 2; …; the 1st data sub-block of the q-th data block is numbered q; then the 2nd data sub-block of the 1st data block is numbered q+1; the 2nd data sub-block of the 2nd data block is numbered q+2; …; the 2nd data sub-block of the q-th data block is numbered 2×q; and so on, until the K-th data sub-block of the 1st data block is numbered (K−1)×q+1, the K-th data sub-block of the 2nd data block is numbered (K−1)×q+2, …, and the K-th data sub-block of the q-th data block is numbered K×q.

Then, in group 1 of the second stripe, the 1st data sub-block of the 1st data block is numbered K×q+1; the 1st data sub-block of the 2nd data block is numbered K×q+2; …; the 1st data sub-block of the q-th data block is numbered (K+1)×q; then the 2nd data sub-block of the 1st data block is numbered (K+1)×q+1; the 2nd data sub-block of the 2nd data block is numbered (K+1)×q+2; …; the 2nd data sub-block of the q-th data block is numbered (K+2)×q; and so on, until the K-th data sub-block of the 1st data block is numbered 2×K×q−q+1, the K-th data sub-block of the 2nd data block is numbered 2×K×q−q+2, …, and the K-th data sub-block of the q-th data block is numbered 2×K×q.

Likewise, in group 1 of the i-th stripe (1 ≤ i ≤ N), the 1st data sub-block of the 1st data block is numbered (i−1)×K×q+1; the 1st data sub-block of the 2nd data block is numbered (i−1)×K×q+2; …; the 1st data sub-block of the q-th data block is numbered (i−1)×K×q+q; then the 2nd data sub-block of the 1st data block is numbered (i−1)×K×q+q+1; the 2nd data sub-block of the 2nd data block is numbered (i−1)×K×q+q+2; …; the 2nd data sub-block of the q-th data block is numbered (i−1)×K×q+q×2; and so on, until the K-th data sub-block of the 1st data block is numbered i×K×q−q+1, the K-th data sub-block of the 2nd data block is numbered i×K×q−q+2, …, and the K-th data sub-block of the q-th data block is numbered i×K×q.

By analogy, following the same numbering principle, the data sub-blocks of group 2 of each stripe are numbered, then those of group 3 of each stripe, …, and finally those of group p of each stripe. Data sub-blocks with adjacent numbers have adjacent logical addresses.
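The numbering rule above can be stated compactly: the following Python helper, an illustration with an assumed function name, computes the global number of the k-th sub-block of the v-th data block in group r of stripe u and matches the enumeration just given.

```python
# Global number of a data sub-block (illustrative helper, names assumed).
# u: stripe number (1..N), r: group number (1..p),
# v: data-block index within the group (1..q), k: sub-block index within the block (1..K).

def subblock_number(u: int, r: int, v: int, k: int, N: int, q: int, K: int) -> int:
    return (r - 1) * N * K * q + (u - 1) * K * q + (k - 1) * q + v

# Group 1 of stripe 1: sub-block 1 of block 1 is number 1, of block 2 is 2, ...
assert subblock_number(1, 1, 1, 1, N=6, q=2, K=4) == 1
assert subblock_number(1, 1, 2, 1, N=6, q=2, K=4) == 2
# ... and sub-block K of block q of group 1 on stripe i is i*K*q:
assert subblock_number(3, 1, 2, 4, N=6, q=2, K=4) == 3 * 4 * 2
```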
Groups with the same group number on stripes with adjacent numbers have adjacent logical addresses; the group on stripe N and the group with the next group number on stripe 1 also have adjacent logical addresses. Groups whose logical addresses are adjacent are called adjacent groups.

Each check block P and check block Q is likewise divided into K equal-sized check sub-blocks. Each check sub-block of P or Q is computed from the N−2 data sub-blocks at the same offset within the data blocks of its stripe, by XOR for P and by Galois-field multiplication and addition for Q. These N−2 data sub-blocks and the two check sub-blocks generated from them constitute one check strip.

With this layout, the q data blocks of a group, which lie on different disks, can be accessed concurrently, and the storage task is confined to one group, so the remaining disks, which temporarily have no task, can be scheduled into the stopped state, achieving the goals of saving energy and reducing disk wear.
2. Schedule disk working-state transitions according to the following strategy:
A semaphore, denoted Sem i (1 ≤ i ≤ N), is set up for each disk; the initial value of each Sem i is 0.
Step 1: From the logical address of the accessed data, denoted LBA, compute the stripe u and the group r that contain the data block holding the data, using formula 8:

u = (⌊LBA / group size⌋ MOD N) + 1
r = ⌊LBA / (N × group size)⌋ + 1    (8)

where the group size denotes the sum of the sizes of the data blocks in a group, MOD denotes the modulo operation, and the added 1 reflects that stripe numbers and group numbers both start from 1.

Then, use formula 7 to determine the disks holding the q data blocks D(u, r×q−q+1), D(u, r×q−q+2), …, D(u, r×q−q+q) of group r of stripe u; use formula 5 to determine the disk holding check block P(u) of this stripe, namely disk N−u (or disk N when u = N); and use formula 6 to determine the disk holding check block Q(u), namely disk N+1−u. Each of these disks is handled as follows: if the disk is in the stopped state, it is scheduled into the ready state.

The stripe number u and the group number r record the stripe and group of the current access operation.
Step 2: When the disks holding the q data blocks of group r of stripe u and the disks holding check blocks P(u) and Q(u) of stripe u are in the ready state, concurrent access to these q+2 disks begins, and the accessed disks switch from the ready state to the running state.

Step 3: If the access ends within group r of stripe u, then after the access ends, the disks holding the q data blocks of group r of stripe u and the disks holding check blocks P(u) and Q(u) switch from the running state to the ready state, and this scheduling round exits. Otherwise, at time t before the access to group r of stripe u ends (t being the time a disk needs to switch from the stopped state to the ready state), determine the stripe number (denoted I) and the group number (denoted R) of the adjacent group, and handle each disk holding one of the q data blocks of group R of stripe I and the disks holding check blocks P(I) and Q(I) as follows: if the disk is in the stopped state, schedule it into the ready state; if the disk is in the running state, increase its semaphore by 1. After the access to group r of stripe u ends, handle each disk holding one of the q data blocks of group r of stripe u and the disks holding check blocks P(u) and Q(u) as follows: decrease the disk's semaphore by 1; if the semaphore becomes less than 0, switch the disk from the running state to the stopped state and reset its semaphore to 0; otherwise, switch the disk from the running state to the ready state.

Step 4: Update the recorded stripe number u of the current access to I and the recorded group number r to R, and repeat steps 2 to 4.

With this strategy, during continuous read/write operations only one group of data-block disks plus the two check-block disks (q+2 disks) are in the running state for the overwhelming majority of the time, while the remaining disks, which temporarily have no task, are scheduled into the stopped state, achieving the goal of saving energy.
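A compact sketch of the address mapping and disk selection of step 1 follows (illustrative Python only; it relies on the reconstruction of formula 8 given above, reuses the layout helpers sketched earlier, and its function names are assumptions).

```python
# Map a logical address to (stripe u, group r) and the q+2 disks to activate (step 1).
# group_size is the total size of the data blocks in one group; names are illustrative.

def locate(lba: int, group_size: int, N: int) -> tuple[int, int]:
    g = lba // group_size               # 0-based global group index
    u = g % N + 1                       # stripe number (formula 8)
    r = g // N + 1                      # group number (formula 8)
    return u, r

def disks_to_activate(u: int, r: int, N: int, q: int) -> set[int]:
    data = {data_disk(u, r * q - q + v, N) for v in range(1, q + 1)}   # formula 7
    return data | {p_disk(u, N), q_disk(u, N)}                          # formulas 5, 6

# Example with N = 6, q = 2: an access landing in group 1 of stripe 3
# activates the disks holding D(3,1), D(3,2) and P(3), Q(3), i.e. disks 1-4:
# disks_to_activate(3, 1, N=6, q=2) -> {1, 2, 3, 4}
```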
Preferably, during continuous read operations the disks holding check blocks P and Q do not work.

For each write operation, only the data on the q disks holding one group of data blocks and on the two check-block disks of the stripe are updated, so a read-modify-write is performed. That is, when the q new data D_1, D_2, …, D_q of group r (1 ≤ r ≤ p) are written in parallel, the new check data P and Q of the check strip to which this group of new data belongs are generated with formulas 9 and 10:

X_i = D_i ⊕ D'_i,  P = X_1 ⊕ X_2 ⊕ … ⊕ X_q ⊕ P',  1 ≤ i ≤ q    (9)
X*_i = (A_{r×q−q+i} * D_i) ⊕ (A_{r×q−q+i} * D'_i),  Q = X*_1 ⊕ X*_2 ⊕ … ⊕ X*_q ⊕ Q',  1 ≤ i ≤ q    (10)

where D'_i is the old data at the position on the disk where new data D_i is written, P' and Q' are the old check data of the check strip containing D'_i, i.e. the old check data at the positions where the new check data P and Q are written, A_{r×q−q+i} is the Galois-field multiplication coefficient of data D_i, and the subscript r×q−q+i is the ordinal number of the data D_i of group r among all the data sub-blocks of the check strip.
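The following Python sketch, illustrative only (byte-wise processing and the 0-indexed coefficient list are assumptions consistent with formulas 9 and 10), shows how the new check sub-blocks P and Q are derived from the new data, the old data and the old check data, reusing the gf_mul helper sketched earlier.

```python
# Read-modify-write update of P and Q for one check strip (formulas 9 and 10).
# new/old are lists of q equal-length byte strings; A holds the coefficients
# A_1..A_{N-2} of the whole stripe; r is the group number. Illustrative only.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def update_pq(new, old, p_old: bytes, q_old: bytes, A, r: int, q: int):
    p_new, q_new = p_old, q_old
    for i in range(1, q + 1):
        coeff = A[r * q - q + i - 1]                       # A_{r*q-q+i}; list is 0-indexed
        x_i = xor_bytes(new[i - 1], old[i - 1])            # X_i = D_i XOR D'_i
        p_new = xor_bytes(p_new, x_i)                      # accumulate into P (formula 9)
        x_star = bytes(gf_mul(coeff, n) ^ gf_mul(coeff, o)
                       for n, o in zip(new[i - 1], old[i - 1]))
        q_new = xor_bytes(q_new, x_star)                   # accumulate into Q (formula 10)
    return p_new, q_new
```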
To generate the new check data P and Q, besides the XOR operations and Galois-field multiplications (whose time is negligible when implemented in hardware), the old data D'_i (1 ≤ i ≤ q) and the old check data P', Q' must first be read out in parallel. For a single disk, each write operation therefore comprises first reading the old data and then writing the new data. To reduce the number of times the heads switch between reading and writing and to make full use of disk bandwidth, the method further comprises an optimized processing method for write operations, specifically:

Set up one new-data buffer, which caches the new data that the application writes to the disk array; two new-check-data buffers, which cache the two kinds of newly generated check data respectively; q old-data read-ahead regions, which hold the old data read ahead from the q disks of the array; and two old-check-data read-ahead regions, which hold the two kinds of old check data read ahead from the array. The read-ahead regions are all of equal size and are much larger than the new-data buffer.

Before a write operation, the old data sub-block corresponding to each new data sub-block and the two old check sub-blocks of the same check strip have already been read ahead into the corresponding read-ahead regions.
The new check data P are generated by the following steps:

Step 2.1: Whenever the new-data buffer is not empty, take q data sub-blocks from it at a time and split them into the data sub-blocks D_1, D_2, …, D_q belonging to the q disks of group r; write them to the q disks in parallel and simultaneously send them to the XOR unit.

Step 2.2: The corresponding old data sub-blocks D'_1, D'_2, …, D'_q in the q old-data read-ahead regions are sent to the XOR unit and XORed with the data sub-blocks sent in step 2.1, generating the data sub-blocks X_i = D_i ⊕ D'_i, 1 ≤ i ≤ q.

Step 2.3: The data sub-blocks X_1, X_2, …, X_q are sent to the XOR unit together with the corresponding old check sub-block P' from old-check-data read-ahead region 1; the XOR generates the new check sub-block P = X_1 ⊕ X_2 ⊕ … ⊕ X_q ⊕ P', which is temporarily stored in new-check-data buffer 1.

Step 2.4: The old check sub-block P' of the corresponding check strip is updated to the new check sub-block P.

The new check data Q are generated by the following steps:

Step 3.1: Take q data sub-blocks from the new-data buffer and split them into the data sub-blocks D_1, D_2, …, D_q belonging to the q disks of group r; multiply each by its corresponding coefficient A_{r×q−q+i} (Galois-field multiplication) and send the products A_{r×q−q+i} * D_i to the XOR unit, 1 ≤ i ≤ q.

Step 3.2: Multiply the corresponding old data sub-blocks D'_1, D'_2, …, D'_q in the q old-data read-ahead regions by the corresponding coefficients A_{r×q−q+i} (Galois-field multiplication) and send the products A_{r×q−q+i} * D'_i to the XOR unit, where they are XORed with the products A_{r×q−q+i} * D_i sent in step 3.1, generating the data sub-blocks X*_i = (A_{r×q−q+i} * D_i) ⊕ (A_{r×q−q+i} * D'_i), 1 ≤ i ≤ q.

Step 3.3: The data sub-blocks X*_1, X*_2, …, X*_q are sent to the XOR unit together with the corresponding old check sub-block Q' from old-check-data read-ahead region 2; the XOR generates the new check sub-block Q = X*_1 ⊕ X*_2 ⊕ … ⊕ X*_q ⊕ Q', which is temporarily stored in new-check-data buffer 2, 1 ≤ i ≤ q.

Step 3.4: The old check sub-block Q' of the corresponding check strip is updated to the new check sub-block Q.
If the starting logical address LBA of the data to be written is not divisible by the size of q data sub-blocks and k data sub-blocks remain (k a positive integer), or if fewer than q data sub-blocks (k of them) are left in the new-data buffer, then only these k data sub-blocks are written to the corresponding disks, and the new check data are generated by the same method using the old data and the old check data of the corresponding check strip; the other q−k disks of the group have no write task.

Because the old data sub-blocks and old check sub-blocks can be read directly from the read-ahead regions, the disks can perform write operations continuously; only when the data sub-blocks in a read-ahead region are exhausted is the region refilled by a single read-ahead. Through read-ahead, many scattered read operations are converted into one continuous read operation, which reduces the number of head seeks; the read-ahead regions can be made sufficiently large according to actual conditions.
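The read-ahead idea can be summarized in a small sketch (illustrative Python; the refill policy and the class name are assumptions, not values specified by the patent): old data sub-blocks are served from an in-memory region that is refilled by one large sequential read whenever it runs dry.

```python
# Read-ahead region for old data (or old check data): scattered reads of old
# sub-blocks are replaced by one large sequential refill. Illustrative sketch.

class ReadAheadRegion:
    def __init__(self, disk_read, region_size: int, sub_block: int):
        self.disk_read = disk_read          # callable: (offset, length) -> bytes
        self.region_size = region_size      # much larger than the new-data buffer
        self.sub_block = sub_block
        self.base, self.buf = 0, b""

    def old_sub_block(self, offset: int) -> bytes:
        lo, hi = self.base, self.base + len(self.buf)
        if not (lo <= offset and offset + self.sub_block <= hi):
            # Region exhausted (or first use): one continuous read refills it.
            self.base = offset
            self.buf = self.disk_read(offset, self.region_size)
        start = offset - self.base
        return self.buf[start:start + self.sub_block]
```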
From the above analysis, when the read-ahead regions are sufficiently large, the continuous write bandwidth of the disk array SVE-RAID 6 is approximately q × BW_disk / 2 and the continuous read bandwidth approximately q × BR_disk, where BW_disk is the continuous write bandwidth of a single disk, BR_disk is the continuous read bandwidth of a single disk, and q is the number of disks in a group.
The method further comprises a processing method for reading and writing data when a disk has failed, specifically:

When a disk fails and normal data reads and writes cannot be performed, all disks of the disk array SVE-RAID 6 are activated into the ready state, and the above disk working-state control method for the disk array SVE-RAID 6 is suspended.

The handling of storage-block failures is discussed stripe by stripe: a disk failure appears to the array as the failure of all storage blocks located on that disk, so a disk failure is handled by handling the failed storage block of each stripe of the array in turn.

When storage blocks in a stripe fail, the following three situations are handled:

Situation 1: If only check blocks fail, reading and writing of the data blocks continues, but the stripe is in a single-redundancy check state (one check block failed) or a no-redundancy check state (two check blocks failed); after the disk holding the failed check block is replaced, the check block is regenerated.
Situation 2: If only data blocks fail (one or two), read and write operations are handled separately:

Read operation: The data being read from the failed data block are generated by computing with the other data blocks of the stripe and check block P or Q (one data block failed), or with P and Q (two data blocks failed); after the disk holding the failed data block is replaced, the failed data block is rebuilt by the same method.

Write operation: There are two processing modes:

Mode 1: The write data are written directly onto P or Q of the stripe (one data block failed), or onto P and Q (two data blocks failed), and the check blocks written to are recorded; after the disk holding the failed data block is replaced, data restoration and check-data regeneration are carried out. Data restoration means moving the data blocks temporarily stored in the check blocks back to their original positions; check-data regeneration means regenerating the recorded check blocks that were written to.

Mode 2: A spare disk is used to temporarily store the data written to the failed disk; after the failed disk is replaced, the data blocks temporarily stored on the spare disk are moved back to their original positions.
Situation 3: If one data block and one check block fail, the failed data block is handled the same way as when only one data block fails, except that the stripe is in a single-redundancy protection state; after the failed data block is rebuilt, the failed check block is handled the same way as when only one check block fails.

After the array returns to the normal state, the above disk working-state control method is enabled again to control the working states of the disk array SVE-RAID 6.
Beneficial effects
Compared with the prior art, the method of the present invention for constructing a disk array SVE-RAID 6 has the following advantages:
1. Energy consumption is markedly reduced. The data layout provides a double-redundancy data protection mechanism while adopting a grouping strategy; for the overwhelming majority of the time only one group of disks is working, and each works long enough that most of the disks in the array, having no task, remain in the stopped state, achieving the goal of saving energy.
2. The continuous storage bandwidth can be chosen according to task requirements. With the grouping strategy, the value of q can be set according to the task, and the q disks of a group are accessed concurrently; at the same time, data buffering and read-ahead reduce the impact of read-modify-write on the write bandwidth, so a larger continuous read/write bandwidth can be provided.
3. Disk service life is substantially extended. Because disks of the SVE-RAID 6 array that temporarily have no task are scheduled into the stopped state, the high-speed rotation of their spindles stops, and the interval between disk starts and stops is long enough that the probability of disk failure is reduced and the service life of the array is substantially prolonged.
Description of drawings
Fig. 1 is a schematic diagram of the numbering of disks, stripes, data blocks and check blocks in the disk array SVE-RAID 6 in the embodiment of the present invention;
Fig. 2 is a schematic diagram of the grouping in the disk array SVE-RAID 6 in the embodiment of the present invention;
Fig. 3 is a schematic diagram of the numbering of data sub-blocks and check sub-blocks in the disk array SVE-RAID 6 in the embodiment of the present invention;
Fig. 4 is a schematic diagram of disk working-state transitions in the embodiment of the present invention;
Fig. 5 is a schematic diagram of the optimized generation of new check data P during a write operation in the embodiment of the present invention;
Fig. 6 is a schematic diagram of the optimized generation of new check data Q during a write operation in the embodiment of the present invention.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment.

In this embodiment, the disk array SVE-RAID 6 is composed of 6 Seagate Cheetah 10K disks, whose relevant parameters are given in Table 2. The 6 disks are divided into stripes as shown in Fig. 1: there are 6 stripes, numbered 1 to 6, and each stripe contains 6 storage blocks, of which 2 are check blocks and the other 4 are data blocks. The 1st and 6th storage blocks of the 6th stripe, the 5th and 6th storage blocks of the 1st stripe, the 4th and 5th storage blocks of the 2nd stripe, the 3rd and 4th storage blocks of the 3rd stripe, the 2nd and 3rd storage blocks of the 4th stripe, and the 1st and 2nd storage blocks of the 5th stripe are the check blocks of their respective stripes (check block P and check block Q respectively, except on the 6th stripe, where the 1st storage block is Q and the 6th is P); the other storage blocks are data blocks. The check block P of each stripe is generated by XORing its 4 data blocks, and the check block Q by Galois-field multiplication and XOR, computed as follows:
Q(11,12,13,14) = (A_1 * 11) ⊕ (A_2 * 12) ⊕ (A_3 * 13) ⊕ (A_4 * 14)

where ⊕ is XOR, * is Galois-field multiplication, and A_1, A_2, A_3, A_4 are distinct Galois-field multiplication coefficients; for the Galois field GF(2^8) one may take A_1 = 1, A_2 = 2, A_3 = 4, A_4 = 8. Here Q(11,12,13,14) denotes the check block Q generated from data blocks 11, 12, 13 and 14 (data block ij being the storage block of stripe i on disk j); the check block P of each stripe is the XOR of the same four data blocks, per formula 3. Likewise:

Q(21,22,23,26) = (A_1 * 21) ⊕ (A_2 * 22) ⊕ (A_3 * 23) ⊕ (A_4 * 26)
Q(31,32,35,36) = (A_1 * 31) ⊕ (A_2 * 32) ⊕ (A_3 * 35) ⊕ (A_4 * 36)
Q(41,44,45,46) = (A_1 * 41) ⊕ (A_2 * 44) ⊕ (A_3 * 45) ⊕ (A_4 * 46)
Q(53,54,55,56) = (A_1 * 53) ⊕ (A_2 * 54) ⊕ (A_3 * 55) ⊕ (A_4 * 56)
Q(62,63,64,65) = (A_1 * 62) ⊕ (A_2 * 63) ⊕ (A_3 * 64) ⊕ (A_4 * 65)
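As a sanity check of the stripe-1 formulas, the short Python fragment below, illustrative only and with arbitrary sample byte values, computes P and Q for four data bytes using A_1 = 1, A_2 = 2, A_3 = 4, A_4 = 8 and the GF helpers sketched earlier, then repairs the loss of two of the four data bytes.

```python
# Stripe-1 example: P and Q over four data bytes with A = [1, 2, 4, 8] in GF(2^8),
# then recovery of two lost data bytes. Sample values are arbitrary (illustrative).
from functools import reduce

A = [1, 2, 4, 8]                      # A_1..A_4
d = [0x11, 0x5A, 0x99, 0xE7]          # stand-ins for data blocks 11, 12, 13, 14
P = reduce(lambda x, y: x ^ y, d)
Q = reduce(lambda x, y: x ^ y, (gf_mul(a, v) for a, v in zip(A, d)))

j, k = 1, 3                            # suppose data blocks 12 and 14 are lost (0-based 1 and 3)
survivors = [m for m in range(4) if m not in (j, k)]
S0 = P ^ reduce(lambda x, y: x ^ y, (d[m] for m in survivors))       # d_j XOR d_k
S1 = Q ^ reduce(lambda x, y: x ^ y, (gf_mul(A[m], d[m]) for m in survivors))
dk = gf_div(gf_mul(A[j], S0) ^ S1, A[j] ^ A[k])                       # solve for d_k
dj = S0 ^ dk
assert (dj, dk) == (d[j], d[k])
```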
Table 2. Relevant parameters of the Seagate Cheetah 10K disk
To increase the continuous storage bandwidth, the 4 data blocks of each stripe are divided into 2 groups of 2 data blocks each; the grouping is shown in Fig. 2.
The data sub-block size is set to 4 KB, so each data block is divided into K = 13,107,200 data sub-blocks. All data sub-blocks are then numbered group by group, and sub-blocks with the same group number are numbered stripe by stripe, as shown in Fig. 3. That is, in group 1 of the first stripe, the 1st data sub-block of the 1st data block is numbered 1; the 1st data sub-block of the 2nd data block is numbered 2; then the 2nd data sub-block of the 1st data block is numbered 3; the 2nd data sub-block of the 2nd data block is numbered 4; and so on, until the K-th data sub-block of the 1st data block is numbered 2×K−1 and the K-th data sub-block of the 2nd data block is numbered 2×K.

Then, in group 1 of the second stripe, the 1st data sub-block of the 1st data block is numbered K×2+1; the 1st data sub-block of the 2nd data block is numbered K×2+2; then the 2nd data sub-block of the 1st data block is numbered K×2+3; the 2nd data sub-block of the 2nd data block is numbered K×2+4; and so on, until the K-th data sub-block of the 1st data block is numbered 4×K−1 and the K-th data sub-block of the 2nd data block is numbered 4×K.

By analogy, in group 1 of the 6th stripe, the 1st data sub-block of the 1st data block is numbered 10×K+1; the 1st data sub-block of the 2nd data block is numbered 10×K+2; then the 2nd data sub-block of the 1st data block is numbered 10×K+3; the 2nd data sub-block of the 2nd data block is numbered 10×K+4; and so on, until the K-th data sub-block of the 1st data block is numbered 12×K−1 and the K-th data sub-block of the 2nd data block is numbered 12×K.

By analogy, following the same numbering principle, the data sub-blocks of group 2 of each stripe are numbered. Data sub-blocks with adjacent numbers have adjacent logical addresses; groups with the same group number on stripes with adjacent numbers have adjacent logical addresses, and the group on the 6th stripe is logically adjacent to the group with the next group number on the 1st stripe.

Each check block is likewise divided into K equal-sized check sub-blocks. Check sub-blocks P(1,1), …, P(1,K), …, P(6,1), …, P(6,K) are each obtained by XORing the 4 data sub-blocks at the same offset within the data blocks of their stripe; check sub-blocks Q(1,1), …, Q(1,K), …, Q(6,1), …, Q(6,K) are each obtained by multiplying those 4 data sub-blocks by the coefficients A_1, A_2, A_3, A_4 respectively (Galois-field multiplication) and then XORing the products. For example, data sub-blocks 3, 4, 12×K+3 and 12×K+4 together with the check sub-blocks P(1,2) and Q(1,2) generated from them form one check strip, as shown in Fig. 3.
A semaphore Sem i, 1 ≤ i ≤ 6, with initial value 0, is set up for each disk. During normal continuous data access, disk working-state scheduling proceeds as in the following instances:

Instance 1: an access whose start and end addresses both fall within group 1 of the 3rd stripe.

From the logical address of the accessed data, the start of the data is found to lie in group 1 of the 3rd stripe; the 2 data blocks of group 1 of the 3rd stripe are on disks 1 and 2, and the two check blocks of this stripe are on disks 3 and 4. If disk 1, 2, 3 or 4 is in the stopped state, it is scheduled into the ready state, as shown at ① in Fig. 4. When access to the data on disks 1, 2, 3 and 4 begins, disks 1, 2, 3 and 4 switch from the ready state to the running state, as shown at ② in Fig. 4. Because the access ends within group 1 of the 3rd stripe, after it ends disks 1, 2, 3 and 4 switch from the running state to the ready state, as shown at ③ in Fig. 4, and this scheduling round exits.
Instance 2: an access whose start address is in group 1 of the 6th stripe and whose end address is in group 2 of the 1st stripe.

From the logical address of the accessed data, the start of the data is found to lie in group 1 of the 6th stripe; the 2 data blocks of group 1 of the 6th stripe are on disks 2 and 3, and the two check blocks of this stripe are on disks 1 and 6. If disk 2, 3, 1 or 6 is in the stopped state, it is scheduled into the ready state. When access to the data on disks 2, 3, 1 and 6 begins, these disks switch from the ready state to the running state. Because the access does not end within group 1 of the 6th stripe, at time t before the access to group 1 of the 6th stripe ends, the working states of the disks of the adjacent group, group 2 of the 1st stripe, are examined: its 2 data blocks are on disks 3 and 4, and the two check blocks of the 1st stripe are on disks 5 and 6. Disks 3 and 6 are in the running state, so their semaphores are increased by 1, giving Sem 3 = Sem 6 = 1; disks 4 and 5 are in the stopped state, so they are switched to the ready state. After the access to group 1 of the 6th stripe ends, the semaphores Sem 1, Sem 2, Sem 3 and Sem 6 are decreased by 1, giving Sem 1 = Sem 2 = −1 and Sem 3 = Sem 6 = 0; therefore disks 1 and 2 switch from the running state to the stopped state, as shown at ④ in Fig. 4, and Sem 1 = Sem 2 are reset to 0, while disks 3 and 6 switch from the running state to the ready state.

When access to the data on disks 3, 4, 5 and 6 begins, disks 3, 4, 5 and 6 switch from the ready state to the running state. Because the access ends within group 2 of the 1st stripe, after it ends disks 3, 4, 5 and 6 switch from the running state to the ready state, and this scheduling round exits.

For normal continuous data access, for the overwhelming majority of the time only 4 disks of this array are working, so its power consumption is about 4/6 of that of a RAID 6 with the same number of disks.

If the array is composed of Seagate Cheetah 10K disks (parameters in Table 2) with a capacity of 300 GB per disk, the minimum interval between starts of a single disk is the time needed to access 300 GB / 6 × 2 = 100 GB of data, so the start interval and number of starts of a disk have a negligible effect on disk life.
A new-data buffer is set up to cache the new data that the application writes to the disk array; two new-check-data buffers cache the two kinds of newly generated check data respectively; 2 old-data read-ahead regions hold the old data read ahead from the 2 disks of the group; and 2 old-check-data read-ahead regions hold the two kinds of old check data read ahead from the array. The read-ahead regions are all of equal size, 10 times that of the new-data buffer.

Before a write operation, the old data sub-blocks at the positions where the new data sub-blocks will be written, and the two kinds of old check sub-blocks of the same check strips as those old data sub-blocks, have already been read ahead into the corresponding read-ahead regions.
The new check data P are generated by the following steps:

Step 2.1: Whenever the new-data buffer is not empty, take 2 data sub-blocks from it at a time and split them into the data sub-blocks D_1, D_2 belonging to the 2 data blocks of the group (group 2 is taken as the example); write them to the 2 disks in parallel and simultaneously send them to the XOR unit, as shown at ① in Fig. 5.

Step 2.2: The corresponding old data sub-blocks D'_1, D'_2 in the 2 old-data read-ahead regions are sent to the XOR unit and XORed with the data sub-blocks D_1, D_2, generating the data sub-blocks X_1 = D_1 ⊕ D'_1 and X_2 = D_2 ⊕ D'_2, as shown at ② in Fig. 5.

Step 2.3: The data sub-blocks X_1, X_2 are sent to the XOR unit (③ in Fig. 5) together with the corresponding old check sub-block P' from the old-check-data read-ahead region (④ in Fig. 5); the XOR generates the new check sub-block P = X_1 ⊕ X_2 ⊕ P', which is temporarily stored in new-check-data buffer 1 (⑤ in Fig. 5).

Step 2.4: The old check sub-block P' of the corresponding check strip on disk is updated to the new check sub-block P.
The new check data Q are generated by the following steps:

Step 3.1: Take 2 data sub-blocks from the new-data buffer and split them into the data sub-blocks D_1, D_2 belonging to the 2 data blocks of group 2 (taken as the example); multiply them by the corresponding coefficients A_{2×2−2+1} = A_3 and A_{2×2−2+2} = A_4 respectively (Galois-field multiplication) and send the products A_3 * D_1 and A_4 * D_2 to the XOR unit.

Step 3.2: Multiply the corresponding old data sub-blocks D'_1, D'_2 in the 2 old-data read-ahead regions by the corresponding coefficients A_3, A_4 respectively (Galois-field multiplication) and send the products A_3 * D'_1 and A_4 * D'_2 to the XOR unit, where they are XORed with the products A_3 * D_1 and A_4 * D_2, generating the data sub-blocks X*_1 = (A_3 * D_1) ⊕ (A_3 * D'_1) and X*_2 = (A_4 * D_2) ⊕ (A_4 * D'_2).

Step 3.3: The data sub-blocks X*_1, X*_2 are sent to the XOR unit together with the corresponding old check sub-block Q' from old-check-data read-ahead region 2; the XOR generates the new check sub-block Q = X*_1 ⊕ X*_2 ⊕ Q', which is temporarily stored in new-check-data buffer 2.

Step 3.4: The old check sub-block Q' of the corresponding check strip is updated to the new check sub-block Q.
The disks used in this embodiment are Seagate Cheetah 10K disks, with a continuous read bandwidth of 85 MB/s and a continuous write bandwidth of 84 MB/s per disk. The continuous write bandwidth of the disk array is therefore about 84 MB/s and the continuous read bandwidth about 170 MB/s, which is enough to write 42 concurrent 2 MB/s high-definition video streams, or to read 85 such streams.
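These figures follow directly from the bandwidth estimate q × BW_disk / 2 and q × BR_disk given earlier; a one-line check with q = 2 (illustrative arithmetic only):

```python
q, bw_disk, br_disk, stream = 2, 84, 85, 2         # MB/s per disk, MB/s per video stream
write_bw = q * bw_disk // 2                         # ~84 MB/s continuous write
read_bw = q * br_disk                               # ~170 MB/s continuous read
print(write_bw // stream, read_bw // stream)        # 42 write streams, 85 read streams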
When a disk fails and normal data reads and writes cannot be performed, all disks of the array must be activated into the ready state, and the above disk working-state control method is suspended.

When any two disks in the array fail simultaneously, the handling is the same; the simultaneous failure of disk 1 and disk 2 is taken as the example and described below.

Suppose disk 1 and disk 2 of the array fail simultaneously. This appears to the array as the failure of the 1st and 2nd storage blocks of every stripe, and the failure types per stripe are as follows:
On the 1st, 2nd and 3rd stripes, 2 data blocks fail simultaneously; the handling is as follows:

Read operation: the data being read from a failed data block are generated by computing with the other data blocks of the stripe and the check blocks P and Q of the stripe; after the failed disks are replaced, the failed data blocks are rebuilt by the same method.

Write operation: the data to be written to the two failed data blocks are written onto P and Q of the stripe respectively, and the check blocks written to are recorded; after the disks are replaced, data restoration and check-data regeneration are carried out. Data restoration means moving the data blocks temporarily stored in the check blocks back to their original positions; check-data regeneration means regenerating the recorded check blocks that were written to.
On the 4th and 6th stripes, 1 data block and 1 check block fail simultaneously; the handling is as follows:

Read operation: the data being read from the failed data block are generated by computing with the other data blocks and the remaining check block of the stripe; after the disk holding the failed data block is replaced, the failed data block is rebuilt by the same method.

Write operation: the write data are written onto the remaining check block of the stripe, and the check block written to is recorded; after the disk holding the failed data block is replaced, the data block temporarily stored in the check block is moved back to its original position and the recorded check block that was written to is regenerated.

After the failed data block has been rebuilt and the disk holding the failed check block has been replaced, the failed check block is regenerated.
On the 5th stripe, 2 check blocks fail simultaneously; the handling is as follows:

Reading and writing of the data blocks of this stripe can continue, but the stripe is in a no-redundancy check state; after the disks are replaced, the data of the failed check blocks are regenerated.
When any single disk in the array fails, the handling is the same; the failure of disk 3 is taken as the example and described below. When disk 3 of the array fails, this appears as the failure of the 3rd storage block of every stripe, and the failure types per stripe are as follows:

On the 1st, 2nd, 5th and 6th stripes, 1 data block fails; the handling is as follows:

Read operation: the data being read from the failed data block are generated by computing with the other data blocks of the stripe and the check block P of the stripe; after the failed disk is replaced, the failed data block is rebuilt by the same method.

Write operation: the data to be written to the failed data block are written onto check block P of the stripe, and the check block written to is recorded; after the disk is replaced, the data block temporarily stored in check block P is moved back to its original position, and the recorded check block that was written to is regenerated.
On the 3rd and 4th stripes, 1 check block fails; the handling is as follows:

Reading and writing of the data blocks of these stripes continue; after the disk is replaced, the data of the failed check block are regenerated.

After the array returns to the normal state, the above disk working-state control method is enabled again to control the working states of the disk array SVE-RAID 6.
The above is only a preferred embodiment of the present invention. It should be understood that those skilled in the art may make several improvements, or replace some of the technical features with equivalents, without departing from the principle of the invention; such improvements and replacements should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A method for constructing a double-disk fault-tolerant vertically grouped parallel-access disk array, characterized by comprising the following steps:

One: divide and lay out the stripes, data blocks, check blocks and data sub-blocks of the disk array. Concretely: for a disk array containing N disks, where N ≥ 4 and N is a positive integer, the N disks are organized as a 1-row × N-column disk matrix. All disks are divided into stripes; there are N stripes, numbered sequentially, and each stripe contains N storage blocks, of which one is the P check block, one is the Q check block, and the other N−2 are data blocks. Check block P is obtained by XORing the N−2 data blocks of the stripe; check block Q is obtained by Galois-field multiplication and addition over the N−2 data blocks of the stripe. P and Q are generated as follows:

P = D_1 ⊕ D_2 ⊕ … ⊕ D_{N−2}    (1)
Q = (A_1 * D_1) ⊕ (A_2 * D_2) ⊕ … ⊕ (A_{N−2} * D_{N−2})    (2)

where D_1, D_2, …, D_{N−2} are the N−2 data blocks of the stripe, ⊕ denotes XOR, A_1, A_2, …, A_{N−2} are distinct Galois-field multiplication coefficients, and * denotes Galois-field multiplication;
let X(i, j) denote a storage block of the double-disk fault-tolerant vertically grouped parallel-access disk array SVE-RAID 6, where i is the number of the stripe containing the block and j is the number of the disk it belongs to; storage block X(i, j) is located on disk j, with 1 ≤ i, j ≤ N; the check block P in stripe i is denoted P(i) and relates to the storage blocks X(i, j) as follows:

P(i) = X(i, N−i)  for i ≠ N;  P(N) = X(N, N)    (3)

the check block Q in stripe i is denoted Q(i) and relates to the storage blocks X(i, j) as follows:

Q(i) = X(i, N+1−i)    (4)
the remaining storage blocks are data blocks, denoted D(u, v), where u is the number of the stripe containing the data block, 1 ≤ u ≤ N, and v is the ordinal number of the data block within its stripe counting data blocks only (check blocks P and Q are skipped), starting from 1, 1 ≤ v ≤ N−2; the correspondence between data block D(u, v) and the storage blocks satisfies formula 5:

D(u, v) = X(u, v)    if u + v < N
D(u, v) = X(u, v+2)  if u + v ≥ N and u ≠ N
D(u, v) = X(N, v+1)  if u = N    (5)

in order to increase the continuous storage bandwidth and to obtain different continuous storage bandwidths that meet the demands of different storage rates, the N−2 data blocks of each stripe are divided into p groups, where p ≥ 2 and p is a positive integer, each group containing q data blocks, where q ≥ 1 and q is a positive integer and the value of q is determined by the continuous-bandwidth demand of the actual storage task, with p × q = N−2;
Each data block is divided into K equal-sized data sub-block; K >=2 and K are positive integer, are followed successively by the sub-block number of total data, the data sub-block that group number is identical according to group number then; According to the band serial number, that is: the 1st the sub-block number of data is 1 in the 1st data block of the 1st group of first band; Be that the 1st the sub-block number of data is 2 in the 2nd data block in the 1st group of first band; Be that the 1st the sub-block number of data is q in q the data block in the 1st group of first band; Be that the 2nd the sub-block number of data is q+1 in the 1st data block in the 1st group of first band then; Be that the 2nd the sub-block number of data is q+2 in the 2nd data block of the 1st group of first band; Be that the 2nd the sub-block number of data is 2 * q in q the data block in the 1st group of first band; By that analogy, be that K the sub-block number of data is (K-1) * q+1 in the 1st data block in the 1st group of first band; Be that K the sub-block number of data is (K-1) * q+2 in the 2nd data block of the 1st group of first band; Be that K the sub-block number of data is K * q in q the data block in the 1st group of first band;
Be that the 1st the sub-block number of data is K * q+1 in the 1st data block of the 1st group of second band then; Be that the 1st the sub-block number of data is K * q+2 in the 2nd data block of the 1st group of second band; Be that the 1st the sub-block number of data is (K+1) * q in q the data block of the 1st group of second band; Be that the 2nd the sub-block number of data is (K+1) * q+1 in the 1st data block of the 1st group of second band then; Be that the 2nd the sub-block number of data is (K+1) * q+2 in the 2nd data block of the 1st group of second band; Be that the 2nd the sub-block number of data is (K+2) * q in q the data block of the 1st group of second band; By that analogy, be that K the sub-block number of data is 2 * K * q-q+1 in the 1st data block of the 1st group of second band; Be that K the sub-block number of data is 2 * K * q-q+2 in the 2nd data block of the 1st group of second band; Be that K the sub-block number of data is 2 * K * q in q the data block of the 1st group of second band;
In like manner be that the 1st the sub-block number of data is (i-1) * K * q+1 in the 1st data block of the 1st group of i band, 1≤i≤N; Be that the 1st the sub-block number of data is (i-1) * K * q+2 in the 2nd data block of the 1st group of i band; Be that the 1st the sub-block number of data is (i-1) * K * q+q in q the data block of the 1st group of i band; Be that the 2nd the sub-block number of data is (i-1) * K * q+q+1 in the 1st data block of the 1st group of i band then; Be that the 2nd the sub-block number of data is (i-1) * K * q+q+2 in the 2nd data block of the 1st group of i band; Be that the 2nd the sub-block number of data is (i-1) * K * q+q * 2 in q the data block of the 1st group of i band; By that analogy, be that K the sub-block number of data is i * K * q-q+1 in the 1st data block of the 1st group of i band; Be that K the sub-block number of data is i * K * q-q+2 in the 2nd data block of the 1st group of i band; Be that K the sub-block number of data is i * K * q in q the data block of the 1st group of i band;
By analogy, following the same numbering principle, the data sub-blocks of group 2 of every stripe are numbered, then the data sub-blocks of group 3 of every stripe, ..., and finally the data sub-blocks of group p of every stripe; data sub-blocks with adjacent numbers have adjacent logical addresses;
Groups with the same group number in stripes whose stripe numbers are adjacent have adjacent logical addresses; a group in the N-th stripe and the group with the adjacent group number in the first stripe also have adjacent logical addresses; groups whose logical addresses are adjacent are called adjacent groups;
Each check block P and each check block Q is likewise divided into K equal-sized check sub-blocks; a check sub-block of check block P is derived by XOR, and a check sub-block of check block Q by Galois-field multiplication and addition, from the N-2 data sub-blocks at the same offset position within the stripe; these N-2 data sub-blocks and the check sub-blocks generated from them constitute one check stripe;
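As an illustrative note, and not as part of the claim wording, the numbering rule above can be restated as a short Python sketch; the function sub_block_number and the example geometry N = 6, p = 2, q = 2, K = 3 are assumptions introduced only to make the arithmetic of the preceding paragraphs concrete.

# Minimal sketch of the data sub-block numbering rule described above.
# All indices are 1-based, matching the claim text; N, p, q, K are example
# values chosen only for illustration (p * q must equal N - 2).

def sub_block_number(stripe, group, block, sub, N, p, q, K):
    """Global number of the sub-th sub-block of the block-th data block
    in the given group of the given stripe (all 1-based)."""
    assert 1 <= stripe <= N and 1 <= group <= p
    assert 1 <= block <= q and 1 <= sub <= K
    # Sub-blocks are numbered group number by group number; within one group
    # number, stripe 1 is filled first, then stripe 2, ..., then stripe N.
    groups_before = (group - 1) * N + (stripe - 1)   # full groups numbered earlier
    return groups_before * K * q + (sub - 1) * q + block

# Example with N = 6 disks, p = 2 groups of q = 2 data blocks, K = 3 sub-blocks.
print(sub_block_number(1, 1, 1, 1, 6, 2, 2, 3))   # -> 1
print(sub_block_number(1, 1, 2, 3, 6, 2, 2, 3))   # -> 6  (= K*q)
print(sub_block_number(2, 1, 1, 1, 6, 2, 2, 3))   # -> 7  (= K*q + 1)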
Two, the disks are scheduled to switch among working states according to the following strategy:
A semaphore, denoted Sem_i with 1 ≤ i ≤ N, is set up for each disk; the initial value of every semaphore Sem_i is 0;
Step 1: from the logical address LBA of the accessed data, the stripe u and the group r to which the data block containing the accessed data belongs are calculated; they are obtained by formula 6:
u = (⌊LBA / group size⌋ MOD N) + 1;  r = ⌊LBA / (group size × N)⌋ + 1
where the group size denotes the sum of the sizes of the data blocks in one group, MOD denotes the modulo operation, and the added 1 reflects that stripe numbers and group numbers both start from 1;
Then, formula 5 is used to determine the disks holding the q data blocks D(u, r×q-q+1), D(u, r×q-q+2), ..., D(u, r×q-q+q) of group r of stripe u, formula 3 is used to determine that the disk holding check block P(u) of this stripe is disk (N-u) or disk N, and formula 4 is used to determine that the disk holding check block Q(u) of this stripe is disk N+1-u; each of the above disks is handled as follows: if the disk is in the stopped state, it is scheduled into the ready state;
The stripe number u and the group number r are used to record, respectively, the stripe number and the group number of the current access operation;
Step 2: when the disks holding the q data blocks of group r of stripe u and the disks holding check blocks P(u) and Q(u) of stripe u are in the ready state, concurrent access to these q+2 disks begins, and every accessed disk switches from the ready state to the running state;
Step 3: if the access finishes within group r of stripe u, then after the access finishes, the disks holding the q data blocks of group r of stripe u and the disks holding check blocks P(u) and Q(u) of stripe u switch from the running state to the ready state, and this scheduling round then exits; otherwise, at time t before the access to group r of stripe u finishes, where t is the time a disk needs to switch from the stopped state to the ready state, the stripe number and the group number of the adjacent group are determined, the stripe number being denoted I and the group number R; the disks holding the q data blocks of group R of stripe I and the disks holding check blocks P(I) and Q(I) of stripe I are each handled as follows: if the disk is in the stopped state, it is scheduled into the ready state; if the disk is in the running state, the semaphore of that disk is incremented by 1; after the access to group r of stripe u finishes, the disks holding the q data blocks of group r of stripe u and the disks holding check blocks P(u) and Q(u) of stripe u are each handled as follows: the semaphore of the disk is decremented by 1; if the semaphore of the disk is then less than 0, the disk is switched from the running state to the stopped state and its semaphore is reset to 0; otherwise, the disk is switched from the running state to the ready state;
Step 4: the value of u, which records the stripe number of the current access, is updated to I, and the value of r, which records the group number of the current access, is updated to R; steps 2 through 4 are then repeated.
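Again as an illustrative note rather than claim wording, the working-state conversion of steps 1 through 4 can be read as a simple event loop; in the hedged sketch below the state names, the example geometry, and the helper functions group_to_disks, lba_to_stripe_group and next_adjacent_group are assumptions, and the data-disk placement inside group_to_disks is only a placeholder, since formula 5 of the claim is not reproduced here.

# Hedged Python sketch of the semaphore-driven working-state scheduling of
# steps 1-4.  Only the semaphore rule itself follows the claim text; the
# geometry and placement details are illustrative assumptions.

N, p, q = 6, 2, 2                                # example geometry, p * q = N - 2

sem   = {d: 0 for d in range(1, N + 1)}          # Sem_i, initial value 0
state = {d: 'stopped' for d in range(1, N + 1)}  # stopped / ready / running

def group_to_disks(u, r):
    """Disks touched by group r of stripe u: the q data-block disks plus the
    disks holding P(u) and Q(u).  The data-disk expression is a placeholder
    for formula 5 of the claim, which is not reproduced here."""
    data_disks = {((u - 1 + (r - 1) * q + j) % N) + 1 for j in range(q)}
    p_disk = (N - u) if u != N else N            # one reading of "(N-u) or N" (formula 3)
    q_disk = N + 1 - u                           # formula 4
    return data_disks | {p_disk, q_disk}

def lba_to_stripe_group(lba, group_size):
    """Step 1 / formula 6 as reconstructed above."""
    g = lba // group_size
    return g % N + 1, g // N + 1

def next_adjacent_group(u, r):
    """Adjacent group: same group number in the next stripe, wrapping from
    stripe N to the next group number of stripe 1."""
    return (u + 1, r) if u < N else (1, r + 1)

def run_group_access(u, r, finishes_here):
    """Steps 2-4 for one group access; returns the next (u, r), or None."""
    active = group_to_disks(u, r)
    for d in active:                             # step 1: stopped -> ready
        if state[d] == 'stopped':
            state[d] = 'ready'
    for d in active:                             # step 2: ready -> running
        state[d] = 'running'

    if finishes_here:                            # step 3, terminating branch
        for d in active:
            state[d] = 'ready'
        return None

    I, R = next_adjacent_group(u, r)             # step 3, continuing branch,
    for d in group_to_disks(I, R):               # performed time t before the end
        if state[d] == 'stopped':
            state[d] = 'ready'
        elif state[d] == 'running':
            sem[d] += 1                          # disk is needed again next round

    for d in active:                             # after this group access ends
        sem[d] -= 1
        if sem[d] < 0:
            state[d] = 'stopped'
            sem[d] = 0
        else:
            state[d] = 'ready'
    return I, R                                  # step 4: u, r := I, R

# Walk three consecutive group accesses; the last one ends inside its group.
u, r = lba_to_stripe_group(0, group_size=q * 64 * 1024)
for last in (False, False, True):
    nxt = run_group_access(u, r, finishes_here=last)
    if nxt is None:
        break
    u, r = nxt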
2. The method for constructing a double-disk fault-tolerant vertical grouping parallel access disk array as claimed in claim 1, characterized in that:
when a continuous read operation is carried out, the disks holding the check blocks stop working.
3. The method for constructing a double-disk fault-tolerant vertical grouping parallel access disk array as claimed in claim 1, characterized in that:
the method further comprises an optimized processing method for write operations, which is specifically as follows:
A new-data buffer is set up to cache the new data that the application writes to the disk array; two new-check-data buffers are set up to cache, respectively, the two types of newly generated check data; q old-data pre-read zones are set up to hold, respectively, the old data pre-read from the q disks of the array; two old-check-data pre-read zones are set up to hold, respectively, the two types of old check data pre-read from the array; before a write operation, the old data sub-block corresponding to each new data sub-block and the two old check sub-blocks of the same check stripe have already been read into the corresponding pre-read zones;
The following operation steps are carried out to generate the new check data P:
Step 2.1: whenever the new-data buffer is not empty, q data sub-blocks are taken from the new-data buffer each time and split into the data sub-blocks D_1, D_2, ..., D_q belonging to the q disks of group r; they are then written in parallel to the q disks and simultaneously sent to the XOR unit;
Step 2.2: the corresponding old data sub-blocks D'_1, D'_2, ..., D'_q in the q old-data pre-read zones are sent to the XOR unit and XORed with the data sub-blocks sent in step 2.1, generating the data sub-blocks X_1, X_2, ..., X_q, where X_i = D_i ⊕ D'_i, 1 ≤ i ≤ q;
Step 2.3: the data sub-blocks X_1, X_2, ..., X_q are sent to the XOR unit, the corresponding old check sub-block P' in old-check-data pre-read zone 1 is also sent to the XOR unit, and XOR generates the new check sub-block P, P = X_1 ⊕ X_2 ⊕ ... ⊕ X_q ⊕ P', which is temporarily stored in new-check-data buffer 1;
Step 2.4: the old check sub-block P' in the corresponding check stripe is updated to the new check sub-block P;
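Steps 2.1 to 2.4 amount to the usual read-modify-write update of an XOR parity block; the following is a minimal sketch, assuming the sub-blocks are equal-length byte strings and using the hypothetical names new_subs, old_subs and old_p for the contents of the new-data buffer, the old-data pre-read zones and old-check-data pre-read zone 1.

# Minimal sketch of steps 2.1-2.4: new P from the new data sub-blocks, the
# pre-read old data sub-blocks and the pre-read old check sub-block P'.
# Sub-blocks are modelled as equal-length byte strings; names are illustrative.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def new_parity_p(new_subs, old_subs, old_p):
    """new_subs: D_1..D_q; old_subs: the corresponding old sub-blocks D'_1..D'_q;
    old_p: old check sub-block P' of the same check stripe."""
    p = old_p
    for d_new, d_old in zip(new_subs, old_subs):
        x = xor(d_new, d_old)        # step 2.2: X_i = D_i XOR D'_i
        p = xor(p, x)                # step 2.3: P = X_1 XOR ... XOR X_q XOR P'
    return p                         # step 2.4: becomes the new P of the stripe

# Example with q = 2 sub-blocks of 4 bytes each.
p_new = new_parity_p([b'\x01\x02\x03\x04', b'\x10\x20\x30\x40'],
                     [b'\x00\x00\x00\x00', b'\x10\x20\x30\x40'],
                     b'\xff\xff\xff\xff')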
The following operation steps are carried out to generate the new check data Q:
Step 3.1: q data sub-blocks are taken from the new-data buffer and split into the data sub-blocks D_1, D_2, ..., D_q belonging to the q disks of group r; each is then multiplied, by Galois-field multiplication, by the corresponding coefficient A_(r×q-q+i), and the products A_(r×q-q+i) × D_i are sent to the XOR unit, 1 ≤ i ≤ q;
Step 3.2: the corresponding old data sub-blocks D'_1, D'_2, ..., D'_q in the q old-data pre-read zones are each multiplied, by Galois-field multiplication, by the corresponding coefficient A_(r×q-q+i); the products A_(r×q-q+i) × D'_i are sent to the XOR unit and XORed with the products A_(r×q-q+i) × D_i sent in step 3.1, generating the difference data sub-blocks Y_1, Y_2, ..., Y_q, where Y_i = A_(r×q-q+i) × D_i ⊕ A_(r×q-q+i) × D'_i, 1 ≤ i ≤ q;
Step 3.3: the difference data sub-blocks are sent to the XOR unit, the corresponding old check sub-block Q' in old-check-data pre-read zone 2 is also sent to the XOR unit, and XOR generates the new check sub-block Q, Q = Y_1 ⊕ Y_2 ⊕ ... ⊕ Y_q ⊕ Q', which is temporarily stored in new-check-data buffer 2;
Step 3.4: the old check sub-block Q' in the corresponding check stripe is updated to the new check sub-block Q;
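Steps 3.1 to 3.4 perform the same read-modify-write update for check data Q, with every difference sub-block weighted by its Galois-field coefficient; the sketch below assumes GF(2^8) with the reducing polynomial 0x11d and illustrative coefficients, since the claim specifies only that Galois-field multiplication is used, and the names new_parity_q, gf_mul and coeffs are hypothetical.

# Hedged sketch of steps 3.1-3.4: new Q from the Galois-field-weighted
# differences of new and old data sub-blocks and the old check sub-block Q'.
# GF(2^8) with polynomial 0x11d is an assumption; the patent does not fix
# the field or the coefficients A_(r*q-q+i).

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
    return r

def gf_mul_block(coeff: int, block: bytes) -> bytes:
    return bytes(gf_mul(coeff, x) for x in block)

def new_parity_q(new_subs, old_subs, old_q, coeffs):
    """new_subs: D_1..D_q; old_subs: old sub-blocks D'_1..D'_q;
    old_q: old check sub-block Q'; coeffs: the q Galois-field coefficients."""
    qv = old_q
    for a, d_new, d_old in zip(coeffs, new_subs, old_subs):
        diff = xor(gf_mul_block(a, d_new), gf_mul_block(a, d_old))  # steps 3.1-3.2
        qv = xor(qv, diff)                                          # step 3.3
    return qv                                                       # step 3.4

# Example with q = 2, illustrative coefficients 1 and 2.
q_new = new_parity_q([b'\x01\x02\x03\x04', b'\x10\x20\x30\x40'],
                     [b'\x00\x00\x00\x00', b'\x10\x20\x30\x40'],
                     b'\xff\xff\xff\xff', [1, 2])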
If the starting logical address LBA of the data to be written is not exactly divisible by the size of q data sub-blocks and k data sub-blocks remain, where k is a positive integer, or if only k data sub-blocks, fewer than q, are left in the new-data buffer, then only these k data sub-blocks are taken and written to the corresponding disks, the new check data being generated by the above method using the corresponding old data and old check data of the same check stripe; the remaining q-k disks of the same group have no write task;
Because the old data sub-blocks and old check sub-blocks can be read directly from the pre-read zones, the disks can carry out write operations continuously; only when the data sub-blocks in a pre-read zone are used up is a single pre-read issued to refill the pre-read zone; through pre-reading, multiple scattered read operations are converted into one continuous read operation, reducing the number of head seeks.
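The effect described here, many scattered reads of old data collapsed into one long sequential read, can be pictured with a toy pre-read zone; the class PreReadZone and the read_sequential primitive below are purely illustrative assumptions, not structures defined by the patent.

# Toy illustration of an old-data pre-read zone: old sub-blocks are served
# from memory and the disk is touched only when the zone runs empty, turning
# many scattered reads into one continuous read.  All names are assumptions.

class PreReadZone:
    def __init__(self, disk, zone_size, sub_size, read_sequential):
        self.disk, self.zone_size, self.sub_size = disk, zone_size, sub_size
        self.read_sequential = read_sequential   # one long sequential disk read
        self.buf, self.next_offset = [], 0

    def old_sub_block(self):
        if not self.buf:                         # refill only when used up
            data = self.read_sequential(self.disk, self.next_offset, self.zone_size)
            self.next_offset += self.zone_size
            self.buf = [data[i:i + self.sub_size]
                        for i in range(0, len(data), self.sub_size)]
        return self.buf.pop(0)

# Tiny demo with a fake disk that returns zero-filled data.
fake_read = lambda disk, off, length: bytes(length)
zone = PreReadZone(disk=3, zone_size=64, sub_size=16, read_sequential=fake_read)
first_old_sub = zone.old_sub_block()             # triggers the single refill read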
4. The method for constructing a double-disk fault-tolerant vertical grouping parallel access disk array as claimed in claim 3, characterized in that:
the old-data pre-read zones and the old-check-data pre-read zones are equal in size, and are much larger than the new-data buffer.
5. The method for constructing a double-disk fault-tolerant vertical grouping parallel access disk array as claimed in claim 3, characterized in that:
the new-check-data buffers are equal in size to the new-data buffer.
6. The method for constructing a double-disk fault-tolerant vertical grouping parallel access disk array as claimed in claim 1, characterized in that:
the method further comprises a processing method for reading and writing data when a disk has failed, which is specifically as follows:
When a disk fails and normal data reading and writing cannot be carried out, all disks of the double-disk fault-tolerant vertical grouping parallel access disk array SVE-RAID 6 are activated into the ready state, and use of the above disk working-state control method is suspended;
Taking the stripe as the unit, the processing method for a storage block failure within a stripe is described; because a disk failure manifests itself in the disk array as failures of all the storage blocks distributed on that failed disk, the handling of a disk failure can be realized by handling the storage block failure on each stripe of the array separately;
When a storage block in a stripe fails, the handling is divided into the following 3 cases:
Case 1: if only check blocks fail, reading and writing of the data blocks continues, but the array is then in the single-redundancy check state, i.e. 1 check block has failed, or in the no-redundancy check state, i.e. 2 check blocks have failed; after the disk holding the failed check block is replaced, the check block is regenerated;
Case 2: if only 1 or 2 data blocks fail, read operations and write operations are handled separately:
Read operation: the data being read from a failed data block is regenerated by computing, when 1 data block has failed, with the other data blocks of the same stripe and check block P or Q of that stripe, or, when 2 data blocks have failed, with both P and Q; after the disk holding a failed data block is replaced, the same method is used to recover the failed data block;
Write operation: there are 2 processing modes:
Processing mode 1: when 1 data block has failed, the write data is written directly onto P or Q of that stripe, or, when 2 data blocks have failed, onto P and Q, and the check blocks that have been written over are recorded; after the disk holding the failed data block is replaced, data restoration and check-data regeneration are carried out; data restoration means moving the data blocks temporarily stored in the check blocks back to their original locations, and check-data regeneration means regenerating the recorded check blocks that were written over;
Processing mode 2: a spare disk is used to temporarily hold the data to be written to the failed disk; after the failed disk is replaced, the data blocks temporarily stored on the spare disk are moved back to their original locations;
Case 3: if 1 data block and 1 check block fail, the handling of the failed data block is the same as when only 1 data block fails, except that the array is then in the single-redundancy data protection state; after the failed data block has been recovered, the handling of the failed check block is the same as when only 1 check block fails;
After the array returns to the normal state, the above disk working-state control method is enabled again to control the working states of the double-disk fault-tolerant vertical grouping parallel access disk array SVE-RAID 6.
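For case 2, the read path under a single data-block failure is plain XOR reconstruction from check block P; the sketch below covers only that single-failure case, recovering a lost sub-block from the surviving data sub-blocks of the same check stripe and its P check sub-block, and deliberately omits the double-failure solve that also needs Q and the Galois-field coefficients of claim 1; the function name recover_from_p is hypothetical.

# Hedged sketch of the single-failure read path of case 2: a lost data
# sub-block is regenerated by XOR-ing the surviving data sub-blocks of the
# same check stripe with its P check sub-block.  Names are illustrative.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def recover_from_p(surviving_subs, p_sub):
    """surviving_subs: the readable data sub-blocks of the check stripe;
    p_sub: its P check sub-block.  Returns the missing data sub-block."""
    missing = p_sub
    for s in surviving_subs:
        missing = xor(missing, s)
    return missing

# Example: the lost sub-block equals the XOR of the survivors and P.
lost = recover_from_p([b'\x01\x02', b'\x10\x20'], b'\xff\xff')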
CN2010105756252A 2010-12-01 2010-12-01 Method for constructing double-disk fault-tolerant vertical packet parallel access disk array Active CN102023820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105756252A CN102023820B (en) 2010-12-01 2010-12-01 Method for constructing double-disk fault-tolerant vertical packet parallel access disk array


Publications (2)

Publication Number Publication Date
CN102023820A CN102023820A (en) 2011-04-20
CN102023820B true CN102023820B (en) 2012-08-22

Family

ID=43865153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105756252A Active CN102023820B (en) 2010-12-01 2010-12-01 Method for constructing double-disk fault-tolerant vertical packet parallel access disk array

Country Status (1)

Country Link
CN (1) CN102023820B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034559B (en) * 2012-12-18 2016-06-08 无锡众志和达数据计算股份有限公司 PQ inspection module and the method for inspection based on RDMA architecture design
US9235471B2 (en) * 2013-04-29 2016-01-12 Netapp, Inc. Background initialization for protection information enabled storage volumes
CN105260261B (en) * 2015-11-19 2018-06-15 四川神琥科技有限公司 A kind of mail restoration methods
KR102572357B1 (en) * 2016-02-03 2023-08-29 삼성전자주식회사 Raid-6 data storage device and data processing system having the same
CN110659152B (en) * 2018-06-28 2021-04-09 华为技术有限公司 Data processing method and equipment
CN109189331B (en) * 2018-08-09 2021-08-17 河北工业大学 Dynamic data layout method
CN109491613B (en) * 2018-11-13 2021-11-02 深圳龙岗智能视听研究院 Continuous data protection storage system and storage method using same
CN111857552A (en) * 2019-04-30 2020-10-30 伊姆西Ip控股有限责任公司 Storage management method, electronic device and computer program product
CN111258807B (en) * 2020-01-16 2023-05-09 四川效率源科技有限责任公司 Data recovery method for raid6 missing disk in logical volume management
CN113297000B (en) * 2021-05-20 2022-12-23 山东云海国创云计算装备产业创新中心有限公司 RAID (redundant array of independent disks) coding circuit and coding method
CN116909488B (en) * 2023-09-11 2024-01-26 苏州浪潮智能科技有限公司 Method, system, medium, equipment and disk array card for accessing disk array
CN117149098B (en) * 2023-10-31 2024-02-06 苏州元脑智能科技有限公司 Stripe unit distribution method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1692335A (en) * 2002-11-01 2005-11-02 国际商业机器公司 Method and means for tolerating multiple dependent or arbitrary double disk failures in a disk array
CN101055511A (en) * 2007-05-16 2007-10-17 华为技术有限公司 Memory array system and its data operation method
EP1936486A2 (en) * 2006-11-28 2008-06-25 Hitachi, Ltd. Storage system comprising power saving function
CN101349979A (en) * 2008-09-05 2009-01-21 清华大学 Method for updating double-magnetic head user data of large scale fault-tolerant magnetic disk array storage system


Also Published As

Publication number Publication date
CN102023820A (en) 2011-04-20

Similar Documents

Publication Publication Date Title
CN102023820B (en) Method for constructing double-disk fault-tolerant vertical packet parallel access disk array
CN102023819B (en) Method for constructing double-disk fault tolerance horizontal grouping and parallel access disk array
CN102033716B (en) Method for constructing energy-saving type disc array with double discs for fault tolerance
CN101976175B (en) Method for constructing disk array by horizontal grouping parallel concentrated verification
CN101976174B (en) Method for constructing energy-saving disk array of vertical configuration distribution check
CN101976177B (en) Method for constructing vertical grouped disk array capable of being subject to parallel centralized check
CN101976178B (en) Method for constructing vertically-arranged and centrally-inspected energy-saving disk arrays
CN101976176B (en) Method for constructing disk array of horizontal grouping parallel distribution check
CN103049222B (en) A kind of RAID5 writes IO optimized treatment method
US7904647B2 (en) System for optimizing the performance and reliability of a storage controller cache offload circuit
CN104035830A (en) Method and device for recovering data
CN101622606A (en) Be used for device, system and method as the solid-state memory of the high-speed cache of high capacity, nonvolatile memory
CN102147713B (en) Method and device for managing network storage system
CN101715575A (en) Adopt device, the system and method for data pipe management data
CN101567211A (en) Method for improving usability of disk and disk array controller
EP3262500A1 (en) Data stripping, allocation and reconstruction
CN101976179B (en) Construction method of vertical-type grouping parallel-distributed and checked disk array
CN101470640B (en) Method and system for providing deferred maintenance on storage subsystems
CN102184079B (en) Method for optimizing writing performance of redundant array of independent disk (RAID5)-grade disk array
CN102177496A (en) System and method for transferring data between different RAID data storage types for current data and replay data
CN101625586A (en) Method, equipment and computer for managing energy conservation of storage device
CN105930097A (en) Distributed verification redundant array of independent disks capable of eliminating local parallel read-modify-write operation
CN102164165A (en) Management method and device for network storage system
CN101414244A (en) A kind of methods, devices and systems of processing data under network environment
CN105930099B (en) The fault-tolerant disk array of double plate of small write operation in a kind of elimination local parallel

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant