CN101976178A - Method for constructing vertically-arranged and centrally-inspected energy-saving disk arrays - Google Patents


Info

Publication number
CN101976178A
Authority
CN
China
Prior art keywords
data
disk
block
disks
check
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010256899
Other languages
Chinese (zh)
Other versions
CN101976178B (en)
Inventor
谭毓安
孙志卓
周泽湘
李虓
王成武
郭毅
王道邦
李艳国
章珉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Original Assignee
BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Application filed by BEIJING TOYOU FEIJI ELECTRONICS Co Ltd filed Critical BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Priority to CN2010102568995A priority Critical patent/CN101976178B/en
Publication of CN101976178A publication Critical patent/CN101976178A/en
Application granted granted Critical
Publication of CN101976178B publication Critical patent/CN101976178B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The invention relates to a method for constructing an energy-saving disk array with vertical arrangement and centralized check data, which is suitable for continuous data storage and belongs to the technical field of redundant arrays of independent disks (RAID). Based on the characteristics of continuous data storage, a new array, S-RAID 4, is designed and implemented. The method mainly covers the layout of stored data on S-RAID 4, the reading and writing of stored data, the generation of check data, data buffering and pre-read optimization during writes, and the switching and scheduling of disk working states. The disk array S-RAID 4 provides a sound redundant data protection mechanism while markedly lowering power consumption and prolonging service life.

Description

Method for constructing an energy-saving disk array with vertical arrangement and centralized check data
Technical field
The present invention relates to a method for constructing a redundant array of independent disks (RAID), and in particular to a method for constructing an energy-saving disk array with vertical arrangement and centralized check data. It is suitable for continuous data storage and belongs to the technical field of redundant arrays of independent disks.
Background art
In modern storage, many data storage schemes have been designed to improve the reliability of stored data and the input/output performance of storage systems. These schemes are usually various types of redundant arrays of independent disks (RAID). Using dedicated hardware or software, RAID combines multiple physical storage devices, such as disks, into one unified logical storage device.
Technical terms commonly used in RAID are explained below:
Stripe (also called a band): the set of position-related blocks on the different disks of a disk array; it is the unit in which the blocks on different disks are organized.
Striping: the method of dividing a continuous segment of data into blocks of equal size and writing each block to a different disk of the disk array.
Disk mirroring: copying source data to one or more disks.
Error correction: using some operation, such as XOR, to generate and store redundant data, so that data lost or corrupted on a disk can be regenerated from the redundant data.
XOR operation: exclusive OR.
Commonly used RAID levels include RAID 0, RAID 1, RAID 5, RAID 6 and RAID 10. RAID 0 has no redundancy, and RAID 1 simply mirrors disks. The other three levels are each built from multiple disks and write data to the disks of the array in stripes; in RAID 5 and RAID 6, parity data is distributed across the disks of the array. Each stripe of RAID 5 contains 1 check block, so any one disk may fail and its data can be reconstructed from the blocks on the other disks. Each stripe of RAID 6 contains 2 check blocks, so any two disks may fail and their data can be restored from the blocks on the other disks. RAID 10 first mirrors disk groups and then stripes across them, so it contains no parity blocks; after a disk fails, data is restored from its mirror disk. In the ideal case it tolerates the failure of up to 50% of the disks, provided they are in different mirror pairs; the worst case is that both disks of one mirror pair fail at the same time.
In RAID 5, a stripe contains a group of logically adjacent data blocks stored on different storage devices of the array. RAID 5 can correct or regenerate the data stored on a disk, and because several disks share redundant data amounting to the capacity of one disk, the amount of redundant data per unit of valid data is small. This advantage grows with the number of disks in the RAID 5 array. For example, in a RAID 5 built from 8 disks the redundant data occupies the capacity of 1 disk, that is, 1/8 of the total storage. In addition, because the data block that each RAID 5 stripe places on a disk is fairly large, most I/O operations fall within the data block of a single disk, and the check data is also distributed, which greatly increases the likelihood that multiple I/O operations run concurrently. RAID 5 offers strong overall performance and a good cost-performance ratio for most commercial servers, so it is widely used.
However, in continuous data storage, in applications such as video surveillance, virtual tape libraries (VTL) and continuous data protection, a continuous data stream is written to disk sequentially. Storage is sequential, that is, the addresses of the stored data are contiguous and satisfy the principle of locality of storage access, so there is no I/O concurrency. Commonly used disk arrays spread continuous data over different disks in order to raise data access speed and ensure I/O concurrency. Although each access touches only one or a few logically adjacent disks, the many disks in the array that have no task must still idle. This wastes a large amount of energy, greatly increases the time disks spend online to no purpose, and seriously shortens disk service life.
With the development of disk technology, the sequential read/write bandwidth of a single disk has improved greatly. If, while retaining a redundant data protection mechanism, the transfer bandwidth of a single disk can be fully used so that a single disk in the array performs the storage task while the other disks are stopped, that is, their spindle and head motors stop working, a large amount of energy can be saved and the service life of the array can be greatly prolonged. The power consumption of a disk in different working states is shown in Table 1.
Table 1. Power consumption of a disk in different working states
Disk working state       Read/write    Idle     Stopped
Average power            11.5 W        8.5 W    about 0 W
For continuous data storage, if one or a few disks in the array can be kept working according to actual task demand to perform the storage task, while the other disks that temporarily have no task are scheduled into the stopped state, that is, their spindle and head motors stop working, a large amount of energy can be saved and the service life of the disks can be used effectively.
Summary of the invention
The purpose of the invention is to overcome the deficiencies of the prior art and, on the premise of meeting the required storage performance, to propose a method for constructing an energy-saving disk array with vertical arrangement and centralized check data. This array is referred to below as disk array S-RAID 4.
The purpose of the invention is achieved through the following technical solution.
A method for constructing the disk array S-RAID 4 applies where the disk array comprises N disks, where N ≥ 3 and N is a positive integer. Of the N disks, 1 disk is the check disk and the remaining N-1 disks are data disks. The N disks are arranged as a disk matrix of 1 row × N columns. All disks are divided into m stripes (bands), and each stripe is numbered, where m ≥ 1 and m is a positive integer. The portion of each stripe on a data disk is called a data block, and the portion of each stripe on the check disk is called a check block. Each check block is obtained by XOR of the N-1 data blocks in the same stripe.
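The relation between data blocks and the check block of one stripe can be illustrated with a minimal sketch. The helper name `xor_blocks` and the 4-byte block contents are hypothetical and chosen only for illustration; the text specifies only that the check block is the XOR of the N-1 data blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Illustrative stripe for N = 4 disks: 3 data blocks plus 1 check block.
data_blocks = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
check_block = xor_blocks(data_blocks)

# If one data block is lost, XOR of the surviving blocks and the check
# block yields the missing block again.
recovered = xor_blocks([data_blocks[0], data_blocks[2], check_block])
assert recovered == data_blocks[1]
```

Because XOR is its own inverse, the same helper serves both to generate the check block and to rebuild any single missing block of the stripe.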
Data blocks that are physically adjacent on the same data disk have adjacent logical addresses, and adjacent data disks also have adjacent logical addresses: the logical address of the last data block of one data disk is adjacent to the logical address of the first data block of the next data disk.
Each data block is divided into K data sub-blocks, where K ≥ 2 and K is a positive integer, and data sub-blocks that are physically adjacent on the same disk have adjacent logical addresses. Each check block is likewise divided into K check sub-blocks, and physically adjacent check sub-blocks have adjacent logical addresses. The data sub-blocks and the check sub-blocks are then numbered separately in logical address order. Each check sub-block is obtained by XOR of the N-1 data sub-blocks that have the same offset within the data blocks of its stripe; these N-1 data sub-blocks together with the check sub-block generated from them form a check strip (verification bar).
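The vertical layout described above implies a simple mapping from a logical sub-block index to a data disk, a stripe and an offset. The following is a sketch under stated assumptions: the logical address is taken to be a sub-block index counted from 0, and the names `Location` and `locate` are hypothetical, not taken from the patent.

```python
from typing import NamedTuple

class Location(NamedTuple):
    disk: int     # data disk number, counted from 1
    stripe: int   # stripe number, counted from 1
    offset: int   # sub-block offset inside the data block, counted from 0

def locate(sub_block_index: int, m: int, k: int, n: int) -> Location:
    """Map a logical sub-block index to its position in the array.

    Assumes logical addresses fill data disk 1 completely (m stripes of
    k sub-blocks each), then data disk 2, and so on up to disk n-1.
    """
    per_disk = m * k
    if not 0 <= sub_block_index < per_disk * (n - 1):
        raise ValueError("index outside the array")
    disk, within_disk = divmod(sub_block_index, per_disk)
    stripe, offset = divmod(within_disk, k)
    return Location(disk + 1, stripe + 1, offset)
```

For example, with m = 4, k = 2 and n = 4, `locate(8, 4, 2, 4)` falls on the first sub-block of data disk 2. All sub-blocks sharing the same `stripe` and `offset` across the data disks belong to one check strip.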
Preferably, if the service life of a data disk is h1 and the service life of the check disk is h2, then h1 and h2 satisfy the relation h2 = h1 × (N-1).
The method further comprises a method for controlling disk working states.
A disk has three working states: stopped, running and ready.
In the stopped state, the disk spindle stops rotating and the head stops seeking;
In the running state, the disk is performing read/write operations, with the spindle rotating and the head seeking;
In the ready state, the disk is about to be read or written; the spindle is rotating but the head is not seeking.
During continuous writes, only 1 data disk and 1 check disk are in the running state for the vast majority of the time, so data disks that temporarily have no work can be scheduled into the stopped state to save energy.
During continuous writes the check disk is in the running state, and the scheduling policy for data disk state transitions is as follows:
Step 1: From the logical address of the accessed data (denoted LBA), calculate the data disk on which the data resides (its sequence number is denoted r). If data disk r is in the stopped state, schedule it from the stopped state to the ready state. The data disk r holding the accessed data is obtained from formula 1:
[Formula 1: equation image not reproduced in the source text]
In formula 1, the quotient is rounded down and then 1 is added, which means that the sequence number r of the data disk is counted from 1.
Step 2: When data disk r is in the ready state, begin accessing it and move it from the ready state to the running state.
Step 3: If this access ends within data disk r, then after the access finishes move data disk r from the running state to the ready state and exit this scheduling round. Otherwise, after the access to data disk r finishes, move it from the running state to the stopped state. At time t before the access to data disk r finishes, check whether the adjacent data disk (its sequence number is denoted r′) is in the stopped state; if so, move data disk r′ from the stopped state to the ready state, where t is the time a disk needs to go from the stopped state to the ready state.
Step 4: Rename data disk r′ as data disk r, then repeat steps 2 to 4.
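Steps 1 to 4 can be sketched as a small state machine. This is a minimal illustration, not the patent's implementation: the names `State` and `schedule` are hypothetical, the first and last data disks of the access are passed in directly rather than derived from formula 1, and the "time t before the access finishes" is modeled only as an ordering of events (the next disk is made ready before the current one is stopped), not as real timing.

```python
from enum import Enum

class State(Enum):
    STOPPED = "stopped"
    READY = "ready"
    RUNNING = "running"

def schedule(first_disk: int, last_disk: int, states: dict) -> list:
    """Return the (disk, new_state) transitions for one continuous access
    that starts on first_disk and ends on last_disk (counted from 1)."""
    if last_disk < first_disk:
        raise ValueError("access must run towards higher disk numbers")
    log = []

    def move(disk, state):
        states[disk] = state
        log.append((disk, state))

    r = first_disk
    if states[r] is State.STOPPED:         # step 1: wake the target disk
        move(r, State.READY)
    while True:
        move(r, State.RUNNING)             # step 2: start accessing disk r
        if r == last_disk:                 # step 3: the access ends on this disk
            move(r, State.READY)
            return log
        nxt = r + 1
        if states[nxt] is State.STOPPED:   # pre-start r' before disk r finishes
            move(nxt, State.READY)
        move(r, State.STOPPED)             # disk r has no more work
        r = nxt                            # step 4: r' becomes r
```

With three stopped data disks, `schedule(1, 2, states)` wakes disk 1, runs it, readies disk 2 ahead of time, stops disk 1, runs disk 2 and finally leaves disk 2 ready for the next write.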
During continuous reads, the disk state transition scheduling policy is the same as for continuous writes.
Preferably, when only continuous reads are performed, the check disk can stop working, so that essentially only 1 data disk is working.
Each write operation updates data on only one data disk and the check disk, which is equivalent to a read-modify-write in RAID 5. That is, when new data D is written, the new check data P on the same check strip must be generated with formula 2:
P = D XOR D′ XOR P′ (2)
where D′ is the old data at the position on disk where the new data D is written, P′ is the old check data on the check strip of the old data D′, which is also the old check data at the position where the new check data P is written, and XOR denotes the exclusive-OR operation.
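Formula 2 can be checked against a full recomputation of the check data. The sketch below uses hypothetical helper names (`xor_bytes`, `updated_check`) and arbitrary 2-byte sub-blocks chosen only for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized sub-blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def updated_check(new_data: bytes, old_data: bytes, old_check: bytes) -> bytes:
    """Formula 2: P = D XOR D' XOR P'."""
    return xor_bytes(xor_bytes(new_data, old_data), old_check)

# Illustrative check strip with 3 data sub-blocks (N = 4).
d1, d2, d3 = b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"
old_check = xor_bytes(xor_bytes(d1, d2), d3)

# Overwrite d2 and update the check data without touching d1 or d3.
new_d2 = b"\x00\xff"
new_check = updated_check(new_d2, d2, old_check)

# The result matches the XOR of all current data sub-blocks.
assert new_check == xor_bytes(xor_bytes(d1, new_d2), d3)
```

The update reads only the overwritten data sub-block and the old check sub-block, which is why the other data disks can stay stopped during a write.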
To generate the new check data P, besides the XOR operation (which takes very little time in a hardware implementation and can be ignored), the old data D′ and the old check data P′ must first be read. For a disk, each write therefore consists of two operations: reading the old data and then writing the new data. To reduce the number of times the head switches between reading and writing and to make full use of disk bandwidth, the method further comprises an optimized write procedure, as follows:
Set up one new data buffer, which caches the new data that the application writes to the array, together with one old data pre-read zone and one old check data pre-read zone, which hold the old data and old check data, corresponding to the new data to be written, that are read from the array in advance. The old data pre-read zone and the old check data pre-read zone are equal in size and much larger than the new data buffer. Before the write operation, the old data sub-blocks at the positions where the new data sub-blocks will be written, and the old check sub-blocks on the same check strips, are read into the pre-read zones.
The following steps are then performed:
Step 2.1: When the new data buffer is not empty, take a data sub-block D from the new data buffer, write it to the corresponding data disk, and at the same time send it to the XOR unit.
Step 2.2: Send the corresponding old data sub-block D′ from the old data pre-read zone to the XOR unit and XOR it with the data sub-block D sent in step 2.1 to generate the difference data sub-block X.
Step 2.3: Send the difference data sub-block X to the XOR unit, send the corresponding old check sub-block P′ from the old check data pre-read zone to the XOR unit, and XOR them to generate the new check sub-block P.
Step 2.4: Replace the old check sub-block in the corresponding check strip with the new check sub-block P.
Because the old data sub-blocks and old check sub-blocks can be read directly from the pre-read zones, the disk can carry out writes continuously. Only when the old data sub-blocks in a pre-read zone are used up is one pre-read performed to refill the zone. Pre-reading turns many scattered read operations into one continuous read operation, which effectively reduces the number of head switches. The pre-read zones can be made sufficiently large according to actual conditions.
From the above analysis, when the pre-read zones are large enough, the continuous write bandwidth of disk array S-RAID 4 can be estimated at about half the continuous write bandwidth of a single disk.
The method further comprises a procedure for reading and writing data when a disk has failed, as follows:
When a disk fails and normal data reads and writes cannot be carried out, all disks in disk array S-RAID 4 are activated into the ready state, and control of the array's working state by the disk state control method is suspended. Handling then falls into the following 2 cases:
Case 1: If the check disk fails, reading and writing of data on the data disks continues, but disk array S-RAID 4 is then without redundant check data. After the check disk is replaced, each check block on it is recovered by XOR of all data blocks in the stripe of that check block, which regenerates the check disk data.
Case 2: If a data disk fails, reads and writes are handled separately:
Read operation: each data block on the failed data disk is computed by XOR of the other data blocks and the check block in its stripe. After the data disk is replaced, each data block on it is regenerated by XOR of the other data blocks and the check block in its stripe, which restores the data of that disk.
Write operation: there are 2 ways of handling it:
Option 1: Write the data directly to the check disk and record which check blocks were written. After the failed data disk is replaced, perform data restoration, data regeneration and check data regeneration. Data restoration moves the data blocks held temporarily on the check disk back to their original positions. Data regeneration recovers, on the new data disk, the data blocks that were not restored, by XOR of the check block and the other data blocks of the stripe. Check data regeneration regenerates the recorded check blocks that were overwritten, by XOR of all data blocks of the stripe.
Option 2: Use a spare disk to hold temporarily the data written to the failed disk. After the failed data disk is replaced, perform data restoration, data regeneration and check data regeneration.
After the array returns to the normal state, the disk state control method is enabled again to control the working state of disk array S-RAID 4.
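The two XOR-based recovery paths described above (regenerating the check disk in case 1, and reading a block of a failed data disk in case 2) can be sketched as follows. The function names and the list-of-lists disk model are hypothetical; the write handling options are not modeled.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild_check_disk(data_disks):
    """Case 1: regenerate every check block from all data blocks of its stripe.
    data_disks[i][s] is the data block of data disk i in stripe s."""
    return [xor_blocks(stripe) for stripe in zip(*data_disks)]

def read_from_failed_disk(data_disks, check_disk, failed, stripe):
    """Case 2, read: XOR the surviving data blocks of the stripe with its
    check block to obtain the block of the failed data disk."""
    survivors = [d[stripe] for i, d in enumerate(data_disks) if i != failed]
    return xor_blocks(survivors + [check_disk[stripe]])

# Illustrative array: 3 data disks, 2 stripes, 1-byte blocks.
disks = [[b"\x01", b"\x02"], [b"\x04", b"\x08"], [b"\x10", b"\x20"]]
check = rebuild_check_disk(disks)
assert read_from_failed_disk(disks, check, failed=1, stripe=1) == b"\x08"
```

Both paths need every surviving disk of the stripe, which is consistent with the text's requirement that all disks be brought to the ready state while a failure is being handled.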
Beneficial effects
Compared with the prior art, the method of the invention for constructing a redundant array of independent disks for continuous data storage has the following advantages:
It provides a sound redundant data protection mechanism while markedly reducing power consumption and substantially prolonging service life.
1. Markedly reduced energy consumption. Its data layout provides redundant data protection while leaving most disks in the array without work, so that they can be stopped by scheduling; the energy saving is very significant.
2. High continuous bandwidth. Pre-reading the old data and old check data used to generate new check data during writes effectively raises the continuous write bandwidth of disk array S-RAID 4. Its continuous write bandwidth is about half that of a single disk, and its continuous read bandwidth equals that of a single disk. Because modern disks have high continuous bandwidth, the array can serve most continuous storage systems that do not have stringent bandwidth requirements.
3. Substantially prolonged service life. Because most disks in the array have no work and are scheduled into the stopped state, high-speed spindle rotation and frequent head movement are reduced, and the interval between disk starts and stops is sufficiently long, so service life can be substantially prolonged.
Brief description of the drawings
Fig. 1 is a schematic diagram of the numbering of disks, stripes, data blocks and check blocks in disk array S-RAID 4 in an embodiment of the invention;
Fig. 2 is a schematic diagram of the numbering of data sub-blocks and check sub-blocks in disk array S-RAID 4 in an embodiment of the invention;
Fig. 3 is a schematic diagram of disk working state transitions in an embodiment of the invention;
Fig. 4 is a schematic diagram of write operation optimization in an embodiment of the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and specific embodiments.
In this embodiment, disk array S-RAID 4 consists of 4 Seagate Cheetah 10K disks, each with a capacity of 300 GB; their parameters are listed in Table 2. One disk is the check disk and 3 are data disks, as shown in Fig. 1. The 4 disks are divided into 4 stripes, numbered 1 to 4. The portion of each stripe on a data disk is called a data block, and the portion on the check disk is called a check block. Data blocks adjacent in position on the same data disk have adjacent logical addresses, and adjacent data disks also have adjacent logical addresses: the logical address of the last data block of one data disk is adjacent to that of the first data block of the next data disk. The data blocks in stripe 1 are numbered 11, 21, 31; those in stripe 2 are numbered 12, 22, 32; those in stripe 3 are numbered 13, 23, 33; and those in stripe 4 are numbered 14, 24, 34. Each check block is obtained by XOR of the 3 data blocks in its stripe: P(11, 21, 31) denotes the XOR of data blocks 11, 21 and 31, and likewise P(12, 22, 32), P(13, 23, 33) and P(14, 24, 34) denote the XOR of the data blocks of stripes 2, 3 and 4 respectively.
Table 2. Parameters of the Seagate Cheetah 10K disk
[Table 2: table image not reproduced in the source text]
The data sub-block size is set to 4 KB, so each data block is divided into K = 19,660,800 data sub-blocks, and each data disk or check disk holds M = 4 × K data sub-blocks or check sub-blocks. Adjacent data sub-blocks have adjacent logical addresses. Each check block is likewise divided into 19,660,800 check sub-blocks, and adjacent check sub-blocks have adjacent logical addresses. The data sub-blocks and check sub-blocks are then numbered separately in logical address order. The check sub-blocks P(1,1), P(1,2), ..., P(1,K), ..., P(4,1), P(4,2), ..., P(4,K) are each obtained by XOR of the 3 data sub-blocks that have the same offset within the data blocks of the stripe containing that check sub-block, as shown in Fig. 2.
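The value of K follows from the figures of this embodiment. The short calculation below assumes that 1 GB is taken as 2^30 bytes and 1 KB as 2^10 bytes, which is the reading that reproduces the stated value; the variable names are illustrative.

```python
GIB = 1024 ** 3

disk_capacity = 300 * GIB      # capacity of one disk (assuming 1 GB = 2**30 bytes)
stripes = 4                    # m, the number of stripes in this embodiment
sub_block_size = 4 * 1024      # 4 KB data sub-block

block_size = disk_capacity // stripes   # one data block is 75 GB
K = block_size // sub_block_size        # sub-blocks per data block
M = stripes * K                         # sub-blocks per disk

assert K == 19_660_800
```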
When continuous data is written to disk array S-RAID 4, check disk 4 is always in the running state; the state transitions of the data disks are shown in Fig. 3.
Write operation embodiment 1: Initially, the disk holding the data to be read or written is calculated from its logical address in disk array S-RAID 4 to be disk 2; if disk 2 is in the stopped state, it moves to the ready state (transition 1 in Fig. 3). The remaining 2 data disks are stopped. When disk 2 is in the ready state and reading or writing of data on disk 2 begins, disk 2 moves from the ready state to the running state (transition 2 in Fig. 3). From the size of the data to be written, this write is found to end on disk 2, so after the access finishes disk 2 moves from the running state to the ready state (transition 3 in Fig. 3) and waits for the next continuous write data to arrive.
Write operation embodiment 2: Initially, the disk holding the data to be read or written is calculated from the logical address of its data block 12 to be disk 1; if disk 1 is in the stopped state, it moves to the ready state (transition 1 in Fig. 3). The remaining 2 data disks are stopped. When disk 1 is in the ready state and reading or writing of data on disk 1 begins, disk 1 moves from the ready state to the running state (transition 2 in Fig. 3). From the size of the data to be written, this write is found to end on disk 2, so at time t before the access to disk 1 finishes, disk 2 is moved from the stopped state to the ready state (transition 1 in Fig. 3), where t is the time a disk needs to go from the stopped state to the ready state. After the access to disk 1 finishes, disk 1 enters the stopped state (transition 4 in Fig. 3).
When data block 23 is being read or written, the disk holding the adjacent data block 24 is calculated to be disk 2 as well. Because disk 2 is already being accessed, it moves to the ready state after completing this task (transition 3 in Fig. 3), which indicates that access to disk 2 continues after the write to data block 23 finishes. If data block 24 is being accessed, that is, disk 2 is being accessed, and the data block 31 to be accessed next is on disk 3, then disk 3 is moved to the ready state (transition 1 in Fig. 3), and disk 2 enters the stopped state after its access finishes (transition 4 in Fig. 3).
When only read operations are performed on disk array S-RAID 4, the check disk enters the stopped state, and the other steps are the same as for write operations on disk array S-RAID 4.
For normal continuous data storage, disk array S-RAID 4 has only two disks working for the vast majority of the time, and only for a small fraction of the time are 3 disks reading, writing or idling (including the 1 disk started in advance). Its power consumption is therefore about 2/N of that of a RAID 5 with the same number of disks, where N is the number of disks in the array: about 1/4 with 8 disks and about 1/8 with 16 disks. The more disks the array contains, the more pronounced the energy saving.
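The 2/N estimate can be reproduced with a one-line calculation. The function name `relative_power` is hypothetical, and the estimate ignores the short periods in which a third disk is started in advance.

```python
from fractions import Fraction

def relative_power(n: int) -> Fraction:
    """Approximate S-RAID 4 power as a fraction of an n-disk RAID 5,
    assuming 2 working disks versus n working disks."""
    return Fraction(2, n)

assert relative_power(8) == Fraction(1, 4)
assert relative_power(16) == Fraction(1, 8)
```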
If the disks of this disk array S-RAID 4 are Seagate Cheetah 10K drives, whose parameters are listed in Table 2, each disk has a capacity of 300 GB, so a disk is started and stopped once only after 300 GB × 3 = 900 GB of data has been accessed. The interval between starts and stops is long enough that the effect of starting and stopping on disk life can be ignored. Because the check disk in this array works all the time, it is the lifetime bottleneck, and an enterprise-class disk with a long service life can be chosen for it. Let the service life of the check disk be h2 hours and the service life of an ordinary data disk be h1 hours. In RAID 5 and RAID 4 all disks work simultaneously, so the service life of the array is min(h1, h2) hours. In the disk array S-RAID 4 proposed by the invention, 1 data disk and 1 check disk work most of the time, so its service life is about min(h2, h1 × (N-1)) hours, where h1 × (N-1) is the service life of the group of data disks. When h2 is much larger than h1, the life extension achieved by the proposed disk array S-RAID 4 is very pronounced. The service life h1 of the data disks and the service life h2 of the check disk selected by the invention satisfy the relation h2 = h1 × (N-1).
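The lifetime comparison can be made concrete with a small calculation. The service-life figures below are purely hypothetical and chosen so that h2 = h1 × (N-1) holds for N = 4; they are not taken from the patent or from any disk data sheet.

```python
def raid5_lifetime(h1: float, h2: float) -> float:
    """All disks work simultaneously: min(h1, h2)."""
    return min(h1, h2)

def sraid4_lifetime(h1: float, h2: float, n: int) -> float:
    """One data disk and the check disk work at a time: min(h2, h1 * (n - 1))."""
    return min(h2, h1 * (n - 1))

# Hypothetical values: data disks rated 30,000 h, check disk rated 90,000 h.
h1, h2, n = 30_000, 90_000, 4
assert raid5_lifetime(h1, h2) == 30_000
assert sraid4_lifetime(h1, h2, n) == 90_000
```

Under these assumed figures the array lifetime triples, and choosing h2 equal to h1 × (N-1) means neither the check disk nor the data disk group is the sole bottleneck.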
For each write operation, the new check data P on the same check strip is generated with formula 2:
P = D XOR D′ XOR P′ (2)
where D′ is the old data at the position on disk where the new data D is written, P′ is the old check data on the check strip of the old data D′, which is also the old check data at the position where the new check data P is written, and XOR denotes the exclusive-OR operation.
To generate the new check data P, besides the XOR operation (which takes very little time in a hardware implementation and can be ignored), the old data D′ and the old check data P′ must first be read. For a disk, each write therefore consists of two operations: reading the old data and then writing the new data. To reduce the number of times the head switches between reading and writing and to make full use of disk bandwidth, the method further comprises an optimized write procedure, as follows:
Set up one new data buffer, which caches the new data that the application writes to the array, together with one old data pre-read zone and one old check data pre-read zone, which hold the old data and old check data, corresponding to the new data to be written, that are read from the array in advance. The old data pre-read zone and the old check data pre-read zone are equal in size and much larger than the new data buffer. Before the write operation, the old data sub-blocks at the positions where the new data sub-blocks will be written, and the old check sub-blocks on the same check strips, are read into the pre-read zones.
The following steps are then performed:
Step 2.1: When the new data buffer is not empty, take a data sub-block D from the new data buffer, write it to the corresponding data disk, and at the same time send it to the XOR unit (process 1 in Fig. 4).
Step 2.2: Send the corresponding old data sub-block D′ from the old data pre-read zone to the XOR unit (process 2 in Fig. 4) and XOR it with the data sub-block D sent in step 2.1 to generate the difference data sub-block X.
Step 2.3: Send the difference data sub-block X to the XOR unit (process 3 in Fig. 4), send the corresponding old check sub-block P′ from the old check data pre-read zone to the XOR unit (process 4 in Fig. 4), and XOR them to generate the new check sub-block P.
Step 2.4: Replace the old check sub-block in the corresponding check strip with the new check sub-block P (process 5 in Fig. 4).
Because the old data sub-blocks and old check sub-blocks can be read directly from the pre-read zones, the disk can carry out writes continuously. Only when the old data sub-blocks in a pre-read zone are used up is one pre-read performed to refill the zone. Pre-reading turns many scattered read operations into one continuous read operation, which effectively reduces the number of head switches. The pre-read zones can be made sufficiently large according to actual conditions.
From the above analysis, when the pre-read zones are large enough, the continuous write bandwidth of disk array S-RAID 4 can be estimated at about half the continuous write bandwidth of a single disk.
By data buffering with read in advance to handle, can make that disk array S-RAID's 4 write bandwidth is write bandwidth continuously near the monolithic disk half continuously, continuously the wide continuous tape reading that equals the monolithic disk of tape reading is wide, if the formation disk of this disk array S-RAID 4 is Seagate Cheetah 10K, it reads tape wide continuously is 85MB/s, and writing bandwidth continuously is 84MB/s.The bandwidth of writing continuously that can get disk array S-RAID 4 is about 42MB/s, and the wide 85MB/s of being about that reads tape continuously writes when can satisfy the high definition video steaming of 21 road 2MB/s, reads in the time of 42 road such video flowings.
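The arithmetic behind these figures can be checked with a short Python sketch. It only encodes the estimate stated in the text (array write bandwidth is about half the single-disk write bandwidth, array read bandwidth equals the single-disk read bandwidth); the function names are hypothetical.

```python
def sraid4_bandwidth(disk_read_mb_s: float, disk_write_mb_s: float):
    """Return (array_write, array_read) in MB/s using the estimate in the text."""
    array_write = disk_write_mb_s / 2   # about half the single-disk write bandwidth
    array_read = disk_read_mb_s         # equal to the single-disk read bandwidth
    return array_write, array_read


def max_streams(bandwidth_mb_s: float, stream_mb_s: float) -> int:
    """Number of whole streams of a given rate that fit in the bandwidth."""
    return int(bandwidth_mb_s // stream_mb_s)


# Seagate Cheetah 10K figures quoted in the text: 85MB/s read, 84MB/s write.
write_bw, read_bw = sraid4_bandwidth(85, 84)
write_streams = max_streams(write_bw, 2)   # 2MB/s high-definition streams
read_streams = max_streams(read_bw, 2)
```

With the quoted drive figures this gives 42MB/s write bandwidth, 21 concurrent write streams, and 42 concurrent read streams, matching the numbers in the text.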
When a disk fails and normal data reads and writes cannot be carried out, activate all disks in the disk array S-RAID 4 so that they enter the ready state, and suspend the use of the disk working-state control method to control the working state of the disk array S-RAID 4. Then handle the failure according to the following 2 cases:
Case 1: If the check disk fails, continue to read and write the data on the data disks; during this time the disk array S-RAID 4 has no redundant check data. After the check disk is replaced, for each check block on the check disk, XOR all data blocks of the stripe containing that check block to recover it, thereby regenerating the check disk data.
Case 2: If a data disk fails, handle read and write operations separately:
Read operation: For each data block on the failed data disk, XOR the other data blocks and the check block of the stripe containing that data block to compute it. After the data disk is replaced, for each data block on the new data disk, XOR the other data blocks and the check block of its stripe to regenerate the data on that data disk.
Write operation: there are 2 processing modes:
Mode 1: Write the data directly to the check disk and record which check blocks were overwritten. After the failed data disk is replaced, perform data restoration, data regeneration, and check-data regeneration. Data restoration moves the data blocks temporarily stored on the check disk back to their original positions. Data regeneration recovers the data blocks on the new data disk that were not restored, by XORing the check block and the other data blocks of the stripe. Check-data regeneration rebuilds the recorded overwritten check blocks by XORing all data blocks of the stripe.
Mode 2: Use a spare disk to temporarily hold the data written to the failed disk. After the failed data disk is replaced, perform data restoration, data regeneration, and check-data regeneration.
After the array returns to the normal state, re-enable the disk working-state control method to control the working state of the disk array S-RAID 4.
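Both recovery cases rest on the same XOR property of a stripe. The Python sketch below shows regeneration of a check block (case 1) and of a lost data block (case 2, read operation) on in-memory byte strings. The helper names and the sample stripe are hypothetical, and the bookkeeping for writes during a failure (modes 1 and 2) is not modeled.

```python
from functools import reduce


def xor_all(blocks):
    """XOR any number of equal-length blocks, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_check_block(data_blocks):
    """Case 1: the check block is the XOR of all data blocks in the stripe."""
    return xor_all(data_blocks)


def rebuild_data_block(surviving_data_blocks, check_block):
    """Case 2: a lost data block is the XOR of the other data blocks and the check block."""
    return xor_all(list(surviving_data_blocks) + [check_block])


# Hypothetical stripe with N-1 = 3 data blocks of 2 bytes each.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
check = rebuild_check_block(stripe)

# Simulate losing the second data disk and regenerate its block.
rebuilt = rebuild_data_block([stripe[0], stripe[2]], check)
```

Here `rebuilt` equals the lost block `stripe[1]`, and the same `xor_all` helper serves both the check-disk and the data-disk cases.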
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make improvements, or make equivalent replacements of some of the technical features, without departing from the principle of the invention; such improvements and replacements shall also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A method for constructing a vertically arranged, centralized-check energy-saving disk array, the array being referred to for short as disk array S-RAID 4, characterized in that:
The method applies to a disk array comprising N disks, where N ≥ 3 and N is a positive integer. Of the N disks, 1 disk is the check disk and the remaining N-1 disks are data disks. The N disks are arranged as a disk matrix of 1 row × N columns. All disks are divided into m stripes, and each stripe is numbered, where m ≥ 1 and m is a positive integer. The portion of each stripe on a data disk is called a data block, and the portion of each stripe on the check disk is called a check block. Each check block is obtained by XORing the N-1 data blocks in the same stripe;
Data blocks that are physically adjacent on the same data disk have adjacent logical addresses, and the logical addresses of adjacent data disks are also contiguous; that is, the logical address of the last data block of one data disk is adjacent to the logical address of the first data block of the next data disk;
Each data block is divided into K data sub-blocks, where K ≥ 2 and K is a positive integer, and data sub-blocks that are physically adjacent on the same disk have adjacent logical addresses. Each check block is likewise divided into K check sub-blocks, and check sub-blocks that are physically adjacent also have adjacent logical addresses. The data sub-blocks and the check sub-blocks are then numbered separately in logical-address order. Each check sub-block is obtained by XORing the N-1 data sub-blocks that have the same offset within the data blocks of its stripe; these N-1 data sub-blocks and the check sub-block generated from them constitute a check strip;
The method further comprises a method for controlling the working state of the disks;
A disk has three working states: stopped, running, and ready;
In the stopped state, the disk spindle stops rotating and the head stops seeking;
In the running state, the disk is performing read or write operations, with the spindle rotating and the head seeking;
In the ready state, the disk is about to be read or written; the spindle is rotating but the head is not seeking;
During continuous write operations, for the vast majority of the time only 1 data disk and 1 check disk are in the running state, so data disks that temporarily have no work can be scheduled into the stopped state to save energy;
During continuous write operations, the check disk is in the running state, and the working-state transition scheduling strategy for the data disks is as follows:
Step 1: From the logical address of the accessed data (denoted LBA), compute the data disk that holds the data (the sequence number of this data disk is denoted r). If data disk r is in the stopped state, schedule it to change from the stopped state to the ready state. The data disk r that holds the accessed data is obtained from formula 1:
[Formula 1: image FSA00000234585200021, not reproduced in this text]
In formula 1, the expression [image FSA00000234585200022, not reproduced in this text] is rounded down and then 1 is added, which means that the sequence number r of the data disk counts from 1;
Step 2: When data disk r is in the ready state, begin accessing it and change it from the ready state to the running state;
Step 3: If this access ends within data disk r, then after the access ends, change data disk r from the running state to the ready state and exit this scheduling pass. Otherwise, after the access to data disk r ends, change it from the running state to the stopped state; and at time t before the access to data disk r ends, check whether the adjacent data disk (its sequence number is denoted r') is in the stopped state and, if so, change data disk r' from the stopped state to the ready state, where t is the time a disk needs to change from the stopped state to the ready state;
Step 4: Relabel data disk r' as data disk r, then repeat steps 2 to 4;
During continuous read operations, the disk working-state transition scheduling strategy is the same as for continuous write operations.
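The scheduling steps can be illustrated with a small Python state-machine sketch. Formula 1 is an image that is not reproduced in this text, so the address mapping used here, r = floor(LBA / sub-blocks per data disk) + 1, is an assumption inferred from the sequential layout described in the claim. Treating the adjacent disk r' as disk r + 1, the function names, and the sample parameters are likewise assumptions. Real timing (the lead time t) is not modeled; only the order of state transitions is recorded.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    READY = "ready"
    RUNNING = "running"


def data_disk_of(lba: int, sub_blocks_per_disk: int) -> int:
    """Assumed form of formula 1: round down, then add 1 so that r counts from 1."""
    return lba // sub_blocks_per_disk + 1


def schedule_access(states, first_lba, count, sub_blocks_per_disk):
    """Apply steps 1 to 4 to one sequential access; return the (disk, state) transitions."""
    log = []

    def set_state(disk, state):
        states[disk] = state
        log.append((disk, state))

    r = data_disk_of(first_lba, sub_blocks_per_disk)
    last = data_disk_of(first_lba + count - 1, sub_blocks_per_disk)

    # Step 1: spin up the target disk if it is stopped.
    if states[r] is State.STOPPED:
        set_state(r, State.READY)

    while True:
        # Step 2: the ready disk starts serving the access.
        set_state(r, State.RUNNING)
        if r == last:
            # Step 3, first branch: the access ends within disk r.
            set_state(r, State.READY)
            return log
        # Step 3, second branch: shortly before disk r finishes, wake the
        # next disk (assumed to be r + 1), then stop disk r.
        nxt = r + 1
        if states[nxt] is State.STOPPED:
            set_state(nxt, State.READY)
        set_state(r, State.STOPPED)
        # Step 4: the next disk becomes the current disk.
        r = nxt


# Hypothetical array with 3 data disks of 100 sub-blocks each, all stopped.
states = {1: State.STOPPED, 2: State.STOPPED, 3: State.STOPPED}
log = schedule_access(states, first_lba=90, count=20, sub_blocks_per_disk=100)
```

In this run the access crosses from data disk 1 into data disk 2, so disk 1 ends in the stopped state, disk 2 ends in the ready state, and disk 3 is never spun up.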
2. The method for constructing a vertically arranged, centralized-check energy-saving disk array as claimed in claim 1, characterized in that:
Preferably, if the service life of a data disk is h1 and the service life of the check disk is h2, then h1 and h2 satisfy the following relation: h2 = h1 × (N-1)/p.
3. The method for constructing a vertically arranged, centralized-check energy-saving disk array as claimed in claim 1, characterized in that:
Preferably, when only continuous read operations are performed on the disk array S-RAID 4, the check disk stops working.
4. The method for constructing a vertically arranged, centralized-check energy-saving disk array as claimed in claim 1, characterized in that:
The method further comprises a method for optimizing write operations, specifically:
Set up a new-data buffer to cache the new data that the application writes to the array, together with an old-data prefetch area and an old-check-data prefetch area. The two prefetch areas hold, respectively, the old data and the old check data that correspond to the new data to be written and that are read from the array in advance. The old-data prefetch area and the old-check-data prefetch area are equal in size and much larger than the new-data buffer. Before a write operation, the old data sub-blocks at the positions where the new data sub-blocks will be written, and the old check sub-blocks in the same check strips, are prefetched into the prefetch areas;
Then carry out the following steps:
Step 2.1: When the new-data buffer is not empty, take a data sub-block D from it, write D to the corresponding data disk, and at the same time send D to the XOR unit;
Step 2.2: Send the corresponding old data sub-block D' from the old-data prefetch area to the XOR unit and XOR it with the data sub-block D sent in step 2.1 to generate the difference sub-block X;
Step 2.3: Send the difference sub-block X to the XOR unit, send the corresponding old check sub-block P' from the old-check-data prefetch area to the XOR unit, and XOR the two to generate the new check sub-block P;
Step 2.4: Replace the old check sub-block in the corresponding check strip with the new check sub-block P;
Because the old data sub-blocks and old check sub-blocks can be read directly from the prefetch areas, the disks can write continuously. Only when the old data sub-blocks in the prefetch area are used up is one prefetch performed to refill the prefetch areas. Prefetching converts many scattered read operations into one continuous read operation, which effectively reduces the number of head switches. The prefetch areas can be made sufficiently large according to actual conditions.
5. The method for constructing a vertically arranged, centralized-check energy-saving disk array as claimed in claim 4, characterized in that:
Preferably, the old-data prefetch area and the old-check-data prefetch area are equal in size and much larger than the new-data buffer.
6. The method for constructing a vertically arranged, centralized-check energy-saving disk array as claimed in any one of claims 1 to 3, characterized in that:
The method further comprises a method for handling data reads and writes when a disk has failed, specifically:
When a disk fails and normal data reads and writes cannot be carried out, activate all disks in the disk array S-RAID 4 so that they enter the ready state, and suspend the use of the disk working-state control method to control the working state of the disk array S-RAID 4. Then handle the failure according to the following 2 cases:
Case 1: If the check disk fails, continue to read and write the data on the data disks; during this time the disk array S-RAID 4 has no redundant check data. After the check disk is replaced, for each check block on the check disk, XOR all data blocks of the stripe containing that check block to recover it, thereby regenerating the check disk data;
Case 2: If a data disk fails, handle read and write operations separately:
Read operation: For each data block on the failed data disk, XOR the other data blocks and the check block of the stripe containing that data block to compute it. After the data disk is replaced, for each data block on the new data disk, XOR the other data blocks and the check block of its stripe to regenerate the data on that data disk;
Write operation: there are 2 processing modes:
Mode 1: Write the data directly to the check disk and record which check blocks were overwritten. After the failed data disk is replaced, perform data restoration, data regeneration, and check-data regeneration. Data restoration moves the data blocks temporarily stored on the check disk back to their original positions. Data regeneration recovers the data blocks on the new data disk that were not restored, by XORing the check block and the other data blocks of the stripe. Check-data regeneration rebuilds the recorded overwritten check blocks by XORing all data blocks of the stripe;
Mode 2: Use a spare disk to temporarily hold the data written to the failed disk. After the failed data disk is replaced, perform data restoration, data regeneration, and check-data regeneration;
After the array returns to the normal state, re-enable the disk working-state control method to control the working state of the disk array S-RAID 4.
CN2010102568995A 2010-08-19 2010-08-19 Method for constructing vertically-arranged and centrally-inspected energy-saving disk arrays Active CN101976178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102568995A CN101976178B (en) 2010-08-19 2010-08-19 Method for constructing vertically-arranged and centrally-inspected energy-saving disk arrays

Publications (2)

Publication Number Publication Date
CN101976178A true CN101976178A (en) 2011-02-16
CN101976178B CN101976178B (en) 2012-09-05

Family

ID=43576066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102568995A Active CN101976178B (en) 2010-08-19 2010-08-19 Method for constructing vertically-arranged and centrally-inspected energy-saving disk arrays

Country Status (1)

Country Link
CN (1) CN101976178B (en)

Also Published As

Publication number Publication date
CN101976178B (en) 2012-09-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant