CN106990918A - Method and device for triggering a RAID array rebuild - Google Patents
Method and device for triggering a RAID array rebuild
- Publication number
- CN106990918A (application CN201710125115.7A)
- Authority
- CN
- China
- Prior art keywords
- disk
- response time
- raid array
- average response
- exception
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0632—Configuration or reconfiguration of storage systems by initialisation or re-initialisation of storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0689—Disk arrays, e.g. RAID, JBOD
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present application provides a method and device for triggering a RAID array rebuild. The method is applied to the RAID subsystem of a storage device and includes: issuing received I/O read/write commands to each member disk in the RAID array; computing the average response time of the I/O read/write commands of each member disk in the RAID array within a preset statistical period; searching the non-hot-spare, non-failed member disks of the RAID array for member disks whose average response time reaches an abnormal response time threshold; and, among the member disks found, marking the N member disks with the largest average response times as failed member disks and notifying the RAID array to rebuild, where N is not greater than the number of member disks that the RAID array supports rebuilding simultaneously. With this method, the rebuild of the RAID array to which a member disk belongs can be triggered based on the response time of that member disk's I/O read/write commands.
Description
Technical field
The present application relates to the field of computer communication, and in particular to a method and device for triggering a RAID array rebuild.
Background
A redundant array of independent disks (RAID) is a technology that combines multiple independent disks (physical disks) in different ways into a disk group (logical disk), thereby providing higher storage performance and data reliability than a single disk.
In the field of computer communication, RAID technology is commonly used to provide redundancy protection for the data on disks. When data is written, it is split across multiple member disks according to the RAID algorithm. Depending on the RAID level, the array can tolerate the failure or offlining of one or more disks. When a disk I/O error or an offline disk is detected, a dedicated hot spare disk or a global hot spare disk can be used for the rebuild, restoring the data redundancy of the RAID array.
However, existing methods of triggering a RAID array rebuild consider only disk I/O errors and offline disks. They do not consider the case in which the response time slows down as a disk ages and causes a service interruption. How to trigger a RAID array rebuild when a disk responds slowly has therefore become an urgent problem.
Summary of the invention
In view of this, the present application provides a method and device for triggering a RAID array rebuild, so that the rebuild of the RAID array to which a member disk belongs can be triggered based on the response time of that member disk's I/O read/write commands.
Specifically, the present application is implemented through the following technical solutions:
According to a first aspect of the present application, a method for triggering a RAID array rebuild is provided. The method is applied to the RAID subsystem of a storage device; the storage device is preconfigured with at least one RAID array, and the RAID array includes several member disks. The method includes:
issuing received I/O read/write commands to each member disk in the RAID array;
computing the average response time of each member disk based on the response times of the I/O read/write commands of each member disk in the RAID array within a preset statistical period;
searching the non-hot-spare, non-failed member disks of the RAID array for member disks whose average response time reaches an abnormal response time threshold;
among the member disks found whose average response time reaches the abnormal response time threshold, marking the N member disks with the largest average response times as failed member disks, and notifying the RAID array to rebuild, where N is not greater than the number of member disks that the RAID array supports rebuilding simultaneously.
According to a second aspect of the present application, a device for triggering a RAID array rebuild is provided. The device is applied to the RAID subsystem of a storage device; the storage device is preconfigured with at least one RAID array, and the RAID array includes several member disks. The device includes:
an issuing unit, configured to issue received I/O read/write commands to each member disk in the RAID array;
a statistics unit, configured to compute the average response time of each member disk based on the response times of the I/O read/write commands of each member disk in the RAID array within a preset statistical period;
a searching unit, configured to search the non-hot-spare, non-failed member disks of the RAID array for member disks whose average response time reaches an abnormal response time threshold;
a marking unit, configured to mark, among the member disks found whose average response time reaches the abnormal response time threshold, the N member disks with the largest average response times as failed member disks, and to notify the RAID array to rebuild, where N is not greater than the number of member disks that the RAID array supports rebuilding simultaneously.
The present application proposes a method for triggering a RAID array rebuild. The RAID subsystem can issue received I/O read/write commands to each member disk in the RAID array, and can compute the average response time of each member disk based on the response times of the I/O read/write commands returned by each member disk within a preset statistical period. The RAID subsystem can then search the non-hot-spare, non-failed member disks of the RAID array for member disks whose average response time reaches the abnormal response time threshold, mark the N of those member disks with the largest average response times as failed member disks, and notify the RAID array to which the N member disks belong to rebuild.
Because the RAID subsystem can do this without affecting the data flow of the RAID array, marking the N slowest member disks among those that reach the abnormal response time threshold as failed triggers a rebuild of the RAID array to which they belong. The rebuild of a RAID array is thus triggered based on the response time of the I/O read/write commands of its member disks.
Brief description of the drawings
Fig. 1 is a flow chart of a method for triggering a RAID array rebuild according to an exemplary embodiment of the present application;
Fig. 2 is a hardware structure diagram of the equipment in which a device for triggering a RAID array rebuild is located, according to an exemplary embodiment of the present application;
Fig. 3 is a block diagram of a device for triggering a RAID array rebuild according to an exemplary embodiment of the present application.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the present application, as detailed in the appended claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the present application. The singular forms "a", "said", and "the" used in the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in the present application to describe various pieces of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
A RAID array is a technology that combines multiple independent disks (physical disks) in different ways into a disk group (logical disk), thereby providing higher storage performance and data reliability than a single disk.
In the field of computer communication, RAID technology is commonly used to provide redundancy protection for the data on disks. When data is written, it is split across multiple member disks according to the RAID algorithm. Depending on the RAID level, the array can tolerate the failure or offlining of one or more disks. When a disk I/O error or an offline disk is detected, a dedicated hot spare disk or a global hot spare disk can be used for the rebuild, restoring the data redundancy of the RAID array.
In related methods of triggering a RAID array rebuild, when the RAID subsystem receives an I/O read/write error returned by a member disk and determines that the error cannot be recovered, it can mark the member disk as failed and trigger a rebuild of the RAID array to which the member disk belongs. In addition, when the RAID subsystem receives a notification that a member disk has gone offline, it can also trigger a rebuild of the RAID array to which the offline member disk belongs.
During a rebuild, a hot spare disk can be used to rebuild the failed or offline disk. The RAID subsystem can compute the data of the corresponding stripes on the hot spare disk according to the RAID algorithm, restoring the redundancy of the RAID array to which the failed disk belongs.
Because a disk combines mechanical and electrical components and is affected by factors such as component aging and environment, in practice disk I/O may return no error while the response time becomes slow. When an upper-layer application reads and writes the RAID array to which such a disk belongs, I/O on the slow disk returns later than on the other disks, and the performance of the upper-layer application fluctuates or its I/O times out. Specifically, when a logical unit number (LUN) created on the RAID array is assigned to a front-end application server for continuous reads and writes, the LUN performance may fluctuate greatly, or the I/O may even time out and interrupt service. Yet when developers investigate the fluctuation in LUN performance, they find that the RAID array state is normal, the disk states are normal, and no member disk has returned an I/O read/write error. Further investigation shows that although the member disks of the RAID array have the same interface and the same rotational speed, the response time of the I/O read/write responses returned by a few member disks is significantly longer than that of the other member disks in the array. After a member disk with a long I/O response time is pulled out and replaced with a hot spare disk, the performance of the RAID array and of the LUN returns to normal.
In summary, a long I/O read/write response time on a member disk can seriously affect the performance of the RAID array to which it belongs and of the LUNs created on that array, causing performance fluctuation and, in extreme cases, I/O timeouts that may interrupt service. However, existing methods of triggering a RAID array rebuild consider only disk I/O errors and offline disks. They do not consider the case in which the response time slows down as a disk ages and causes a service interruption.
The present application proposes a method for triggering a RAID array rebuild. The RAID subsystem can issue received I/O read/write commands to each member disk in the RAID array, and can compute the average response time of each member disk based on the response times of the I/O read/write commands returned by each member disk within a preset statistical period. The RAID subsystem can then search the non-hot-spare, non-failed member disks of the RAID array for member disks whose average response time reaches the abnormal response time threshold, mark the N of those member disks with the largest average response times as failed member disks, and notify the RAID array to which the N member disks belong to rebuild.
Because the RAID subsystem can do this without affecting the data flow of the RAID array, marking the N slowest member disks among those that reach the abnormal response time threshold as failed triggers a rebuild of the RAID array to which they belong. The rebuild of a RAID array is thus triggered based on the response time of the I/O read/write commands of its member disks.
Referring to Fig. 1, Fig. 1 is a flow chart of a method for triggering a RAID array rebuild according to an exemplary embodiment of the present application. The method is applied to the RAID subsystem of a storage device, and the storage device further includes a disk subsystem. The storage device also includes several RAID arrays, each of which can include several member disks. The method specifically includes the following steps:
Step 101: Issue the received I/O read/write commands to each member disk in the RAID array;
Step 102: Compute the average response time of each member disk based on the response times of the I/O read/write commands of each member disk in the RAID array within a preset statistical period;
Step 103: Search the non-hot-spare, non-failed member disks of the RAID array for member disks whose average response time reaches the abnormal response time threshold;
Step 104: Among the member disks found whose average response time reaches the abnormal response time threshold, mark the N member disks with the largest average response times as failed member disks, and notify the RAID array to rebuild, where N is not greater than the number of member disks that the RAID array supports rebuilding simultaneously.
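Steps 103 and 104 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `select_failed_disks`, the disk names, and the millisecond figures are all hypothetical, and the averages and threshold are assumed to have been computed already.

```python
from typing import Dict, List, Set


def select_failed_disks(
    avg_ms: Dict[str, float],
    excluded: Set[str],
    threshold_ms: float,
    n: int,
) -> List[str]:
    """Return up to n of the slowest disks whose average reaches the threshold."""
    # Step 103: consider only disks that are neither hot spares nor already
    # failed, and keep those whose average reaches (>=) the threshold.
    candidates = [
        (avg, disk)
        for disk, avg in avg_ms.items()
        if disk not in excluded and avg >= threshold_ms
    ]
    # Step 104: keep the N candidates with the largest average response time.
    candidates.sort(reverse=True)
    return [disk for _, disk in candidates[:n]]


# Hypothetical per-disk averages for one statistical period, in milliseconds.
averages = {"d0": 5.0, "d1": 6.0, "d2": 30.0, "d3": 18.0, "spare": 0.0}
failed = select_failed_disks(averages, excluded={"spare"}, threshold_ms=10.0, n=1)
print(failed)
```

With N set to 1, only the slowest qualifying disk is returned even though two disks reach the threshold; the caller would then mark it as failed and notify the array to rebuild.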
The RAID subsystem manages each RAID array in the storage device. For example, its functions can include splitting the received I/O read/write commands according to the RAID algorithm and issuing the split commands to the disk subsystem. Its functions can also include, when a RAID array is rebuilt, computing the data of the corresponding stripes on the hot spare disk according to the algorithm of that RAID array and restoring the data redundancy of the array. The RAID subsystem also has various other functions, which are not specifically limited here.
The disk subsystem is mainly used to manage all physical disks in the storage device. It acts as a "lower-layer" system that serves "upper-layer" systems such as the RAID subsystem. For example, the disk subsystem can scan the physical disks in the storage device and notify the RAID subsystem of the status information of each physical disk it scans. In practice the disk subsystem also has other functions, which are not specifically limited here.
The RAID array refers to a disk group formed by combining multiple physical disks in the storage device according to a RAID algorithm. A physical disk in the RAID array is also called a member disk. The number of member disks that a RAID array supports rebuilding simultaneously depends on its level and its implementation. For example, traditional RAID5 supports rebuilding one member disk at a time, while some vendors' implementations support rebuilding several at once.
The response time of a member disk is the time from when the RAID subsystem issues an I/O read/write command to the member disk until the member disk returns the response to that command.
The average response time of a member disk is the accumulated response time of the I/O read/write commands that the member disk completed within the preset statistical period, divided by the number of I/O read/write commands it completed in that period.
The abnormal response time threshold is used to judge whether the response time of a member disk is abnormal. When the average response time of a member disk is greater than or equal to the abnormal response time threshold, the member disk is considered abnormal.
When setting the abnormal response time threshold, developers should note that if the threshold is too large, member disks with long I/O response times may not be detected accurately, so the performance of the RAID array and of the LUNs created on it cannot recover well. If the threshold is too small, a large number of member disks may be detected as abnormal, including member disks whose average response time is actually normal, which causes frequent rebuilds of the RAID array and degrades the performance of the storage device. In practice, developers can therefore set the threshold according to actual conditions. For example, the threshold can be set to the smallest average response time among the member disks of the RAID array multiplied by an abnormal response time weight. Developers can also set the value of the threshold directly. These ways of setting the threshold are only examples and are not specifically limiting.
In the embodiments of the present application, the RAID subsystem no longer detects abnormal member disks based only on I/O read/write errors returned by a member disk or on a member disk going offline. It can also use the average response time of each member disk: among the member disks whose average response time reaches the abnormal response time threshold, it marks the N with the largest average response times as failed member disks and triggers a rebuild of the RAID array to which they belong. The rebuild of a RAID array is thus triggered based on the response time of the I/O read/write commands of its member disks.
The method proposed in the present application is described in detail below, taking one RAID array in the storage device as an example.
In implementation, within the preset statistical period, the RAID subsystem can receive I/O read/write commands, split them according to the RAID algorithm of the RAID array, and issue the split commands to each member disk through the disk subsystem.
After a member disk completes the I/O read/write command assigned to it, it can return the response to that command to the RAID subsystem.
After the RAID subsystem receives the response returned by the member disk, it can calculate the response time of this I/O read/write command and separately accumulate the response time and the number of I/O read/write commands completed by the member disk.
Using this approach, the RAID subsystem can record, for each member disk of the RAID array, the number of completed I/O read/write commands and their accumulated response time.
It should be noted that the response time of a member disk's I/O read/write commands can be measured by setting a response time timer for each I/O read/write command, or by other methods commonly used in the art. The way the response time is measured is not specifically limited here.
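As one way to picture the timer-based measurement, the sketch below wraps a command in a monotonic timer and accumulates the two per-disk counters. The class `DiskStats`, the function `timed_issue`, and the `issue_command` callable are hypothetical names for illustration; a real RAID subsystem would measure from command issue to asynchronous completion rather than around a blocking call.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DiskStats:
    total_seconds: float = 0.0  # accumulated response time in this period
    completed: int = 0          # number of completed commands in this period

    def record(self, response_seconds: float) -> None:
        """Add one completed command to both counters."""
        self.total_seconds += response_seconds
        self.completed += 1


def timed_issue(stats: DiskStats, issue_command: Callable[[], None]) -> None:
    """Issue one command and add its response time to the disk's counters."""
    start = time.monotonic()
    issue_command()  # hypothetical callable that returns when the disk responds
    stats.record(time.monotonic() - start)


stats = DiskStats()
timed_issue(stats, lambda: time.sleep(0.01))
timed_issue(stats, lambda: time.sleep(0.01))
print(stats.completed, stats.total_seconds)
```

A monotonic clock is used so that wall-clock adjustments during a statistical period cannot produce negative or inflated response times.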
At the end of the preset statistical period, the RAID subsystem can calculate the average response time of each member disk of the RAID array.
In implementation, the RAID subsystem can obtain, for each member disk of the RAID array, the accumulated response time and the number of completed I/O read/write commands, and then divide the accumulated response time by the number of completed commands to obtain the average response time of each member disk. If the number of completed I/O read/write commands for a member disk is zero, its average response time is treated as zero.
After calculating the average response time of each member disk of the RAID array, the RAID subsystem can clear the accumulated response time and the number of completed I/O read/write commands recorded for each member disk in this statistical period, so that both parameters can be counted afresh in the next statistical period.
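The end-of-period calculation and the clearing of the counters might look like the following sketch. The function name `close_period` and the dictionary layout (disk name mapped to a two-element list) are assumptions chosen for brevity.

```python
from typing import Dict, List


def close_period(counters: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute per-disk averages, then clear the counters for the next period.

    counters maps a disk name to [accumulated_response_ms, completed_commands].
    """
    averages = {}
    for disk, (total_ms, completed) in counters.items():
        # A disk that completed no commands is treated as having average zero.
        averages[disk] = total_ms / completed if completed else 0.0
        # Reset both counters so the next period starts from scratch.
        counters[disk] = [0.0, 0]
    return averages


# Hypothetical counters at the end of one statistical period.
counters = {"d0": [120.0, 24], "d1": [900.0, 30], "d2": [0.0, 0]}
averages = close_period(counters)
print(averages)
```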
In the embodiments of the present application, after calculating the average response time of each member disk, the RAID subsystem can search the RAID array for member disks whose average response time reaches (is greater than or equal to) the abnormal response time threshold.
In implementation, the RAID subsystem can find the smallest non-zero average response time among the non-hot-spare, non-failed member disks of the RAID array, and use the product of that value and a preset abnormal response time weight as the abnormal response time threshold.
The RAID subsystem can then search the non-hot-spare, non-failed member disks of the RAID array for member disks whose average response time reaches (is greater than or equal to) the abnormal response time threshold.
The abnormal response time weight is used to judge whether a disk's average response time is abnormal. It is generally set by the user according to actual conditions and is usually expressed as a percentage, such as 200%. The weight is not specifically limited here.
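A small sketch of deriving the threshold from the smallest non-zero average and the weight follows. The function name `abnormal_threshold` is hypothetical, and returning `None` when no disk has a non-zero average is an assumption; the text does not say what happens in that case.

```python
from typing import Dict, Optional, Set


def abnormal_threshold(
    avg_ms: Dict[str, float], excluded: Set[str], weight: float
) -> Optional[float]:
    """Smallest non-zero average among eligible disks, multiplied by the weight."""
    # Skip hot spares and failed disks, and ignore disks whose average is zero.
    nonzero = [a for d, a in avg_ms.items() if d not in excluded and a > 0]
    if not nonzero:
        return None  # assumption: no threshold can be derived this period
    return min(nonzero) * weight


# Hypothetical averages in milliseconds; a weight of 200% is written as 2.0.
averages = {"d0": 5.0, "d1": 6.0, "d2": 30.0, "d3": 0.0, "spare": 1.0}
threshold = abnormal_threshold(averages, excluded={"spare"}, weight=2.0)
print(threshold)
```

Here the idle disk `d3` and the hot spare are ignored, so the threshold is based on `d0`, and only `d2` reaches it.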
After the RAID subsystem finds, among the non-hot-spare, non-failed member disks of the RAID array, the member disks whose average response time reaches (is greater than or equal to) the abnormal response time threshold, it can mark the N of them with the largest average response times as failed member disks. Here N is not greater than (is less than or equal to) the number of member disks that the RAID array supports rebuilding simultaneously, and N is an integer greater than 0.
It should be noted that, because RAID levels and implementations differ among the storage devices offered by different vendors, developers can set the value of N. For example, with traditional RAID, when the RAID array is RAID5, the array supports rebuilding one member disk at a time, so N can be 1 and the RAID subsystem can mark the member disk with the largest average response time as a failed member disk. When the RAID array is RAID6, the array supports rebuilding two member disks at a time, so N can be 2 or 1, and the RAID subsystem can mark either the two member disks with the largest average response times or only the single slowest one as failed member disks. These values of N are only examples and are not specifically limiting.
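The constraint on N can be illustrated with a short sketch. The table `MAX_CONCURRENT_REBUILDS` only encodes the traditional RAID5 and RAID6 examples from the text, and clamping a too-large configured value (rather than rejecting it) is an assumption made for this illustration.

```python
# Illustrative limits taken from the traditional RAID5/RAID6 examples in the
# text; a vendor implementation may support different values.
MAX_CONCURRENT_REBUILDS = {"RAID5": 1, "RAID6": 2}


def effective_n(raid_level: str, configured_n: int) -> int:
    """Clamp the configured N to what the array can rebuild at once."""
    if configured_n < 1:
        raise ValueError("N must be an integer greater than 0")
    return min(configured_n, MAX_CONCURRENT_REBUILDS[raid_level])


print(effective_n("RAID5", 2), effective_n("RAID6", 2), effective_n("RAID6", 1))
```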
In addition, in order to improve the accuracy that RAID sub-system detects failure member's disk, it is to avoid by provisional exception response
Member's disk label be failure member's disk, RAID sub-system can not mark immediately member's disk be failure member's magnetic
Disk, but record judged result of the member's disk within several continuous cycles, if member's disk it is continuous several
In cycle when being look for the average response time and reached average response in member's disk of exception response time threshold
Between maximum top n member's disk, then be failure member's disk by member's disk label, and trigger belonging to member's disk
RAID array is rebuild.
In a kind of optional implementation, for the accuracy and practicality of increase detection failure member's disk, above-mentioned company
Continue several measurement periods, can be some measurement periods of " relatively continuous ".
In mark, above-mentioned RAID sub-system can record the above-mentioned average response time found respectively and reach exception response
The durations number of the maximum top n member's disk of average response time in member's disk of time threshold;If at several
After measurement period, the durations number of any member disk in the top n disk reaches default durations threshold value, then will
Member's disk label is failure member's disk.
In record, RAID sub-system can be reached for the average response time that finds exception response time threshold into
Each member's disk in member's disk in the maximum top n member's disk of average response time, terminates in next measurement period
When, if member's disk is look for the average response time and reached in member's disk of exception response time threshold again
The maximum top n member's disk of average response time, then increase the durations number of member's disk and record;If the member
Disk is not look for the average response time and reached average response time in member's disk of exception response time threshold
Maximum top n member's disk, then reduced the durations number of member's disk and record, if the lasting week of member's disk
Issue is reduced to zero, and the durations of member's disk are not re-recorded;Wherein, the initial value of the durations number of member's disk is
Zero.
For example, in the above RAID array, when a member disk is found for the first time among the top N member disks with the largest average response time among the member disks that have reached the abnormal response time threshold, its sustained-period count can be set to 1. At the end of the next statistical period, if the member disk is again found among those top N member disks, its sustained-period count is incremented by 1. If it is not found among them, its sustained-period count is decremented by 1, and if the count drops to zero, the sustained-period count of the member disk is no longer recorded.
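The increment/decrement bookkeeping described above can be sketched as follows. This is a minimal illustration in Python, not the implementation of the application: the function name `update_relative`, the dictionary `counts`, and the parameter `period_threshold` are hypothetical names introduced only for this example.

```python
def update_relative(counts, top_n_ids, period_threshold):
    """Update sustained-period counts at the end of one statistical period.

    counts: dict mapping disk id -> sustained-period count (hypothetical structure).
    top_n_ids: set of disk ids found in this period's top N slowest disks.
    Returns the disk ids whose count has reached period_threshold.
    """
    # Disks in this period's top N gain one period (first appearance: 0 -> 1).
    for disk in top_n_ids:
        counts[disk] = counts.get(disk, 0) + 1
    # Tracked disks that dropped out lose one period; a zero count is no longer recorded.
    for disk in list(counts):
        if disk not in top_n_ids:
            counts[disk] -= 1
            if counts[disk] <= 0:
                del counts[disk]
    return sorted(d for d, c in counts.items() if c >= period_threshold)
```

With this scheme a single good period does not erase a disk's history; it only sets the disk back by one period, so an intermittently slow disk can still reach the threshold.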
In another optional implementation, the RAID subsystem may instead mark a member disk as a faulty member disk based on several "absolutely consecutive" statistical periods.
In implementation, the RAID subsystem may separately record the sustained-period count of each of the top N member disks with the largest average response time among the member disks found to have reached the abnormal response time threshold. If, after several statistical periods, the sustained-period count of any of the top N member disks reaches the preset sustained-period threshold, that member disk is marked as a faulty member disk.
When recording, the RAID subsystem handles each of the top N member disks with the largest average response time among the member disks found to have reached the abnormal response time threshold as follows. At the end of the next statistical period, if the member disk is again among the top N member disks with the largest average response time among the member disks that have reached the abnormal response time threshold, the sustained-period count of the member disk is increased and recorded. If the member disk is not among them, the sustained-period count of the member disk is set to zero. The initial value of a member disk's sustained-period count is zero.
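The "absolutely consecutive" variant differs from the previous scheme only in what happens on a miss: the count is reset rather than decremented. A minimal Python sketch, under the assumption that counts are kept in a dictionary (the names `update_absolute`, `counts`, and `period_threshold` are hypothetical):

```python
def update_absolute(counts, top_n_ids, period_threshold):
    """Update sustained-period counts, requiring strictly consecutive periods.

    counts: dict mapping disk id -> sustained-period count (hypothetical structure).
    top_n_ids: set of disk ids found in this period's top N slowest disks.
    Returns the disk ids whose count has reached period_threshold.
    """
    # A tracked disk missing from this period's top N starts over from zero.
    for disk in counts:
        if disk not in top_n_ids:
            counts[disk] = 0
    # Disks in this period's top N gain one period.
    for disk in top_n_ids:
        counts[disk] = counts.get(disk, 0) + 1
    return sorted(d for d, c in counts.items() if c >= period_threshold)
```

Here a disk is reported only after appearing in the top N for `period_threshold` periods in a row, which is stricter than the decrementing scheme.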
In the embodiments of the present application, a faulty member disk marked in the current statistical period may already have started rebuilding for another reason (for example, a disk media error). The RAID array system can therefore detect whether the marked faulty member disk is already being rebuilt, and can also detect whether the marked faulty member disk meets the rebuild requirements of the RAID array to which it belongs. For example, one rebuild requirement may be that the number of member disks being rebuilt does not exceed the number of member disks the RAID array supports rebuilding simultaneously. Another rebuild requirement may be that the hot spare disk used for the rebuild is ready. In practical applications, developers can set the rebuild requirements according to actual conditions; the requirements given here are only illustrative and are not specifically limited.
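The pre-rebuild checks above can be sketched as a single predicate. This is an illustrative Python sketch under assumptions: the `ArrayState` structure and its field names are hypothetical, and the two example requirements are the ones named in the text, not an exhaustive list.

```python
from dataclasses import dataclass, field


@dataclass
class ArrayState:
    """Hypothetical snapshot of one RAID array's rebuild-related state."""
    rebuilding: set = field(default_factory=set)  # ids of member disks already rebuilding
    max_concurrent_rebuilds: int = 1              # rebuilds the array supports at once
    hot_spares_ready: int = 0                     # hot spare disks prepared for a rebuild


def can_trigger_rebuild(disk_id, state):
    """Return True if a newly marked faulty disk may start rebuilding."""
    # The disk may already be rebuilding for another reason (e.g. a media error).
    if disk_id in state.rebuilding:
        return False
    # Adding this disk must not exceed the supported number of simultaneous rebuilds.
    if len(state.rebuilding) + 1 > state.max_concurrent_rebuilds:
        return False
    # A hot spare disk must be ready to receive the rebuilt data.
    return state.hot_spares_ready >= 1
```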
The method by which the other RAID arrays in the storage device trigger a rebuild is the same as the rebuild-triggering method described in detail above for the RAID array, and is not repeated here.
It should be noted that, when rebuilding a faulty member disk, each vendor can complete the rebuild according to its own rebuild implementation. For example, the disk may be kicked out of the array when the rebuild starts, or after the rebuild completes. The implementation of the RAID array rebuild is not specifically limited here.
The present application proposes a method for triggering a RAID array rebuild. The RAID subsystem issues the received IO read/write commands to each member disk in the RAID array, and computes the average response time of each member disk based on the response times of the IO read/write commands returned by each member disk within a preset statistical period. Among the non-hot-spare and non-faulty member disks of the RAID array, the RAID subsystem searches for member disks whose average response time reaches the abnormal response time threshold. It then marks the top N of the found member disks with the largest average response time as faulty member disks, and notifies the RAID array to which the N member disks belong to rebuild.
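The search-and-mark step summarized above can be sketched as a filter followed by a top-N selection. This is a hedged Python illustration: the function name `select_faulty`, its parameters, and the tie-breaking by disk id are assumptions made for the example, and the threshold is taken as a given input.

```python
def select_faulty(avg_ms, hot_spares, already_faulty, threshold_ms, n):
    """Pick the top-n slowest eligible disks whose average reaches the threshold.

    avg_ms: dict mapping disk id -> average response time for the period.
    hot_spares, already_faulty: sets of disk ids excluded from the search.
    n: must not exceed the number of rebuilds the array supports at once.
    """
    candidates = [
        (avg, disk)
        for disk, avg in avg_ms.items()
        if disk not in hot_spares
        and disk not in already_faulty
        and avg >= threshold_ms
    ]
    # Largest average first; ties are broken by disk id (an arbitrary choice here).
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [disk for _, disk in candidates[:n]]
```

For instance, with averages of 5, 50, 80, and 6 ms on four data disks and a 20 ms threshold, `n = 1` selects only the 80 ms disk, while `n = 2` also selects the 50 ms disk.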
Without affecting the data flow of the RAID array, the RAID subsystem can use the average response time of each member disk to mark, among the member disks whose average response time reaches the abnormal response time threshold, the top N member disks with the largest average response time as faulty member disks, and thereby trigger the RAID array to which the N member disks belong to rebuild. In this way, the rebuild of the RAID array to which a member disk belongs is triggered based on the response time of that member disk's IO read/write commands.
In addition, the RAID subsystem triggers the rebuild of the RAID array to which a member disk belongs only when it detects that the same member disk has reached the abnormal response time threshold corresponding to each of several consecutive statistical periods. This effectively improves the accuracy with which the RAID subsystem detects faulty member disks and avoids the influence of a merely temporary abnormal response of a member disk.
Corresponding to the foregoing embodiments of the method for triggering a RAID array rebuild, the present application also provides embodiments of a device for triggering a RAID array rebuild.
The embodiments of the device for triggering a RAID array rebuild of the present application can be applied to a storage device. The device embodiments may be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, the device in a logical sense is formed by the processor of the storage device where it is located reading the corresponding computer program instructions from the non-volatile memory into the memory and running them. At the hardware level, Fig. 2 shows a hardware structure diagram of the storage device where the device for triggering a RAID array rebuild of the present application is located. In addition to the processor, memory, network egress interface, and non-volatile memory shown in Fig. 2, the storage device where the device is located may also include other hardware depending on the actual functions of the storage device, which is not described further here.
Referring to Fig. 3, Fig. 3 is a block diagram of a device for triggering a RAID array rebuild according to an exemplary embodiment of the present application. The device is applied to the RAID subsystem of a storage device; the storage device is pre-configured with at least one RAID array, and the RAID array includes several member disks. The device includes:
an issuing unit 310, configured to issue the received IO read/write commands to each member disk in the RAID array;
a statistics unit 320, configured to compute the average response time of each member disk based on the response times of the IO read/write commands of each member disk in the RAID array within a preset statistical period;
a search unit 330, configured to search the non-hot-spare and non-faulty member disks of the RAID array for member disks whose average response time reaches the abnormal response time threshold;
a marking unit 340, configured to mark, among the found member disks whose average response time reaches the abnormal response time threshold, the top N member disks with the largest average response time as faulty member disks, and to notify the RAID array to rebuild; wherein N is not greater than the number of member disks that the RAID array supports rebuilding simultaneously.
In an optional implementation, the abnormal response time threshold is the product of the smallest non-zero average response time among the non-hot-spare and non-faulty member disks of the RAID array and a preset abnormal response time weight;
the search unit 330 is specifically configured to find the smallest non-zero average response time among the non-hot-spare and non-faulty member disks of the RAID array, and, among the non-hot-spare and non-faulty member disks of the RAID array, to search for member disks whose average response time reaches the product of that smallest non-zero average response time and the preset abnormal response time weight.
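The threshold defined above is relative to the fastest active disk rather than a fixed number. A minimal Python sketch, assuming the input already contains only non-hot-spare, non-faulty member disks (the function name `abnormal_threshold` and the `None` return for an idle array are choices made for this example):

```python
def abnormal_threshold(avg_ms, weight):
    """Return min non-zero average response time times the preset weight.

    avg_ms: dict mapping disk id -> average response time for the period,
            restricted to non-hot-spare, non-faulty member disks.
    weight: the preset abnormal response time weight.
    Returns None when no disk has a non-zero average (assumed: nothing to compare).
    """
    nonzero = [avg for avg in avg_ms.values() if avg > 0]
    if not nonzero:
        return None
    return min(nonzero) * weight
```

With averages of 5 ms, 0 ms, and 80 ms and a weight of 4, the zero entry is ignored and the threshold is 20 ms, so only the 80 ms disk reaches it.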
In another optional implementation, the statistics unit 320 is specifically configured to accumulate the response times of the IO read/write commands of each member disk within the preset statistical period; count the number of IO read/write commands completed by each member disk within the preset statistical period; and divide each member disk's accumulated response time by its counted number of completed IO read/write commands to obtain the average response time of each member disk.
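The accumulate-count-divide procedure above can be sketched as a small per-disk accumulator. This is an illustrative Python sketch: the class name `DiskStats`, its methods, and the choice to report 0.0 for a disk that completed no IO in the period are assumptions, the last one chosen to match the text's exclusion of zero averages when computing the threshold.

```python
class DiskStats:
    """Per-disk accumulator for one statistical period (hypothetical helper)."""

    def __init__(self):
        self.total_ms = 0.0   # accumulated response time in this period
        self.completed = 0    # number of completed IO read/write commands

    def record(self, response_ms):
        """Account for one completed IO read/write command."""
        self.total_ms += response_ms
        self.completed += 1

    def average_and_reset(self):
        """Return the period's average response time and start a new period."""
        avg = self.total_ms / self.completed if self.completed else 0.0
        self.total_ms = 0.0
        self.completed = 0
        return avg
```

One `DiskStats` instance per member disk would be updated on each IO completion and read out once at the end of every statistical period.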
In another optional implementation, the marking unit 340 is specifically configured to separately record the sustained-period count of each of the top N member disks with the largest average response time among the found member disks whose average response time reaches the abnormal response time threshold; and, if after several statistical periods the sustained-period count of any of the top N member disks reaches the preset sustained-period threshold, to mark that member disk as a faulty member disk.
In another optional implementation, the marking unit 340 is further configured to, for each of the top N member disks with the largest average response time among the found member disks whose average response time reaches the abnormal response time threshold: at the end of the next statistical period, if the member disk is again among the top N member disks with the largest average response time among the member disks that have reached the abnormal response time threshold, increase the sustained-period count of the member disk and record it; if the member disk is not among them, decrease the sustained-period count of the member disk and record it; wherein the initial value of a member disk's sustained-period count is zero.
For the implementation of the functions and roles of each unit in the above device, refer to the implementation of the corresponding steps in the above method; details are not repeated here.
Since the device embodiments basically correspond to the method embodiments, refer to the description of the method embodiments for the relevant details. The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present application. Those of ordinary skill in the art can understand and implement it without creative effort.
The above are only preferred embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.
Claims (10)
1. A method for triggering a RAID array rebuild, characterized in that the method is applied to a RAID subsystem of a storage device; the storage device is pre-configured with at least one RAID array, and the RAID array includes several member disks; the method comprises:
issuing received IO read/write commands to each member disk in the RAID array;
computing the average response time of each member disk based on the response times of the IO read/write commands of each member disk in the RAID array within a preset statistical period;
searching the non-hot-spare and non-faulty member disks of the RAID array for member disks whose average response time reaches an abnormal response time threshold;
marking, among the found member disks whose average response time reaches the abnormal response time threshold, the top N member disks with the largest average response time as faulty member disks, and notifying the RAID array to rebuild; wherein N is not greater than the number of member disks that the RAID array supports rebuilding simultaneously.
2. The method according to claim 1, characterized in that the abnormal response time threshold is the product of the smallest non-zero average response time among the non-hot-spare and non-faulty member disks of the RAID array and a preset abnormal response time weight;
the searching the non-hot-spare and non-faulty member disks of the RAID array for member disks whose average response time reaches the abnormal response time threshold comprises:
finding the smallest non-zero average response time among the non-hot-spare and non-faulty member disks of the RAID array; and, among the non-hot-spare and non-faulty member disks of the RAID array, searching for member disks whose average response time reaches the product of that smallest non-zero average response time and the preset abnormal response time weight.
3. The method according to claim 1, characterized in that the computing the average response time of each member disk based on the response times of the IO read/write commands of each member disk in the RAID array within the preset statistical period comprises:
accumulating the response times of the IO read/write commands of each member disk within the preset statistical period;
counting the number of IO read/write commands completed by each member disk within the preset statistical period;
dividing each member disk's accumulated response time by its counted number of completed IO read/write commands to obtain the average response time of each member disk.
4. The method according to claim 1, characterized in that the marking, among the found member disks whose average response time reaches the abnormal response time threshold, the top N member disks with the largest average response time as faulty member disks comprises:
separately recording the sustained-period count of each of the top N member disks with the largest average response time among the found member disks whose average response time reaches the abnormal response time threshold;
if, after several statistical periods, the sustained-period count of any of the top N member disks reaches a preset sustained-period threshold, marking that member disk as a faulty member disk.
5. The method according to claim 4, characterized in that the separately recording the sustained-period count of each of the top N member disks with the largest average response time among the found member disks whose average response time reaches the abnormal response time threshold comprises:
for each of the top N member disks with the largest average response time among the found member disks whose average response time reaches the abnormal response time threshold, at the end of the next statistical period: if the member disk is again among the top N member disks with the largest average response time among the member disks that have reached the abnormal response time threshold, increasing the sustained-period count of the member disk and recording it; if the member disk is not among them, decreasing the sustained-period count of the member disk and recording it; wherein the initial value of a member disk's sustained-period count is zero.
6. A device for triggering a RAID array rebuild, characterized in that the device is applied to a RAID subsystem of a storage device; the storage device is pre-configured with at least one RAID array, and the RAID array includes several member disks; the device comprises:
an issuing unit, configured to issue received IO read/write commands to each member disk in the RAID array;
a statistics unit, configured to compute the average response time of each member disk based on the response times of the IO read/write commands of each member disk in the RAID array within a preset statistical period;
a search unit, configured to search the non-hot-spare and non-faulty member disks of the RAID array for member disks whose average response time reaches an abnormal response time threshold;
a marking unit, configured to mark, among the found member disks whose average response time reaches the abnormal response time threshold, the top N member disks with the largest average response time as faulty member disks, and to notify the RAID array to rebuild; wherein N is not greater than the number of member disks that the RAID array supports rebuilding simultaneously.
7. The device according to claim 6, characterized in that the abnormal response time threshold is the product of the smallest non-zero average response time among the non-hot-spare and non-faulty member disks of the RAID array and a preset abnormal response time weight;
the search unit is specifically configured to find the smallest non-zero average response time among the non-hot-spare and non-faulty member disks of the RAID array; and, among the non-hot-spare and non-faulty member disks of the RAID array, to search for member disks whose average response time reaches the product of that smallest non-zero average response time and the preset abnormal response time weight.
8. The device according to claim 6, characterized in that the statistics unit is specifically configured to accumulate the response times of the IO read/write commands of each member disk within the preset statistical period; count the number of IO read/write commands completed by each member disk within the preset statistical period; and divide each member disk's accumulated response time by its counted number of completed IO read/write commands to obtain the average response time of each member disk.
9. The device according to claim 6, characterized in that the marking unit is specifically configured to separately record the sustained-period count of each of the top N member disks with the largest average response time among the found member disks whose average response time reaches the abnormal response time threshold; and, if after several statistical periods the sustained-period count of any of the top N member disks reaches a preset sustained-period threshold, to mark that member disk as a faulty member disk.
10. The device according to claim 9, characterized in that the marking unit is further configured to, for each of the top N member disks with the largest average response time among the found member disks whose average response time reaches the abnormal response time threshold: at the end of the next statistical period, if the member disk is again among the top N member disks with the largest average response time among the member disks that have reached the abnormal response time threshold, increase the sustained-period count of the member disk and record it; if the member disk is not among them, decrease the sustained-period count of the member disk and record it; wherein the initial value of a member disk's sustained-period count is zero.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710125115.7A CN106990918A (en) | 2017-03-03 | 2017-03-03 | Trigger the method and device that RAID array is rebuild |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106990918A (en) | 2017-07-28 |
Family
ID=59413096
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710125115.7A (Pending), CN106990918A (en) | 2017-03-03 | 2017-03-03 | Trigger the method and device that RAID array is rebuild |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106990918A (en) |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170728 |