CN103019894B - Reconstruction method for redundant array of independent disks - Google Patents

Info

Publication number
CN103019894B
Authority
CN
China
Prior art keywords
disk
raid
write
read
raid system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210570497.1A
Other languages
Chinese (zh)
Other versions
CN103019894A (en)
Inventor
金振成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Technology Co., Ltd.
Shenzhen Innovation Technology Co., Ltd.
Original Assignee
Innovation And Technology Storage Technology Co Ltd
UIT STORAGE TECHNOLOGY (SHENZHEN) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation And Technology Storage Technology Co Ltd and UIT STORAGE TECHNOLOGY (SHENZHEN) Co Ltd
Priority to CN201210570497.1A
Publication of CN103019894A
Application granted
Publication of CN103019894B
Legal status: Active
Anticipated expiration

Landscapes

  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a reconstruction method for a redundant array of independent disks (RAID). The method comprises the following steps: (A) a controller of the RAID system discovers that a first disk in the RAID system cannot respond to input/output (IO) operations, powers off the first disk alone, and starts a timer of a preset duration; (B) while the timer is running, the RAID system performs normal read and write operations and records the numbers of all stripes subjected to write operations; (C) when the timer expires, the first disk is powered on; (D) after the first disk is powered on, a read-write test operation is performed on it; (E) whether the first disk reads and writes normally is judged; if so, step (F) is executed, otherwise step (G) is executed; (F) the data in the corresponding stripes of the first disk is recovered according to the numbers of all stripes subjected to write operations while the first disk was powered off, and the flow ends once the data is completely recovered; and (G) the first disk is marked as a damaged disk and replaced by a second disk serving as a hot spare disk, a calculation is performed from the data and parity of the other disks in the RAID system, and the result is written to the second disk.

Description

A reconstruction method for a redundant array of independent disks
Technical Field
The present application relates to the technical field of computer storage, in particular to redundant array of independent disks (RAID) technology, and more particularly to a reconstruction method for a RAID.
Background
RAID is a technology that combines multiple independent disks in different ways into a disk group (a logical disk), providing higher storage performance than a single disk as well as data redundancy protection. The principle of RAID is to store data and the corresponding parity information across the disks that make up the RAID system, with the parity information and its corresponding data stored on different disks. When the data on one disk of the RAID system is damaged, the remaining data and the corresponding parity information are used to recover it. As a foundation and key component of network storage systems, RAID is known for its speed, large capacity, and high reliability. Since its introduction, RAID has been in wide demand in fields such as industry, the military, and education, and research on RAID has remained an industry focus.
The different ways of composing a disk array are called RAID levels. Common RAID levels include RAID0, RAID1, RAID5, and RAID6. Different RAID levels provide different data protection schemes.
Take a RAID5 composed of 4 disks as an example: it tolerates the failure of only one disk. Once one disk fails, the RAID5 no longer provides data redundancy protection, so the failed disk needs to be replaced as soon as possible. After the failed disk is replaced, the disk controller computes the missing contents from the data and parity on the healthy disks and writes the results to the new disk. This process is called RAID reconstruction.
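The reconstruction calculation described above can be illustrated with a minimal Python sketch. It assumes the usual RAID5 parity scheme, in which the parity block is the byte-wise XOR of the data blocks in a stripe; the text itself only says the result is calculated from data and parity. The helper name `xor_blocks` and the 4-byte blocks are illustrative choices, not part of the patent.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a 4-disk RAID5: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails. Its block is rebuilt from the
# blocks that survive on the other three disks (two data, one parity).
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

A full reconstruction repeats this calculation for every stripe of the array, which is why it is slow compared with recovering only a few stripes.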
The purpose of reconstruction is to restore the RAID's data redundancy protection. When a RAID disk fails, disk array vendors generally use hot spare technology to rebuild the RAID automatically. Put simply, with hot spare technology a disk is designated as the hot spare for a RAID when the RAID system is created; when a member disk of the RAID system fails, the hot spare automatically replaces the failed disk and triggers RAID reconstruction. As the name implies, a "hot" spare replaces the failed disk without interrupting the read and write traffic on the RAID system; that is, the RAID system can still be read and written while it is being rebuilt.
In the prior art, when a member disk of the RAID system cannot respond to an input/output (IO) request from the upper layer, the member disk is generally considered to have failed, and the RAID system automatically starts reconstruction. Reconstruction is expensive and slow, and it degrades normal data IO performance. Moreover, if another disk fails during reconstruction, the RAID system generally collapses outright, which makes the RAID system very fragile while it is rebuilding. Reconstruction should therefore be avoided whenever possible.
Summary of the Invention
The present application provides a RAID reconstruction method that reduces, as far as possible, the probability of having to perform RAID reconstruction.
The RAID reconstruction method provided by the embodiments of the present application comprises:
A. The controller of the RAID system discovers that a first disk in the RAID system cannot respond to I/O operations, powers off the first disk alone, and starts a timer of a preset duration;
B. While the timer is running, the RAID system performs normal read and write operations and records the stripe numbers of all stripes on which a write operation occurs during this period;
C. When the timer expires, power to the first disk is switched on, powering up the first disk;
D. After the first disk is powered on, a read-write test operation is performed on the first disk;
E. Whether the first disk reads and writes normally is determined; if so, step F is performed, otherwise step G is performed;
F. The data in the corresponding stripes of the first disk is recovered according to the stripe numbers of all write operations recorded while the first disk was powered off, and the flow ends once recovery is complete;
G. The first disk is marked as a damaged disk, a second disk serving as the hot spare disk replaces the first disk, a calculation is performed from the data and parity of the other disks in the RAID system, and the result is written to the second disk.
Preferably, the read-write test operation comprises:
D1. Check whether the first disk is online and has been loaded into the operating system by the driver; if it is not online, the first disk is a damaged disk; if it is online, continue to step D2;
D2. Send the SCSI command "TEST UNIT READY" to the disk to check whether it is ready for reading and writing; if it cannot be read or written, the disk is a damaged disk; if it can, perform step D3;
D3. Write the RAID metadata for the first disk, as recorded in the operating system, to the metadata location on the disk; if the write fails, the first disk is judged to be a damaged disk; if the write succeeds, continue to step D4;
D4. Read the RAID metadata back from the first disk; if the read succeeds, the first disk is confirmed to be a good disk; if the read fails, the first disk is judged to be a damaged disk.
As can be seen from the above technical solution, when a disk of the RAID system cannot respond to I/O operations, it is first powered off. While it is powered off, the application layer may still read and write the RAID normally, and the stripe numbers of all write operations that occur during this period are recorded. The disk is then powered on and tested for normal reading and writing. If it passes, the data in the corresponding stripes of the disk is recovered according to the recorded stripe numbers; otherwise, the disk is marked as a damaged disk and the conventional reconstruction process is started. In this way, a disk of the RAID system can in most cases be returned to normal without a reconstruction operation.
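The read-write test D1 to D4 is a chain of checks that stops at the first failure. The following Python sketch shows that control flow only. `FakeDisk` and its four methods are hypothetical stand-ins for the real driver, SCSI, and metadata operations, which the text does not specify at the code level.

```python
from dataclasses import dataclass


@dataclass
class FakeDisk:
    """Hypothetical stand-in for a member disk; each flag simulates one check."""
    online: bool = True
    unit_ready: bool = True
    metadata_writable: bool = True
    metadata_readable: bool = True

    def is_online(self):
        return self.online

    def test_unit_ready(self):
        # A real system would issue the SCSI TEST UNIT READY command here.
        return self.unit_ready

    def write_metadata(self, metadata):
        return self.metadata_writable

    def read_metadata(self):
        return self.metadata_readable


def readwrite_test(disk, raid_metadata):
    """Return (is_good, failed_step); failed_step is None if all checks pass."""
    checks = [
        ("D1", disk.is_online),
        ("D2", disk.test_unit_ready),
        ("D3", lambda: disk.write_metadata(raid_metadata)),
        ("D4", disk.read_metadata),
    ]
    for step, check in checks:
        if not check():
            return False, step  # the disk is judged to be damaged at this step
    return True, None
```

For example, `readwrite_test(FakeDisk(unit_ready=False), b"meta")` reports a failure at step D2, while a disk that passes all four checks is treated as good and proceeds to stripe recovery.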
Brief Description of the Drawings
Fig. 1 is a flowchart of a RAID reconstruction method provided by an embodiment of the present application.
Detailed Description
In most cases, when a member disk of the RAID system cannot respond to an upper-layer I/O request, the member disk is not actually damaged. According to statistics from the disk manufacturer Seagate, when a disk cannot respond to I/O requests, 95% of cases are caused by software errors such as firmware or checksum errors, and in these cases a simple repair operation makes the disk usable again; only in 5% of cases is the disk really damaged. Starting reconstruction of the RAID system when the disk is not really damaged therefore greatly increases the operation and maintenance cost of the RAID system.
The present application provides a RAID reconstruction method whose basic idea is as follows. Each disk slot interface of the RAID system is provided with a circuit that powers the disk on and off independently. When a disk of the RAID system cannot respond to I/O operations, it is first powered off. While it is powered off, the application layer may still read and write the RAID normally, and the stripe numbers of all write operations that occur during this period are recorded. The disk is then powered on and tested for normal reading and writing. If it passes, the data in the corresponding stripes of the disk is recovered according to the recorded stripe numbers; otherwise, the disk is marked as a damaged disk and the conventional reconstruction process is started.
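The idea of logging written stripes and later recovering only those stripes can be sketched as follows. This is a toy model under stated assumptions: `ToyArray` is a hypothetical class, every write is a full-stripe write, parity is a byte-wise XOR kept on the last disk (a fixed parity position chosen for brevity, unlike the rotating parity of RAID5), and blocks live in memory rather than on disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class ToyArray:
    """In-memory array: disks[d][s] is the block of disk d in stripe s."""

    def __init__(self, n_disks, n_stripes, block_size=4):
        self.disks = [[bytes(block_size)] * n_stripes for _ in range(n_disks)]
        self.offline = None   # index of the powered-off disk, if any
        self.dirty = set()    # stripe numbers written while a disk is off

    def write_stripe(self, stripe, data_blocks):
        """Write a full stripe; skip the offline disk and log the stripe."""
        blocks = list(data_blocks) + [xor_blocks(data_blocks)]  # parity last
        for d, block in enumerate(blocks):
            if d != self.offline:
                self.disks[d][stripe] = block
        if self.offline is not None:
            self.dirty.add(stripe)

    def resync(self, disk):
        """Recover only the logged stripes on the returning disk."""
        for stripe in sorted(self.dirty):
            others = [self.disks[d][stripe]
                      for d in range(len(self.disks)) if d != disk]
            self.disks[disk][stripe] = xor_blocks(others)
        recovered = len(self.dirty)
        self.dirty.clear()
        self.offline = None
        return recovered


array = ToyArray(n_disks=4, n_stripes=8)
array.write_stripe(0, [b"aaaa", b"bbbb", b"cccc"])  # all disks online
array.offline = 1                                   # disk 1 is powered off
array.write_stripe(3, [b"1111", b"2222", b"3333"])
array.write_stripe(5, [b"xxxx", b"yyyy", b"zzzz"])
recovered_count = array.resync(1)                   # disk 1 is back and passed the test
```

In this run only stripes 3 and 5 are recomputed for disk 1, while the other six stripes are left untouched. That difference in work is the saving the method relies on when the disk turns out to be healthy.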
To make the technical principles, features, and technical effects of the technical solution clearer, the solution is described in detail below with reference to a specific embodiment.
The flow of the RAID reconstruction method provided by an embodiment of the present application is shown in Fig. 1 and comprises the following steps:
Step 101: The controller of the RAID system discovers that a disk in the RAID system cannot respond to I/O operations, switches off the power to that disk alone so that the disk is powered off, and starts a timer of a preset duration. This disk is hereinafter called the first disk.
Step 102: While the timer is running (that is, while the first disk is powered off), the RAID system performs normal read and write operations and records the stripe numbers of all stripes on which a write operation occurs during this period.
Step 103: When the timer expires, power to the first disk is switched on, powering up the first disk.
Step 104: After the first disk is powered on, a read-write test operation is performed on the first disk.
In this embodiment, the read-write test performs the following operations:
D1. Check whether the first disk is online and has been loaded into the operating system by the driver; if it is not online, the first disk is a damaged disk; if it is online, continue to step D2;
D2. Send the SCSI command "TEST UNIT READY" to the disk to check whether it is ready for reading and writing; if it cannot be read or written, the disk is a damaged disk; if it can, perform step D3;
D3. Write the RAID metadata for the first disk, as recorded in the operating system, to the metadata location on the disk; if the write fails, the first disk is judged to be a damaged disk; if the write succeeds, continue to step D4;
D4. Read the RAID metadata back from the first disk; if the read succeeds, the first disk is confirmed to be a good disk; if the read fails, the first disk is judged to be a damaged disk.
Step 105: Determine whether the first disk reads and writes normally; if so, perform step 106, otherwise perform step 107.
Step 106: Recover the data in the corresponding stripes of the first disk according to the stripe numbers of all write operations recorded while the first disk was powered off; the flow ends once recovery is complete.
Step 107: Mark the first disk as a damaged disk, replace the first disk with a second disk serving as the hot spare disk, perform a calculation from the data and parity of the other disks in the RAID system, and write the result to the second disk.
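Steps 101 to 107 can be summarized as a short control-flow sketch in Python. All six hooks are hypothetical callables supplied by the caller; `run_until_timeout` is assumed to start the timer, serve I/O until it expires, and return the set of written stripe numbers. The sketch shows only the order of the steps and the branch at step 105, not real power control or disk access.

```python
from typing import Callable, List, Set


def handle_unresponsive_disk(
    power_off: Callable[[], None],
    run_until_timeout: Callable[[], Set[int]],
    power_on: Callable[[], None],
    readwrite_test: Callable[[], bool],
    recover_stripes: Callable[[Set[int]], None],
    rebuild_to_hot_spare: Callable[[], None],
) -> str:
    """Drive steps 101-107 once for a disk that stopped responding to I/O."""
    power_off()                      # step 101: cut power to the first disk
    written = run_until_timeout()    # step 102: serve I/O, collect stripe numbers
    power_on()                       # step 103: timer expired, restore power
    if readwrite_test():             # steps 104-105: did the D1-D4 checks pass?
        recover_stripes(written)     # step 106: recover only the logged stripes
        return "recovered"
    rebuild_to_hot_spare()           # step 107: mark damaged, rebuild onto spare
    return "rebuilt"


def simulate(test_passes: bool) -> List[str]:
    """Run the flow with stub hooks that only record which actions happened."""
    events: List[str] = []
    outcome = handle_unresponsive_disk(
        power_off=lambda: events.append("power_off"),
        run_until_timeout=lambda: {3, 5},
        power_on=lambda: events.append("power_on"),
        readwrite_test=lambda: test_passes,
        recover_stripes=lambda s: events.append(f"recover {sorted(s)}"),
        rebuild_to_hot_spare=lambda: events.append("rebuild"),
    )
    return events + [outcome]
```

Calling `simulate(True)` follows the step 106 branch and `simulate(False)` follows the step 107 branch. Passing the hooks in as parameters keeps the decision logic separate from hardware-specific code, which is a design choice of this sketch rather than something the text prescribes.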
The foregoing describes only preferred embodiments of the present application and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the technical solution shall fall within the scope of protection of the present application.

Claims (1)

1. A reconstruction method for a redundant array of independent disks (RAID), characterized by comprising:
A. The controller of the RAID system discovers that a first disk in the RAID system cannot respond to I/O operations, powers off the first disk alone, and starts a timer of a preset duration;
B. While the timer is running, the RAID system performs normal read and write operations and records the stripe numbers of all stripes on which a write operation occurs during this period;
C. When the timer expires, power to the first disk is switched on, powering up the first disk;
D. After the first disk is powered on, a read-write test operation is performed on the first disk; the read-write test operation comprises:
D1. Check whether the first disk is online and has been loaded into the operating system by the driver; if it is not online, the first disk is a damaged disk; if it is online, continue to step D2;
D2. Send the SCSI command "TEST UNIT READY" to the disk to check whether it is ready for reading and writing; if it cannot be read or written, the disk is a damaged disk; if it can, perform step D3;
D3. Write the RAID metadata for the first disk, as recorded in the operating system, to the metadata location on the disk; if the write fails, the first disk is judged to be a damaged disk; if the write succeeds, continue to step D4;
D4. Read the RAID metadata back from the first disk; if the read succeeds, the first disk is confirmed to be a good disk; if the read fails, the first disk is judged to be a damaged disk;
E. Whether the first disk reads and writes normally is determined; if so, step F is performed, otherwise step G is performed;
F. The data in the corresponding stripes of the first disk is recovered according to the stripe numbers of all write operations recorded while the first disk was powered off, and the flow ends once recovery is complete;
G. The first disk is marked as a damaged disk, a second disk serving as the hot spare disk replaces the first disk, a calculation is performed from the data and parity of the other disks in the RAID system, and the result is written to the second disk.
CN201210570497.1A 2012-12-25 2012-12-25 Reconstruction method for redundant array of independent disks Active CN103019894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210570497.1A CN103019894B (en) 2012-12-25 2012-12-25 Reconstruction method for redundant array of independent disks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210570497.1A CN103019894B (en) 2012-12-25 2012-12-25 Reconstruction method for redundant array of independent disks

Publications (2)

Publication Number Publication Date
CN103019894A (en) 2013-04-03
CN103019894B (en) 2015-03-04

Family

ID=47968524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210570497.1A Active CN103019894B (en) 2012-12-25 2012-12-25 Reconstruction method for redundant array of independent disks

Country Status (1)

Country Link
CN (1) CN103019894B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111880B (en) * 2013-04-16 2016-03-02 华中科技大学 A fast reconstruction method for single-data-disk failure in erasure codes tolerating three disk failures
CN103513942B (en) * 2013-10-21 2016-06-29 华为技术有限公司 Method and device for reconstructing a redundant array of independent disks
CN103699855B (en) * 2013-12-05 2018-04-27 华为技术有限公司 A data processing method and device
CN105892950A (en) * 2016-04-01 2016-08-24 浪潮电子信息产业股份有限公司 Disk array reconstruction method and disk array reconstruction system
CN107544874A (en) * 2016-06-23 2018-01-05 南京中兴新软件有限责任公司 Service processing method and device
CN107301106A (en) * 2017-06-28 2017-10-27 郑州云海信息技术有限公司 Method and device for recovering from a RAID system failure
CN114968129B (en) * 2022-07-28 2022-12-06 苏州浪潮智能科技有限公司 Disk array redundancy method, system, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101840311A (en) * 2009-12-30 2010-09-22 创新科存储技术有限公司 Self-repairing method suitable for RAID system and RAID system
CN101840360A (en) * 2009-10-28 2010-09-22 创新科存储技术有限公司 Rapid reconstruction method and device of RAID (Redundant Array of Independent Disk) system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7386758B2 (en) * 2005-01-13 2008-06-10 Hitachi, Ltd. Method and apparatus for reconstructing data in object-based storage arrays

Also Published As

Publication number Publication date
CN103019894A (en) 2013-04-03

Similar Documents

Publication Publication Date Title
CN103019894B (en) Reconstruction method for redundant array of independent disks
CN102012847B (en) Improved disk array reconstruction method
TWI450087B (en) Data storage method for a plurality of raid systems and data storage system thereof
CN100392611C (en) Storage control apparatus and method
US9405617B1 (en) System and method for data error recovery in a solid state subsystem
CN102184129B (en) Fault tolerance method and device for disk arrays
US20090327603A1 (en) System including solid state drives paired with hard disk drives in a RAID 1 configuration and a method for providing/implementing said system
CN103513942B (en) Method and device for reconstructing a redundant array of independent disks
CN103034458B (en) Method and device for implementing a redundant array of independent disks (RAID) in a solid-state drive
CN104035830A (en) Method and device for recovering data
CN103019623B (en) Storage disk processing method and device
CN102207895B (en) Data reconstruction method and device of redundant array of independent disk (RAID)
CN103246478B (en) A disk array system based on software PLC that supports ungrouped global hot spare disks
CN105531677A (en) Raid parity stripe reconstruction
CN104484251A (en) Method and device for processing faults of hard disk
CN104050056A (en) File system backup of multi-storage-medium device
JP6515752B2 (en) Storage control device, control method, and control program
CN101840311B (en) Self-repairing method suitable for RAID system and RAID system
CN102508733A (en) Disk array based data processing method and disk array manager
CN104503781A (en) Firmware upgrading method for hard disk and storage system
CN109032513A (en) RAID architecture based on SSD and HDD, and backup and reconstruction methods thereof
US20060215456A1 (en) Disk array data protective system and method
CN109445982A (en) Data storage device implementing reliable data reads and writes
CN102495680A (en) Reconstruction method of RAID (Redundant Array of Independent Disks) system
CN105183590A (en) Disk array fault tolerance processing method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 518057 Shenzhen Software Park, No. 9, 501, 502, Science and Technology Middle Road, Nanshan District, Shenzhen City, Guangdong Province

Co-patentee after: Innovation Technology Co., Ltd.

Patentee after: Shenzhen Innovation Technology Co., Ltd.

Address before: 518057 Shenzhen Software Park, No. 9, 501, 502, Science and Technology Middle Road, Nanshan District, Shenzhen City, Guangdong Province

Co-patentee before: Innovation and Technology Storage Technology Co., Ltd.

Patentee before: UIT Storage Technology (Shenzhen) Co., Ltd.