CA2240412A1 - A method and apparatus for management of faulty data in a raid system - Google Patents

A method and apparatus for management of faulty data in a raid system

Info

Publication number
CA2240412A1
Authority
CA
Canada
Prior art keywords
memory
block
data
bdt
memory array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA 2240412
Other languages
French (fr)
Inventor
Ashok Bhaskar
Ashwath Nagaraj
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US08/573,127 (US5913927A)
Application filed by Individual
Publication of CA2240412A1
Legal status: Abandoned

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F11/1092Rebuilding, e.g. when physically replacing a failing disk

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

When a read error occurs during reconstruction (100) of the failed disk data, the block corresponding to the error block does not allow the reconstruction of the corresponding failed disk block. To prevent the misuse of the two data blocks, a bad data table [BDT] (23) is constructed that lists the addresses of the block just read and the block to be reconstructed. Also a standard filler block (106) is written into the two bad blocks and a new parity block (108) is created. The addresses of all access requests to the memory array (200) are compared with the BDT (109) and, if not listed, the access proceeds (200). If an address is listed, an error signal (204) is returned. For a listed write request (203), the bad block address is deleted from the BDT (205), new data written (206) into the block and a new parity block computed and stored (207).

Description

WO 97/22931 PCT/US96/19810
TITLE OF THE INVENTION
A Method and Apparatus for Management of Faulty Data in a RAID System
Background of the Invention
The invention pertains to the field of fault tolerant arrays of hard disks that are known as Redundant Arrays of Inexpensive Disks (RAID). More specifically, the invention relates to the management and correction of undetected read errors in a RAID system that are discovered when a disk failure requires the replacement and rebuild of a disk other than that containing the undetected read error.
Summary of the Invention
This invention provides a faulty data management subsystem that avoids the potential generation of spurious data when two bad data blocks exist within the same RAID data group because of a disk recording medium error in one channel of the array and the failure, replacement, and rebuild of a disk drive on another channel. The faulty data management subsystem of the invention includes:
a) a disk rebuild routine for rebuilding a failed drive that checks for a bad data block amongst the channels being used to reconstruct the data on the failed disk and, if one is found, scrubs the bad data block by writing a filler code in that block and in the corresponding block of the disk that is being rebuilt;
b) computing a new parity block corresponding to the bad data block by using all channels including those with filler code;
c) updating a bad data table that lists all bad data blocks; and
d) checking each disk array access request to determine if the data address of that request is listed in the bad data table, and, if not, allowing the access to proceed, otherwise checking if the access request is for a write and, if so, allowing the write, deleting the block address from the bad data table and generating a new parity block, and, if not, generating an error signal.
The present invention provides means for managing faulty data in RAID systems without incurring the potential problems described above.
Because a byte stripe is a single address block, it may be seen that a RAID-3 and a RAID-4 system are the same except for block length. Consequently, all following references to a block will be understood to include the RAID-3 byte stripe unless otherwise indicated.
Also, "RAID system" will be used to indicate RAID-3, -4 and -5 systems unless indicated otherwise.
RAID systems, based on magnetic disk technology developed for personal computers, offer an attractive alternative to single large expensive disk memories by providing improved performance, reliability, and power consumption. The manufacturers of small disks can offer such performance because of the efforts at standardization in defining higher level peripheral interfaces such as the ANSI X3.131-1986 Small Computer System Interface (SCSI). This has led to the development of arrays of inexpensive disks which are organized in an interleaved manner for large block transfers of data, or arranged for independent parallel access for small transaction processing.
The formation of large disk arrays introduces reliability problems which result from the use of a large multiplicity of electromechanical devices. The mean-time-to-failure (MTTF) for an array has been estimated to decrease as the number of disks in the array increases (Patterson, D.A., Gibson, G., and Katz, R.H., "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Report No. UCB/CSD 87/391, Dec. 1987, Computer Science Division (EECS), University of California, Berkeley, CA 94720).
Brief Description of the Drawings
The present invention may be more fully understood from the detailed description given below and from the accompanying drawings of the preferred embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiment but are for explanation and understanding only.
FIG. 1 is a block diagram of a prior art MxN disk array, including a disk array controller.
FIG. 2 shows the prior art logical disk area configuration based on the array of FIG. 1.
FIG. 3(a) shows the memory map of a prior art RAID-1 system.
FIG. 3(b) shows the memory map of a prior art RAID-3 system.
FIG. 3(c) shows the memory map of a prior art RAID-4 system.
FIG. 3(d) shows the memory map of a prior art RAID-5 system.
FIGS. 4(a), 4(b), and 4(c) show an initial RAID-1 memory map, a double faulted RAID-1 memory map, and a rebuilt, but double faulted, RAID-1 memory map, respectively, according to the prior art.
FIGS. 5(a), 5(b), and 5(c) show an initial RAID-5 memory map, a double faulted RAID-5 memory map, and a rebuilt, but double faulted, RAID-5 memory map, respectively, according to the prior art.

FIG. 6 is a flow diagram of the disk rebuild and reconstruct method of the invention.
FIG. 7 is a flow diagram of the method of the invention for accessing the array.
FIG. 8 shows the hardware configuration of the invention for a RAID faulty data management system.
Description of the Preferred Embodiments
FIG. 6 is a flow diagram of the disk rebuild and reconstruction method 100 used to circumvent the problem of generating spurious data because a disk read fault has occurred on a channel of a RAID system while reconstructing a corresponding data block on another channel. The reconstruction of a data block or rebuild of a disk drive is accomplished by bit-by-bit EXORing of the corresponding bits of the remaining blocks belonging to the same group that share a common parity block.
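The EXOR reconstruction described above can be illustrated with a minimal Python sketch. The function name `xor_blocks`, the two-byte block size, and the 3+1 group layout are assumptions made for illustration only; the patent does not prescribe an implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bit-by-bit XOR of equal-length byte blocks (hypothetical helper)."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("blocks must have equal length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Assumed group: three data blocks that share one parity block.
d0, d1, d2 = b"\x0f\xa0", b"\x33\x5c", b"\xf0\x01"
parity = xor_blocks([d0, d1, d2])

# Suppose the channel holding d1 failed: XOR the remaining blocks of the group.
rebuilt_d1 = xor_blocks([d0, d2, parity])
```

The reconstruction works only when every remaining block of the group is readable, which is the condition the rebuild method checks for.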
Disk rebuild and reconstruction method 100 begins at step 101 in which the EXOR method of reconstruction begins. During reconstruction, step 102 checks for an indication that one of the blocks needed for reconstruction is bad. If the reconstruction is completed without detecting a bad block, the process terminates at step 112. Otherwise, step 103 checks if the bad data block needed for reconstruction is a parity block. If so, step 104 writes a filler block to the block that is to be reconstructed and step 105 updates the bad data table (BDT) by recording that the data block is non-recoverable (bad). (The filler block may contain any convenient alphanumeric code.) If the bad data block of step 103 is not a parity block, step 106 scrubs the bad data block by writing a filler block to replace the bad data block and, in step 107, also writes a filler block to the corresponding block that was to be reconstructed. Step 108 updates the BDT by recording that both the block in step 103 and the corresponding block of the channel being reconstructed are bad. Each entry in the BDT identifies a block as containing artificial data and hence non-recoverable data. Step 109 computes new parity data by EXORing all of the data channel blocks (including the filler blocks). Step 110 writes a new parity block and step 111 resumes the reconstruction process by returning to step 102.
The procedure of FIG. 6 ensures that (1) if a read error occurs during reconstruction in one of the data blocks, all of the remaining "good" data blocks are recoverable, and (2) the artificial data of the filler blocks cannot be confused with real data.
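The rebuild procedure of FIG. 6 might be sketched for a single group (stripe) as follows. This is a hedged illustration, not the patented implementation: the names `FILLER`, `rebuild_stripe`, and `xor_blocks`, the use of `None` to mark an unreadable block, the `(stripe_no, channel)` address form, and the Python `set` used as the BDT are all assumptions. The sketch also applies the same handling when more than one surviving block is unreadable, which goes beyond the single read error the text discusses.

```python
FILLER = b"\xee" * 4  # hypothetical filler code; the text allows any convenient code

def xor_blocks(blocks):
    """Bit-by-bit XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def rebuild_stripe(stripe, failed, parity, bdt, stripe_no):
    """Rebuild channel `failed` of one stripe in place.

    stripe: list of blocks, one per channel; None marks an unreadable block.
    parity: index of the channel holding this stripe's parity block.
    bdt:    set of (stripe_no, channel) addresses of non-recoverable blocks.
    """
    survivors = [c for c in range(len(stripe)) if c != failed]
    bad = [c for c in survivors if stripe[c] is None]
    if not bad:
        # Normal case: every block needed for reconstruction is readable.
        stripe[failed] = xor_blocks([stripe[c] for c in survivors])
        return
    # A needed block is bad: scrub every lost data block with filler and
    # list it in the BDT. A lost parity block is simply recomputed below.
    for c in bad + [failed]:
        if c != parity:
            stripe[c] = FILLER
            bdt.add((stripe_no, c))
    # New parity over all data blocks, filler blocks included.
    data = [stripe[c] for c in range(len(stripe)) if c != parity]
    stripe[parity] = xor_blocks(data)

# Channel 0 was replaced; channel 1 returns a read error during the rebuild.
bdt = set()
stripe = [None, None, b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
rebuild_stripe(stripe, failed=0, parity=3, bdt=bdt, stripe_no=7)
```

After the call, the two lost blocks hold filler and are listed in the BDT, while the block on channel 2 remains recoverable from the new parity block.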
Step 109 updates a "bad data table", or BDT, that is located in disk controller 12, and may also be located in a reserved section of each of the disks. The BDT contains a list of all bad data blocks (filler blocks) so that any read request to the disk array first compares the target address with the addresses listed in the BDT and returns an appropriate error signal to the requesting agent if the address is listed. After the BDT update, the process of reconstruction continues in step 110, and the monitoring in step 104 continues until completed. If additional bad data blocks are found in step 104, steps 105-110 are repeated as required.
The method for operating a comparator for checking array access 200, shown in the flow diagram of FIG. 7, is used to manage the use of a BDT in conjunction with the RAID memory system. The method is invoked whenever an access is requested and begins in step 201 by checking if an access request is to a bad data block whose address is listed in the BDT, and, if not, proceeds with the access in the normal operating mode. If the address is listed in the BDT, step 203 determines if the access is for a write operation, and, if not, an error flag is generated in step 204. The error flag advises the host system that the read request is to a non-recoverable data block. If the request is for a write access, step 205 deletes the block address from the BDT and permits the write circuitry to write the new data to the block address in step 206, thus scrubbing the bad data block status. Step 207 is for computing the new block parity and writing it into the corresponding parity block associated with the accessed data group.
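The access check of FIG. 7 could look roughly like the Python sketch below. The names `handle_request` and `NonRecoverableDataError`, the dictionary of blocks keyed by address, and the `group` dictionary describing one data group are hypothetical stand-ins; an actual controller would perform the comparison in hardware or firmware.

```python
class NonRecoverableDataError(Exception):
    """Stands in for the error flag of step 204 (hypothetical name)."""

def xor_blocks(blocks):
    """Bit-by-bit XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def handle_request(blocks, bdt, group, address, new_data=None):
    """Serve one access; a request with new_data is treated as a write.

    blocks: dict mapping block address -> bytes (assumed flat address space).
    bdt:    set of addresses holding filler (non-recoverable) data.
    group:  {"data": [addresses], "parity": address} for the accessed group.
    """
    is_write = new_data is not None
    if address in bdt:                                   # step 201
        if not is_write:                                 # step 203
            raise NonRecoverableDataError(address)       # step 204
        bdt.discard(address)                             # step 205
    if not is_write:
        return blocks[address]                           # normal read
    blocks[address] = new_data                           # step 206
    # Step 207: recompute the parity block of the accessed data group.
    blocks[group["parity"]] = xor_blocks([blocks[a] for a in group["data"]])
    return None

blocks = {"d0": b"\xee\xee", "d1": b"\x01\x02", "p": b"\xef\xec"}
group = {"data": ["d0", "d1"], "parity": "p"}
bdt = {"d0"}                                             # d0 holds filler data

good = handle_request(blocks, bdt, group, "d1")          # unlisted address: read proceeds
handle_request(blocks, bdt, group, "d0", b"\x0a\x0b")    # write clears the BDT entry
```

A read of "d0" before the write would raise `NonRecoverableDataError`; after the write the address is no longer listed and reads proceed normally.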
FIG. 8 is a block diagram that shows the architecture of a RAID system of this invention using the methods outlined in FIGS. 6 and 7. Typically, RAID system 20 with faulty data management of the type described above is interfaced through its controller 22 from a SCSI bus 21. (SCSI stands for a well known industry standard bus as described in "Small Computer System Interface-2", American National Standard X3T9.2/86-109.) Bus 21 connects RAID system 20 to the host system. Disk array controller 22 provides the necessary logic to map any logical address associated with a memory access into the corresponding physical address, directs data traffic, controls the operation of each disk drive and provides status information to the host. BDT 23 is shown coupled to controller 22 but may also be an integral part of an integrated chip disk array controller. Also, each disk 11 of the array is shown having a local BDT 24 in which the bad data information for each disk is optionally stored so that an access address can be checked at the disk level rather than at the array level in disk controller 22.
As will be understood by those skilled in the art, many changes in the methods and apparatus described above may be made by the skilled practitioner without departing from the spirit and scope of the invention, which should be limited only as set forth in the claims which follow.

Claims (16)

What is claimed is:
1. Apparatus for managing faulty data in a multichannel memory system having a memory array controller for controlling access to the memory system, for memory failure detection, for memory error detection and for single channel correction, each channel having at least one memory module, each module having a failure and read error detection means, and each memory module being separately replaceable upon failure, the apparatus, accessible to the memory controller, comprising:
a) a bad data table (BDT) for storing addresses of non-recoverable data due to the occurrence of concurrent faults in more than one channel;
b) a write circuit for writing filler data to a bad block location for replacing a bad data block and for writing to the BDT for storing addresses of bad data blocks representing non-recoverable data; and
c) detection circuitry for detecting memory access requests to addresses stored in the BDT, for returning a non-recoverable data error signal to a host system if the access request is a read request, and, if the access request is a write request, permitting the write to the BDT listed address and deleting the listed address from the BDT.
2. The apparatus of claim 1 wherein the multichannel memory system is a RAID-3 memory array.
3. The apparatus of claim 1 wherein the multichannel memory system is a RAID-4 memory array.
4. The apparatus of claim 1 wherein the multichannel memory system is a RAID-5 memory array.
5. The apparatus of claim 1 wherein the BDT is part of the memory array controller.
6. The apparatus of claim 1 wherein the BDT is stored in a reserved section of each memory module.
7. The apparatus of claim 1 wherein the BDT is stored in a memory in the memory array controller and in a reserved section of each memory module.
8. A method for managing faulty data in a multichannel memory system having memory error detection and single channel correction means, each channel having at least one memory module capable of detecting failure and read errors, each memory module being separately replaceable upon failure, the method comprising:
a) detecting a read error in a block of a given channel;
b) determining if the read error is due to a failure of a memory module and, if so, replacing the failed memory module, otherwise using the single channel correction means for reconstructing data in the given channel;
c) monitoring all other channels for a second read error that would prevent reconstructing data, and, if such a second read error is found and, if neither the first nor second block is a parity block, entering the second block address, and the corresponding address of the first block into a bad data table (BDT), writing a filler block to both the first and the second blocks, computing and replacing a corresponding parity block using the first and the second blocks together with all corresponding data blocks, otherwise, writing a filler block to whichever of the first block and the second block is a data block, computing and writing a corresponding parity block using the filler block with all corresponding data blocks, and then continuing reconstructing data in the given channel until finished; and
d) monitoring all memory access requests by comparing the requested address to the BDT listed addresses, and, if the address is not listed in the BDT, allowing the access request to proceed, otherwise checking if the request is a write request, and, if not, returning a non-recoverable data error signal, otherwise allowing the write request to proceed so that the addressed filler block is replaced by a valid data block, and then computing and entering a new corresponding parity data block.
9. The method of claim 8 wherein the multichannel memory system is a RAID-3 memory.
10. The method of claim 8 wherein the multichannel memory system is a RAID-4 memory array.
11. The method of claim 8 wherein the multichannel memory system is a RAID-5 memory array.
12. A multichannel memory array system with a faulty data management system handling multiple faults, comprising:
a) a memory array wherein each channel has at least one memory module, each memory module being separately replaceable upon failure, and each module having a failure and read error detector;
b) a memory array controller for coupling the memory array to a memory bus, for controlling access to the memory array, for memory array failure detection and single channel correction and for communicating memory status over the memory bus, the memory array controller further having,
i) a bad data table (BDT) for storing addresses of non-recoverable data blocks due to the occurrence of concurrent faults in more than one channel,
ii) write circuitry for writing filler data blocks to bad data blocks corresponding to non-recoverable data blocks and for storing non-recoverable data block addresses in the BDT, and
iii) a comparator for detecting memory access requests to bad data blocks by comparing a memory access request address with the BDT stored addresses, for returning a non-recoverable error signal to a requesting agent if the access request is a read request and the address is stored in the BDT, and for permitting the access request to proceed if the request is for a write access.
13. The multichannel memory array system of claim 12 wherein each memory module has a reserved memory area for storing the BDT.
14. The multichannel memory array system of claim 12 wherein the multichannel memory array system is a RAID-3 memory system.
15. The multichannel memory array system of claim 12 wherein the multichannel memory array system is a RAID-4 memory system.
16. The multichannel memory array system of claim 12 wherein the multichannel memory array system is a RAID-5 memory system.
CA 2240412 1995-12-15 1996-12-11 A method and apparatus for management of faulty data in a raid system Abandoned CA2240412A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US08/573,127 US5913927A (en) 1995-12-15 1995-12-15 Method and apparatus for management of faulty data in a raid system
US08/573,127 1995-12-15
PCT/US1996/019810 WO1997022931A1 (en) 1995-12-15 1996-12-11 A method and apparatus for management of faulty data in a raid system

Publications (1)

Publication Number Publication Date
CA2240412A1 (en) 1997-06-26

Family

ID=29406131

Family Applications (1)

Application Number Title Priority Date Filing Date
CA 2240412 Abandoned CA2240412A1 (en) 1995-12-15 1996-12-11 A method and apparatus for management of faulty data in a raid system

Country Status (1)

Country Link
CA (1) CA2240412A1 (en)

Similar Documents

Publication Publication Date Title
US5913927A (en) Method and apparatus for management of faulty data in a raid system
US5951691A (en) Method and system for detection and reconstruction of corrupted data in a data storage subsystem
US7328392B2 (en) Disk array system
US6282670B1 (en) Managing defective media in a RAID system
JP3177242B2 (en) Nonvolatile memory storage of write operation identifiers in data storage
US5566316A (en) Method and apparatus for hierarchical management of data storage elements in an array storage device
US5950230A (en) RAID array configuration synchronization at power on
US6243827B1 (en) Multiple-channel failure detection in raid systems
US7418550B2 (en) Methods and structure for improved import/export of raid level 6 volumes
US7315976B2 (en) Method for using CRC as metadata to protect against drive anomaly errors in a storage array
US6467023B1 (en) Method for logical unit creation with immediate availability in a raid storage environment
US5315602A (en) Optimized stripe detection for redundant arrays of disk drives
US8112679B2 (en) Data reliability bit storage qualifier and logical unit metadata
US5504858A (en) Method and apparatus for preserving data integrity in a multiple disk raid organized storage system
US7281089B2 (en) System and method for reorganizing data in a raid storage system
US8839028B1 (en) Managing data availability in storage systems
US6006308A (en) Removable library media system utilizing redundant data storage and error detection and correction
JP2912802B2 (en) Disk array device failure handling method and device
US20080126839A1 (en) Optimized reconstruction and copyback methodology for a failed drive in the presence of a global hot spare disc
EP0481759A2 (en) Direct access storage device
US20050050381A1 (en) Methods, apparatus and controllers for a raid storage system
US7523257B2 (en) Method of managing raid level bad blocks in a networked storage system
US20090113235A1 (en) Raid with redundant parity
CA2089834A1 (en) High availability disk arrays
US7174476B2 (en) Methods and structure for improved fault tolerance during initialization of a RAID logical unit

Legal Events

Date Code Title Description
FZDE Dead