US20110066881A1 - Resilient software-controlled redundant array of independent disks (RAID)

Info

Publication number
US20110066881A1
Authority
US
United States
Prior art keywords
storage medium
disk storage
raid
boot
primary disk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/558,952
Other versions
US8055948B2 (en)
Inventor
Justin Pierce
David Steiner
Richard W. Vanderpool, III
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Global Commerce Solutions Holdings Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US12/558,952 (patent US8055948B2)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignment of assignors interest (see document for details). Assignors: PIERCE, JUSTIN; VANDERPOOL, RICHARD W., III; STEINER, DAVID
Publication of US20110066881A1
Application granted
Publication of US8055948B2
Assigned to TOSHIBA GLOBAL COMMERCE SOLUTIONS HOLDINGS CORPORATION. Patent assignment and reservation. Assignors: INTERNATIONAL BUSINESS MACHINES CORPORATION
Expired - Fee Related (current legal status)
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1417 Boot up procedures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2284 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing by power-on test, e.g. power-on self test [POST]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/4401 Bootstrapping
    • G06F9/4406 Loading of operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2069 Management of state, configuration or failover

Definitions

  • the present invention relates to the use of a software-controlled redundant array of independent disks (software RAID), and more specifically relates to dealing with a disk failure in a software RAID system.
  • software RAID: software-controlled redundant array of independent disks
  • a redundant array of independent disks refers generally to a group of computer data storage schemes that can divide and replicate data among multiple data storage devices, such as hard disk drives.
  • An implementation of RAID may take the form of hardware RAID or software RAID.
  • Hardware RAID uses dedicated hardware to control the array of disks. This hardware component may be referred to as a dedicated RAID controller.
  • In software RAID, by contrast, the functions required to implement the RAID array are performed by the system processor using special software routines. Since management of the array is a low-level activity that must be performed in support of other software that runs on the processor, software RAID is usually implemented by the installed operating system and drivers, which emulate a dedicated hardware RAID controller.
  • One disadvantage of a software RAID implementation is that a hardware failure occurring in the boot portion of the source disk will prevent the operating system from even loading. Accordingly, the system will become unusable until a system administrator manually restores the system by either replacing the damaged disk or revising the code in the Basic Input/Output System (BIOS). Either approach to restoring the system results in significant downtime and expense.
  • BIOS: Basic Input/Output System
  • One embodiment of the present invention provides a computer program product including computer usable program code embodied on a computer usable storage medium for handling hardware failures in a software RAID.
  • the computer program product comprises computer usable program code for detecting hardware failure of a primary disk storage medium in a software RAID during the BIOS power-on self test (POST), and automatically changing the boot order of the disk storage mediums in the RAID to position a secondary disk storage medium in the RAID higher in the boot order ahead of the primary disk storage medium in response to detecting hardware failure of the primary disk storage medium.
  • POST: BIOS power-on self test
  • Another embodiment of the invention provides a method comprising identifying a primary disk storage medium that is higher in a boot order than a secondary disk storage medium in a software RAID, and testing for a hardware failure of the primary disk storage medium during the BIOS power-on self test.
  • the boot order of the disk storage mediums in the software RAID is automatically changed to position the secondary disk storage medium in the RAID higher in the boot order than the primary disk storage medium in response to detecting a hardware failure in the primary disk storage medium; and then an operating system is booted from the disk storage medium that is highest in the boot order.
  • FIG. 1 is a flowchart of a method performed during BIOS POST in order to detect and address hardware failures in a disk storage medium prior to booting the operating system.
  • the boot order may be changed so that the primary disk storage medium is positioned at the bottom of the boot order below all other disk storage mediums in the RAID.
  • the BIOS POST may change the boot order by writing into a boot table that is typically stored in the system CMOS.
  • Detecting a hardware failure in a disk storage medium may be performed in various ways that may be specific to the computer system being used and the types of components installed.
  • a disk storage medium may be tested by reading and verifying sectors on an individual disk storage medium to detect physically bad sectors, utilizing commands which are a part of the ATA standard.
  • the ATA commands 0xEC (Identify) and 0x42 (read/verify sectors ext), as defined in the ATAPI-7 specification, may be used to determine the presence of a bad sector.
  • the disk storage medium may be tested by running a cyclic redundancy check of at least a portion of the primary disk storage medium.
  • a hardware failure of a primary disk storage medium may be detected by testing a predetermined portion of the boot partition of the primary disk storage medium.
  • the predetermined portion of the boot partition may be less than the entire primary disk storage medium, since the operating system does not reside over the entire disk storage medium.
  • the predetermined portion of the boot partition may be less than about 100 megabytes.
  • the system BIOS runs boot-time diagnostics on the disk storage mediums to proactively determine whether a medium, such as a hard disk drive, is bad.
  • the boot-time diagnostics are performed during BIOS POST, so that the boot order of the disk storage mediums can be dynamically modified, if necessary, to place the bad disk storage medium at a position below one or more good disk storage mediums. Therefore, the operating system will boot from a disk storage medium that is known to be in good condition.
  • the predetermined portion of the boot partition that is tested should be sufficient to ensure that the software RAID drivers for the primary and secondary disk storage mediums can be loaded. Once the RAID drivers are loaded by the operating system during boot, the drivers are capable of detecting problems with a disk storage medium and taking corrective action according to the exact RAID implementation.
  • the software RAID preferably implements a RAID 1 configuration with two mirrored disk storage mediums.
  • the step of automatically changing the boot order may include positioning the primary disk storage medium lower in the boot order than all other disk storage mediums in the software RAID.
  • the step of detecting hardware failure of a primary disk storage medium may include reading and verifying a predetermined portion of the boot partition of the primary disk storage medium.
  • the predetermined portion of the boot partition may be less than the entire primary disk storage medium, or less than about 100 megabytes.
  • the predetermined portion of the boot partition is sufficient to allow loading of software RAID drivers for the primary and secondary disk storage mediums.
  • FIG. 1 is a flowchart of a method 10, which is preferably performed during BIOS POST, in order to detect and address hardware failures in a disk storage medium prior to booting the operating system.
  • In step 12, the BIOS enters or begins diagnostics of the disk storage medium, such as a hard disk drive.
  • the BIOS setup options are then read in step 14. If step 16 determines that disk storage medium diagnostics have been enabled, then diagnostics are executed on the primary disk storage medium in step 18. Next, if step 20 determines that RAID diagnostics are enabled, then RAID disk storage medium diagnostics are executed in step 22. If step 24 determines that the diagnostics indicated a disk storage medium failure, then step 26 reads the current boot order.
  • If step 28 then determines that there is a need to adjust or change the boot order, the drives are reordered or repositioned in the boot order to place the failing drive below the good drive in step 30.
  • Following step 30, or if the determinations in any of preceding decision steps 16, 20, 24, 28 are negative, the method continues to step 32 to exit the disk storage medium diagnostics, continue the BIOS POST or check the next disk storage medium in the software RAID.
  • the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible storage medium having computer-usable program code stored on the storage medium.
  • the computer-usable or computer-readable storage medium may be, for example but not limited to, an electronic, magnetic, electromagnetic, or semiconductor apparatus or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, or a magnetic storage device.
  • RAM: random access memory
  • ROM: read-only memory
  • EPROM or Flash memory: erasable programmable read-only memory
  • CD-ROM: compact disc read-only memory
  • the computer-usable or computer-readable storage medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
  • a computer-usable or computer-readable storage medium may be any storage medium that can contain or store the program for use by a computer.
  • Computer usable program code contained on the computer-usable storage medium may be communicated by a propagated data signal, either in baseband or as part of a carrier wave.
  • the computer usable program code may be transmitted from one storage medium to another storage medium using any appropriate transmission medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc.
  • Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • LAN: local area network
  • WAN: wide area network
  • Internet Service Provider: for example, AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Abstract

Method and computer program product for identifying a primary disk storage medium that is higher in a boot order than a secondary disk storage medium in a software RAID, and testing for a hardware failure of the primary disk storage medium during the BIOS power-on self test. The boot order of the disk storage mediums in the software RAID is automatically changed to position the secondary disk storage medium in the RAID higher in the boot order than the primary disk storage medium in response to detecting a hardware failure in the primary disk storage medium. The operating system is then booted from the disk storage medium that is highest in the boot order. A hardware failure may be detected by reading and verifying a predetermined portion of the boot partition of the disk storage medium. Optionally, the predetermined portion of the boot partition may be less than the entire primary disk storage medium, but is preferably sufficient to allow loading of software RAID drivers for the primary and secondary disk storage mediums.

Description

    BACKGROUND
  • 1. Field of the Invention
  • The present invention relates to the use of a software-controlled redundant array of independent disks (software RAID), and more specifically relates to dealing with a disk failure in a software RAID system.
  • 2. Background of the Related Art
  • A redundant array of independent disks (typically referred to by the acronym RAID) refers generally to a group of computer data storage schemes that can divide and replicate data among multiple data storage devices, such as hard disk drives. An implementation of RAID may take the form of hardware RAID or software RAID. Hardware RAID uses dedicated hardware to control the array of disks. This hardware component may be referred to as a dedicated RAID controller. In software RAID, by contrast, the functions required to implement the RAID array are performed by the system processor using special software routines. Since management of the array is a low-level activity that must be performed in support of other software that runs on the processor, software RAID is usually implemented by the installed operating system and drivers, which emulate a dedicated hardware RAID controller.
  • One disadvantage of a software RAID implementation is that a hardware failure occurring in the boot portion of the source disk will prevent the operating system from even loading. Accordingly, the system will become unusable until a system administrator manually restores the system by either replacing the damaged disk or revising the code in the Basic Input/Output System (BIOS). Either approach to restoring the system results in significant downtime and expense.
  • BRIEF SUMMARY
  • One embodiment of the present invention provides a computer program product including computer usable program code embodied on a computer usable storage medium for handling hardware failures in a software RAID. The computer program product comprises computer usable program code for detecting hardware failure of a primary disk storage medium in a software RAID during the BIOS power-on self test (POST), and automatically changing the boot order of the disk storage mediums in the RAID to position a secondary disk storage medium in the RAID higher in the boot order ahead of the primary disk storage medium in response to detecting hardware failure of the primary disk storage medium.
  • Another embodiment of the invention provides a method comprising identifying a primary disk storage medium that is higher in a boot order than a secondary disk storage medium in a software RAID, and testing for a hardware failure of the primary disk storage medium during the BIOS power-on self test. The boot order of the disk storage mediums in the software RAID is automatically changed to position the secondary disk storage medium in the RAID higher in the boot order than the primary disk storage medium in response to detecting a hardware failure in the primary disk storage medium; and then an operating system is booted from the disk storage medium that is highest in the boot order.
  • BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
  • FIG. 1 is a flowchart of a method performed during BIOS POST in order to detect and address hardware failures in a disk storage medium prior to booting the operating system.
  • DETAILED DESCRIPTION
  • One embodiment of the present invention provides a computer program product including computer usable program code embodied on a computer usable storage medium for handling hardware failures in a software RAID. The computer program product comprises computer usable program code for detecting hardware failure of a primary disk in a software RAID during the BIOS power-on self test, and automatically changing the boot order of the disk storage mediums in the RAID to position a secondary disk storage medium in the RAID higher in the boot order ahead of the primary disk storage medium in response to detecting hardware failure of the primary disk storage medium. Optionally, the boot order may be changed so that the primary disk storage medium is positioned at the bottom of the boot order below all other disk storage mediums in the RAID. The BIOS POST may change the boot order by writing into a boot table that is typically stored in the system CMOS.
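  • As an illustration of the optional reordering described above, the following is a minimal sketch in Python rather than BIOS firmware code. The boot table is modeled as a plain list, and the function name `demote_failed_disk` and the disk labels are hypothetical; the patent does not specify how the boot table in CMOS is laid out.

```python
def demote_failed_disk(boot_order, failed_disk):
    """Return a new boot order with failed_disk moved to the bottom.

    boot_order is a list of disk identifiers, highest priority first.
    This models only the ordering logic; a real BIOS would write the
    result back to its boot table (assumed here, not shown).
    """
    if failed_disk not in boot_order:
        # Nothing to demote; return an unchanged copy.
        return list(boot_order)
    # Keep the relative order of the remaining disks, then append
    # the failed disk below all of them.
    reordered = [disk for disk in boot_order if disk != failed_disk]
    reordered.append(failed_disk)
    return reordered


boot_order = ["disk0", "disk1", "disk2"]
new_order = demote_failed_disk(boot_order, "disk0")
print(new_order)  # ['disk1', 'disk2', 'disk0']
```

  • Returning a new list instead of mutating the input keeps the original order available for comparison, which is convenient when deciding whether a write to the boot table is needed at all.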
  • Detecting a hardware failure in a disk storage medium may be performed in various ways that may be specific to the computer system being used and the types of components installed. For example, where the disk storage mediums utilize the ATA interface standard for the connection of storage devices, a disk storage medium may be tested by reading and verifying sectors on an individual disk storage medium to detect physically bad sectors, utilizing commands which are a part of the ATA standard. Specifically, the ATA commands 0xEC (Identify) and 0x42 (read/verify sectors ext), as defined in the ATAPI-7 specification, may be used to determine the presence of a bad sector. Similarly, the disk storage medium may be tested by running a cyclic redundancy check of at least a portion of the primary disk storage medium.
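  • The cyclic redundancy check mentioned above can be sketched as follows. This is an illustrative Python model, not firmware: the disk is represented by an in-memory byte string, the helper names are hypothetical, and the existence of a previously stored reference CRC is an assumption, since the patent does not say what the computed value is compared against.

```python
import zlib


def region_crc32(device_bytes, offset, length, chunk_size=64 * 1024):
    """Compute a CRC-32 over device_bytes[offset:offset + length] in chunks."""
    crc = 0
    pos = offset
    end = offset + length
    while pos < end:
        chunk = device_bytes[pos:min(pos + chunk_size, end)]
        if not chunk:
            # The requested region extends past the end of the data.
            raise ValueError("region extends beyond the device")
        crc = zlib.crc32(chunk, crc)  # running CRC across chunks
        pos += len(chunk)
    return crc


def region_is_intact(device_bytes, offset, length, expected_crc):
    """Compare the region's CRC with a reference value (assumed to exist)."""
    return region_crc32(device_bytes, offset, length) == expected_crc


# Stand-in for the start of a disk: 256 KiB of patterned data.
image = bytes(range(256)) * 1024
reference = region_crc32(image, 0, 128 * 1024)

# Flip one byte inside the checked region to simulate corruption.
damaged = bytearray(image)
damaged[4096] ^= 0xFF

print(region_is_intact(image, 0, 128 * 1024, reference))           # True
print(region_is_intact(bytes(damaged), 0, 128 * 1024, reference))  # False
```

  • Processing the region in chunks mirrors how a constrained environment would read a disk a few sectors at a time instead of loading the whole region into memory.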
  • Optionally, a hardware failure of a primary disk storage medium may be detected by testing a predetermined portion of the boot partition of the primary disk storage medium. For example, the predetermined portion of the boot partition may be less than the entire primary disk storage medium, since the operating system does not reside over the entire disk storage medium. Specifically, the predetermined portion of the boot partition may be less than about 100 megabytes.
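  • To give a sense of scale for a region of about 100 megabytes, the sketch below splits such a region into verify requests. It assumes 512-byte logical sectors, treats 100 megabytes as 100 × 1024 × 1024 bytes, and uses the 65,536-sector per-command maximum of 48-bit ATA commands; the function name and the starting LBA of 0 are hypothetical choices for illustration.

```python
SECTOR_SIZE = 512             # assumption: 512-byte logical sectors
MAX_SECTORS_PER_CMD = 65536   # per-command maximum for 48-bit ATA commands


def plan_verify_commands(start_lba, region_bytes):
    """Split a region into (lba, sector_count) pairs, one per verify command."""
    total_sectors = -(-region_bytes // SECTOR_SIZE)  # ceiling division
    plan = []
    lba = start_lba
    remaining = total_sectors
    while remaining > 0:
        count = min(remaining, MAX_SECTORS_PER_CMD)
        plan.append((lba, count))
        lba += count
        remaining -= count
    return plan


plan = plan_verify_commands(start_lba=0, region_bytes=100 * 1024 * 1024)
for lba, count in plan:
    print(lba, count)
# 0 65536
# 65536 65536
# 131072 65536
# 196608 8192
```

  • Under these assumptions the region covers 204,800 sectors and needs only four verify commands, which suggests why limiting the test to a small portion of the boot partition keeps the added POST time modest.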
  • In accordance with the invention, the system BIOS runs boot-time diagnostics on the disk storage mediums to proactively determine whether a medium, such as a hard disk drive, is bad. The boot-time diagnostics are performed during BIOS POST, so that the boot order of the disk storage mediums can be dynamically modified, if necessary, to place the bad disk storage medium at a position below one or more good disk storage mediums. Therefore, the operating system will boot from a disk storage medium that is known to be in good condition.
  • Regardless of the exact amount of disk storage space tested, the predetermined portion of the boot partition that is tested should be sufficient to ensure that the software RAID drivers for the primary and secondary disk storage mediums can be loaded. Once the RAID drivers are loaded by the operating system during boot, the drivers are capable of detecting problems with a disk storage medium and taking corrective action according to the exact RAID implementation. The software RAID preferably implements a RAID 1 configuration with two mirrored disk storage mediums.
  • Another embodiment of the invention provides a method comprising identifying a primary disk storage medium that is higher in a boot order than a secondary disk storage medium in a software RAID, and testing for a hardware failure of the primary disk storage medium during the BIOS power-on self test. The boot order of the disk storage mediums in the software RAID is automatically changed to position the secondary disk storage medium in the RAID higher in the boot order than the primary disk storage medium in response to detecting a hardware failure in the primary disk storage medium, and then an operating system is booted from the disk storage medium that is highest in the boot order. In a RAID having more than two disk storage mediums, the step of automatically changing the boot order may include positioning the primary disk storage medium lower in the boot order than all other disk storage mediums in the software RAID.
  • In yet another embodiment of the invention, the step of detecting hardware failure of a primary disk storage medium may include reading and verifying a predetermined portion of the boot partition of the primary disk storage medium. Optionally, the predetermined portion of the boot partition may be less than the entire primary disk storage medium, or less than about 100 megabytes. Preferably, the predetermined portion of the boot partition is sufficient to allow loading of software RAID drivers for the primary and secondary disk storage mediums. Specifically, the ATA commands 0xEC (Identify) and 0x42 (read/verify sectors ext) may be used to determine the presence of a bad sector. Similarly, the disk storage medium may be tested by running a cyclic redundancy check of at least a portion of the primary disk storage medium.
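  • The sketch below shows how the register values for a 0x42 (read/verify sectors ext) request might be prepared. It only computes values and does not touch hardware. The split of the 48-bit LBA and 16-bit sector count into "previous" and "current" register bytes follows the usual 48-bit addressing convention and should be checked against the ATA specification before use; the class and function names are hypothetical.

```python
from dataclasses import dataclass

ATA_CMD_IDENTIFY_DEVICE = 0xEC          # command code named in the text
ATA_CMD_READ_VERIFY_SECTORS_EXT = 0x42  # command code named in the text


@dataclass(frozen=True)
class TaskFile48:
    """Register values for one 48-bit ATA command (illustrative model)."""
    sector_count_prev: int
    sector_count_cur: int
    lba_low_prev: int
    lba_low_cur: int
    lba_mid_prev: int
    lba_mid_cur: int
    lba_high_prev: int
    lba_high_cur: int
    device: int
    command: int


def build_read_verify_ext(lba, sector_count, second_device=False):
    """Encode a read/verify request for sector_count sectors starting at lba."""
    if not 0 <= lba < (1 << 48):
        raise ValueError("LBA must fit in 48 bits")
    if not 1 <= sector_count <= 65536:
        raise ValueError("sector count must be 1..65536")
    count_field = sector_count & 0xFFFF  # 65536 is encoded as 0
    device = 0x40 | (0x10 if second_device else 0x00)  # LBA bit, device select
    return TaskFile48(
        sector_count_prev=(count_field >> 8) & 0xFF,
        sector_count_cur=count_field & 0xFF,
        lba_low_prev=(lba >> 24) & 0xFF,
        lba_low_cur=lba & 0xFF,
        lba_mid_prev=(lba >> 32) & 0xFF,
        lba_mid_cur=(lba >> 8) & 0xFF,
        lba_high_prev=(lba >> 40) & 0xFF,
        lba_high_cur=(lba >> 16) & 0xFF,
        device=device,
        command=ATA_CMD_READ_VERIFY_SECTORS_EXT,
    )


task_file = build_read_verify_ext(lba=0x123456789ABC, sector_count=65536)
print(hex(task_file.command), hex(task_file.lba_low_cur))  # 0x42 0xbc
```

  • A read/verify command transfers no data to the host, so a failure is reported through the device's status and error registers; how the BIOS reads those registers is platform-specific and is left out of this sketch.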
  • FIG. 1 is a flowchart of a method 10, which is preferably performed during BIOS POST, in order to detect and address hardware failures in a disk storage medium prior to booting the operating system. In step 12, the BIOS enters or begins diagnostics of the disk storage medium, such as a hard disk drive. The BIOS setup options are then read in step 14. If step 16 determines that disk storage medium diagnostics have been enabled, then diagnostics are executed on the primary disk storage medium in step 18. Next, if step 20 determines that RAID diagnostics are enabled, then RAID disk storage medium diagnostics are executed in step 22. If step 24 determines that the diagnostics indicated a disk storage medium failure, then step 26 reads the current boot order. If step 28 then determines that there is a need to adjust or change the boot order, the drives are reordered or repositioned in the boot order to place the failing drive below the good drive in step 30. Following step 30, or if the determinations in any of preceding decision steps 16, 20, 24, 28 are negative, then the method continues to step 32 to exit the disk storage medium diagnostics, continue the BIOS POST or check the next disk storage medium in the software RAID.
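  • The decision flow of FIG. 1 can be modeled roughly as follows. This is a sketch under stated assumptions, not the patented implementation: the setup option names, the callable diagnostics, and the list-based boot order are all hypothetical, and the code follows the text literally in exiting whenever any of decision steps 16, 20, 24 or 28 is negative.

```python
def run_disk_diagnostics(setup, boot_order, primary, disk_test, raid_test):
    """Walk steps 12-32 of FIG. 1 and return the resulting boot order.

    setup      -- dict of BIOS setup options (key names are assumptions)
    boot_order -- list of disk identifiers, highest priority first
    primary    -- identifier of the disk under test
    disk_test, raid_test -- callables returning True when the disk passes
    """
    order = list(boot_order)  # step 12: enter diagnostics with a working copy

    # Steps 14 and 16: read setup options; are disk diagnostics enabled?
    if not setup.get("disk_diagnostics_enabled", False):
        return order  # step 32
    disk_ok = disk_test(primary)  # step 18

    # Step 20: are RAID diagnostics enabled?
    if not setup.get("raid_diagnostics_enabled", False):
        return order  # step 32
    raid_ok = raid_test(primary)  # step 22

    # Step 24: did the diagnostics indicate a failure?
    if disk_ok and raid_ok:
        return order  # step 32

    # Steps 26 and 28: read the boot order; does it need adjusting?
    if primary not in order or order.index(primary) == len(order) - 1:
        return order  # step 32

    # Step 30: place the failing drive below the good drive(s).
    order.remove(primary)
    order.append(primary)
    return order  # step 32


options = {"disk_diagnostics_enabled": True, "raid_diagnostics_enabled": True}
result = run_disk_diagnostics(
    options,
    ["disk0", "disk1"],
    "disk0",
    disk_test=lambda disk: False,  # simulate a failing primary disk
    raid_test=lambda disk: True,
)
print(result)  # ['disk1', 'disk0']
```

  • Passing the diagnostics in as callables keeps the flow independent of how a failure is detected, so the same skeleton could wrap a sector verify, a cyclic redundancy check, or both.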
  • As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible storage medium having computer-usable program code stored on the storage medium.
  • Any combination of one or more computer usable or computer readable storage medium(s) may be utilized. The computer-usable or computer-readable storage medium may be, for example but not limited to, an electronic, magnetic, electromagnetic, or semiconductor apparatus or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, or a magnetic storage device. The computer-usable or computer-readable storage medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. In the context of this document, a computer-usable or computer-readable storage medium may be any storage medium that can contain or store the program for use by a computer. Computer usable program code contained on the computer-usable storage medium may be communicated by a propagated data signal, either in baseband or as part of a carrier wave. The computer usable program code may be transmitted from one storage medium to another storage medium using any appropriate transmission medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc.
  • Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
  • The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, components and/or groups, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms “preferably,” “preferred,” “prefer,” “optionally,” “may,” and similar terms are used to indicate that an item, condition or step being referred to is an optional (not required) feature of the invention.
  • The corresponding structures, materials, acts, and equivalents of all means or steps plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but it is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (14)

What is claimed is:
1. A computer program product including computer usable program code embodied on a computer usable storage medium for handling hardware failures in a software RAID, the computer program product comprising:
computer usable program code for detecting hardware failure of a primary disk in a software RAID during the BIOS power-on self test; and
computer usable program code for automatically changing the boot order of the disks in the RAID to position a secondary disk in the RAID higher in the boot order ahead of the primary disk in response to detecting hardware failure of the primary disk.
2. The computer program product of claim 1, wherein the computer usable program code for changing the boot order of the disks in the RAID includes computer usable program code to position the primary disk at the bottom of the boot order below all other disks in the RAID.
3. The computer program product of claim 1, wherein the computer usable program code for detecting hardware failure of a primary disk storage medium includes computer usable program code for reading and verifying a predetermined portion of the boot partition of the primary disk storage medium.
4. The computer program product of claim 3, wherein the predetermined portion of the boot partition is less than about 100 megabytes.
5. The computer program product of claim 3, wherein the predetermined portion of the boot partition is less than the entire primary disk storage medium.
6. The computer program product of claim 5, wherein the predetermined portion of the boot partition is sufficient to allow loading of software RAID drivers for the primary and secondary disk storage mediums.
7. The computer program product of claim 1, wherein the computer usable program code for detecting hardware failure of a primary disk storage medium includes computer usable program code for initiating a cyclic redundancy check of at least a portion of the primary disk storage medium.
8. A method, comprising:
identifying a primary disk that is higher in a boot order than a secondary disk in a software RAID, wherein the boot order is maintained by the BIOS;
testing for a hardware failure of the primary disk during the BIOS power-on self test;
automatically changing the boot order of the disks in the software RAID to position the secondary disk in the RAID higher in the boot order than the primary disk in response to detecting a hardware failure in the primary disk; and then
booting an operating system from the disk that is highest in the boot order.
9. The method of claim 8, wherein the step of automatically changing the boot order of the disks in the software RAID includes positioning the primary disk lower in the boot order than all other disks in the software RAID.
10. The method of claim 8, wherein the step of detecting hardware failure of a primary disk storage medium includes reading and verifying a predetermined portion of the boot partition of the primary disk storage medium.
11. The method of claim 10, wherein the predetermined portion of the boot partition is less than about 100 megabytes.
12. The method of claim 10, wherein the predetermined portion of the boot partition is less than the entire primary disk storage medium.
13. The method of claim 12, wherein the predetermined portion of the boot partition is sufficient to allow loading of software RAID drivers for the primary and secondary disk storage mediums.
14. The method of claim 8, wherein the step of detecting hardware failure of a primary disk storage medium includes performing a cyclic redundancy check of at least a portion of the primary disk storage medium.
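
The steps recited in the claims above can be illustrated with a short sketch. It is not the patented BIOS implementation; it is a minimal Python model, assuming the boot order is a simple list of disk names with the primary disk first and that a known-good CRC-32 of the checked region is available. The names region_is_healthy, reorder_boot, read_region and VERIFY_BYTES are hypothetical and do not appear in the patent.

```python
import zlib
from typing import Callable, List

# Hypothetical size of the region to verify. The claims only require that
# the portion be "less than about 100 megabytes".
VERIFY_BYTES = 64 * 1024 * 1024


def region_is_healthy(read_region: Callable[[int], bytes],
                      length: int, expected_crc: int) -> bool:
    """Read `length` bytes of the boot partition and verify them by CRC-32.

    `read_region` stands in for whatever low-level read the firmware uses;
    it is an assumption of this sketch, not an API from the patent.
    """
    try:
        data = read_region(length)
    except OSError:
        # A read error is treated as a hardware failure of the disk.
        return False
    # A short read or a checksum mismatch also counts as a failure.
    return len(data) == length and zlib.crc32(data) == expected_crc


def reorder_boot(boot_order: List[str], primary_ok: bool) -> List[str]:
    """Return the boot order to use for this boot.

    If the primary disk (first entry) failed its check, it is moved to the
    bottom of the list, below all other disks, so that the secondary disk
    becomes the highest entry in the boot order.
    """
    if primary_ok or len(boot_order) < 2:
        return list(boot_order)
    return boot_order[1:] + boot_order[:1]


if __name__ == "__main__":
    order = ["disk0", "disk1", "disk2"]

    def failing_read(n: int) -> bytes:
        # Simulates a primary disk that cannot be read during the self test.
        raise OSError("simulated read failure")

    ok = region_is_healthy(failing_read, VERIFY_BYTES, expected_crc=0)
    print(reorder_boot(order, ok))  # ['disk1', 'disk2', 'disk0']
```

Returning a new list rather than mutating the stored order keeps the check and the reordering separate, which mirrors the claims' split between detecting the failure and changing the boot order in response.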

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/558,952 US8055948B2 (en) 2009-09-14 2009-09-14 Resilient software-controlled redundant array of independent disks (RAID)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/558,952 US8055948B2 (en) 2009-09-14 2009-09-14 Resilient software-controlled redundant array of independent disks (RAID)

Publications (2)

Publication Number Publication Date
US20110066881A1 US20110066881A1 (en) 2011-03-17
US8055948B2 US8055948B2 (en) 2011-11-08

Family

ID=43731641

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/558,952 Expired - Fee Related US8055948B2 (en) 2009-09-14 2009-09-14 Resilient software-controlled redundant array of independent disks (RAID)

Country Status (1)

Country Link
US (1) US8055948B2 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120011395A1 (en) * 2010-07-09 2012-01-12 Getac Technology Corporation Boot method under boot sector failure in hard disk and computer device using the same
CN102331958A (en) * 2011-11-02 2012-01-25 赵玉燕 Method for starting hard disk under Linux system
CN102841863A (en) * 2012-07-10 2012-12-26 上海德拓信息技术有限公司 Method for backuping data through adopting dual-disk read-write operation
US20160335151A1 (en) * 2015-05-11 2016-11-17 Dell Products, L.P. Systems and methods for providing service and support to computing devices
CN107179709A (en) * 2017-05-22 2017-09-19 郑州云海信息技术有限公司 An a kind of key realizes the storage system sequence switch machine device with extension cabinet
CN107544780A (en) * 2016-06-23 2018-01-05 北京忆恒创源科技有限公司 The installation method and erecting device of a kind of operating system
US20180052672A1 (en) * 2016-08-16 2018-02-22 Hon Hai Precision Industry Co., Ltd. Portable storage device and method of installing operating system from portable device
US11742054B2 (en) * 2020-11-12 2023-08-29 Dell Products L.P. Memory power fault resilience in information handling systems

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8533545B2 (en) * 2009-03-04 2013-09-10 Alcatel Lucent Method and apparatus for system testing using multiple instruction types
KR102178833B1 (en) * 2013-12-12 2020-11-13 삼성전자주식회사 Memory system and computing system including the same
CN105302610A (en) * 2015-11-13 2016-02-03 中标软件有限公司 Automatic installation method of Linux system
CN107766196B (en) * 2016-08-19 2021-01-29 阿里巴巴集团控股有限公司 Method and device for starting check of computing device

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5950230A (en) * 1997-05-28 1999-09-07 International Business Machines Corporation RAID array configuration synchronization at power on
US6035395A (en) * 1997-03-28 2000-03-07 Kabushiki Kaisha Toshiba Computer system capable of using removable disk drive as boot device and method of controlling bootstrap
US6098119A (en) * 1998-01-21 2000-08-01 Mylex Corporation Apparatus and method that automatically scans for and configures previously non-configured disk drives in accordance with a particular raid level based on the needed raid level
US6282641B1 (en) * 1998-11-18 2001-08-28 Phoenix Technologies Ltd. System for reconfiguring a boot device by swapping the logical device number of a user selected boot drive to a currently configured boot drive
US6292890B1 (en) * 1998-09-29 2001-09-18 Compaq Computer Corporation Computer system with dynamically configurable boot order
US20020178351A1 (en) * 2001-05-11 2002-11-28 International Business Machines Corporation Mechanism for eliminating need for flash memory in software RAID
US20030005277A1 (en) * 2001-06-29 2003-01-02 Harding Matthew C. Automatic replacement of corrupted BIOS image
US6643735B2 (en) * 2001-12-03 2003-11-04 International Business Machines Corporation Integrated RAID system with the capability of selecting between software and hardware RAID
US6931519B1 (en) * 2000-08-25 2005-08-16 Sun Microsystems, Inc. Method and apparatus for reliable booting device
US6952794B2 (en) * 2002-10-10 2005-10-04 Ching-Hung Lu Method, system and apparatus for scanning newly added disk drives and automatically updating RAID configuration and rebuilding RAID data
US20070128899A1 (en) * 2003-01-12 2007-06-07 Yaron Mayer System and method for improving the efficiency, comfort, and/or reliability in Operating Systems, such as for example Windows
US20070168701A1 (en) * 2005-11-07 2007-07-19 Lsi Logic Corporation Storing RAID configuration data within a BIOS image
US7281159B2 (en) * 2000-10-26 2007-10-09 Hewlett-Packard Development Company, L.P. Managing disk drive replacements on multidisk headless appliances
US20070294582A1 (en) * 2006-05-05 2007-12-20 Dell Products L.P. Reporting software RAID configuration to system BIOS
US20080177994A1 (en) * 2003-01-12 2008-07-24 Yaron Mayer System and method for improving the efficiency, comfort, and/or reliability in Operating Systems, such as for example Windows
US7430592B2 (en) * 2004-04-21 2008-09-30 Dell Products L.P. Method for heterogeneous system configuration
US20080244585A1 (en) * 2007-03-27 2008-10-02 Aster Data Systems, Inc. System and method for using failure casting to manage failures in computer systems
US20090113195A1 (en) * 2007-10-24 2009-04-30 Dell Products L.P. System and Method for Extension of the BIOS Boot Specification
US7584347B2 (en) * 2005-06-10 2009-09-01 Dell Products L.P. System and method for identifying bootable device by generating a signature for each bootable device where the signature is independent of a location of the bootable device
US7861117B2 (en) * 2008-06-25 2010-12-28 International Business Machines Corporation Method to recover from a boot device failure during reboot or system IPL

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6035395A (en) * 1997-03-28 2000-03-07 Kabushiki Kaisha Toshiba Computer system capable of using removable disk drive as boot device and method of controlling bootstrap
US5950230A (en) * 1997-05-28 1999-09-07 International Business Machines Corporation RAID array configuration synchronization at power on
US6098119A (en) * 1998-01-21 2000-08-01 Mylex Corporation Apparatus and method that automatically scans for and configures previously non-configured disk drives in accordance with a particular raid level based on the needed raid level
US6292890B1 (en) * 1998-09-29 2001-09-18 Compaq Computer Corporation Computer system with dynamically configurable boot order
US6282641B1 (en) * 1998-11-18 2001-08-28 Phoenix Technologies Ltd. System for reconfiguring a boot device by swapping the logical device number of a user selected boot drive to a currently configured boot drive
US6931519B1 (en) * 2000-08-25 2005-08-16 Sun Microsystems, Inc. Method and apparatus for reliable booting device
US7281159B2 (en) * 2000-10-26 2007-10-09 Hewlett-Packard Development Company, L.P. Managing disk drive replacements on multidisk headless appliances
US20020178351A1 (en) * 2001-05-11 2002-11-28 International Business Machines Corporation Mechanism for eliminating need for flash memory in software RAID
US6823450B2 (en) * 2001-05-11 2004-11-23 International Business Machines Corporation Mechanism for eliminating need for flash memory in software RAID
US20030005277A1 (en) * 2001-06-29 2003-01-02 Harding Matthew C. Automatic replacement of corrupted BIOS image
US6643735B2 (en) * 2001-12-03 2003-11-04 International Business Machines Corporation Integrated RAID system with the capability of selecting between software and hardware RAID
US6952794B2 (en) * 2002-10-10 2005-10-04 Ching-Hung Lu Method, system and apparatus for scanning newly added disk drives and automatically updating RAID configuration and rebuilding RAID data
US20070128899A1 (en) * 2003-01-12 2007-06-07 Yaron Mayer System and method for improving the efficiency, comfort, and/or reliability in Operating Systems, such as for example Windows
US20080177994A1 (en) * 2003-01-12 2008-07-24 Yaron Mayer System and method for improving the efficiency, comfort, and/or reliability in Operating Systems, such as for example Windows
US7430592B2 (en) * 2004-04-21 2008-09-30 Dell Products L.P. Method for heterogeneous system configuration
US7584347B2 (en) * 2005-06-10 2009-09-01 Dell Products L.P. System and method for identifying bootable device by generating a signature for each bootable device where the signature is independent of a location of the bootable device
US20070168701A1 (en) * 2005-11-07 2007-07-19 Lsi Logic Corporation Storing RAID configuration data within a BIOS image
US20070294582A1 (en) * 2006-05-05 2007-12-20 Dell Products L.P. Reporting software RAID configuration to system BIOS
US20080244585A1 (en) * 2007-03-27 2008-10-02 Aster Data Systems, Inc. System and method for using failure casting to manage failures in computer systems
US20090113195A1 (en) * 2007-10-24 2009-04-30 Dell Products L.P. System and Method for Extension of the BIOS Boot Specification
US7861117B2 (en) * 2008-06-25 2010-12-28 International Business Machines Corporation Method to recover from a boot device failure during reboot or system IPL

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120011395A1 (en) * 2010-07-09 2012-01-12 Getac Technology Corporation Boot method under boot sector failure in hard disk and computer device using the same
CN102331958A (en) * 2011-11-02 2012-01-25 赵玉燕 Method for starting hard disk under Linux system
CN102841863A (en) * 2012-07-10 2012-12-26 上海德拓信息技术有限公司 Method for backuping data through adopting dual-disk read-write operation
US20160335151A1 (en) * 2015-05-11 2016-11-17 Dell Products, L.P. Systems and methods for providing service and support to computing devices
US9870282B2 (en) * 2015-05-11 2018-01-16 Dell Products, L.P. Systems and methods for providing service and support to computing devices with boot failure
CN107544780A (en) * 2016-06-23 2018-01-05 北京忆恒创源科技有限公司 The installation method and erecting device of a kind of operating system
US20180052672A1 (en) * 2016-08-16 2018-02-22 Hon Hai Precision Industry Co., Ltd. Portable storage device and method of installing operating system from portable device
CN107179709A (en) * 2017-05-22 2017-09-19 郑州云海信息技术有限公司 An a kind of key realizes the storage system sequence switch machine device with extension cabinet
US11742054B2 (en) * 2020-11-12 2023-08-29 Dell Products L.P. Memory power fault resilience in information handling systems

Also Published As

Publication number Publication date
US8055948B2 (en) 2011-11-08

Similar Documents

Publication Publication Date Title
US8055948B2 (en) Resilient software-controlled redundant array of independent disks (RAID)
US20170242744A1 (en) Method and apparatus for performing data scrubbing management in storage system
US6976197B2 (en) Apparatus and method for error logging on a memory module
US9218893B2 (en) Memory testing in a data processing system
US20080052506A1 (en) Storage apparatus, control method, and control device
US20100011261A1 (en) Verifying Data Integrity of a Non-Volatile Memory System during Data Caching Process
US7661044B2 (en) Method, apparatus and program product to concurrently detect, repair, verify and isolate memory failures
US20100281297A1 (en) Firmware recovery in a raid controller by using a dual firmware configuration
CN104834575A (en) Firmware recovery method and device
US9176837B2 (en) In situ processor re-characterization
CN111105840B (en) Method, device and system for testing abnormal power failure of solid state disk
US10916326B1 (en) System and method for determining DIMM failures using on-DIMM voltage regulators
KR20100050380A (en) Automated firmware recovery
US9519545B2 (en) Storage drive remediation in a raid system
US7574621B2 (en) Method and system for identifying and recovering a file damaged by a hard drive failure
CN106598637B (en) Method for selective loading of components within a node
US6985826B2 (en) System and method for testing a component in a computer system using voltage margining
US8788238B2 (en) System and method for testing power supplies of server
US20140089653A1 (en) Electronic apparatus, method of restoring guid partition table (gpt) and computer-readable recording medium
US10635554B2 (en) System and method for BIOS to ensure UCNA errors are available for correlation
US20150154082A1 (en) Provisioning memory in a memory system for mirroring
US9195529B2 (en) Information processing apparatus and activation method
US7480836B2 (en) Monitoring error-handler vector in architected memory
US20200264946A1 (en) Failure sign detection device, failure sign detection method, and recording medium in which failure sign detection program is stored
US20120210061A1 (en) Computer and method for testing redundant array of independent disks of the computer

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PIERCE, JUSTIN;STEINER, DAVID;VANDERPOOL, RICHARD W., III;SIGNING DATES FROM 20090906 TO 20090911;REEL/FRAME:023237/0425

ZAAA Notice of allowance and fees due

Free format text: ORIGINAL CODE: NOA

ZAAB Notice of allowance mailed

Free format text: ORIGINAL CODE: MN/=.

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: TOSHIBA GLOBAL COMMERCE SOLUTIONS HOLDINGS CORPORA

Free format text: PATENT ASSIGNMENT AND RESERVATION;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:028895/0935

Effective date: 20120731

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20231108