US20070214318A1 - Disk array system and fault-tolerant control method for the same - Google Patents

Info

Publication number
US20070214318A1
Authority
US
United States
Prior art keywords
enclosure
drive
controller
loop
disk
Prior art date
Legal status
Abandoned
Application number
US11/798,063
Inventor
Shohei Abe
Azuma Kano
Ikuya Yagisawa
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Priority to US11/798,063
Publication of US20070214318A1
Abandoned (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2089 Redundant storage control functionality
    • G06F11/2092 Techniques of failing over between control units
    • G06F11/2002 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant
    • G06F11/2007 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant using redundant communication media
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0635 Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0661 Format or protocol conversion arrangements
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices

Definitions

  • the present invention relates to a disk array system, and particularly to a technique effectively applied to a disk array system where a plurality of disk components are connected by looped communication means such as a Fibre Channel loop and to a fault-tolerant control method for such a disk array system.
  • FC loop: Fibre Channel loop
  • SATA: Serial Advanced Technology Attachment
  • Patent Document 1 discloses an information processing system employing SATA drives.
  • Patent Document 1: U.S. Patent Application Publication No. 2003/0135577
  • However, it is not easy to apply SATA drives to a disk array system. Doing so requires addressing many issues, for instance, the management of a plurality of enclosures, each housing SATA drives, and the connection between each SATA drive and the controller that controls read and write operations to and from it.
  • An object of this invention is to solve the problems of the conventional art, and to provide a large-scale storage system employing SATA drives as a disk array system.
  • The invention provides a disk array system in which SATA drives are used as the hard disk units (also referred to as “disk drive units”, “disk drives”, or simply “drives”) that make up each drive enclosure (i.e., disk drive enclosure) of the disk array system, and a plurality of such drive enclosures are connected via a dual FC loop.
  • When an error or fault occurs in a drive enclosure or SATA drive on one of the two FC loops, the disk array system identifies the failed drive enclosure or SATA drive and isolates the failed drive enclosure from the FC loop, so that it can continue accessing the drive enclosures on the normal side or loop. This avoids conflicting control between the controller (hereinafter referred to as the “system controller”) and the drive controllers housed in the respective drive enclosures.
  • If such conflicting control does occur, the system controller may shut down the drive controller on the normal side or loop just as the path is about to be switched from the drive controller on the failed side or loop to the one on the normal loop, bringing the whole system down.
  • Likewise, if the failed drive enclosure on the failed loop cannot be isolated, that is, if it remains connected to the system when a loop failure occurs, communication between the system controller and each drive enclosure over the FC loop cannot be maintained.
  • This is because an FC loop is designed such that communication between the system controller and every disk drive on the loop is lost if the loop is broken at even a single point.
  • The invention therefore provides a storage system (hereinafter simply the “system”) consisting of a disk array of SATA drives built on an FC loop, in which, when individual disk drives are isolated from the FC loop, the path is switched by a device such as a port bypass circuit (PBC) so that the FC loop is not broken.
  • The disk array system is built by connecting, over an FC loop, a plurality of drive enclosures, each housing a disk drive and a controller that controls the disk drive (hereinafter the “drive controller”), and a system controller enclosure accommodating a system controller that controls all of the drive enclosures.
  • A drive enclosure added to the system may be referred to as an “expansion enclosure”.
  • The PBC isolates the drive enclosure in which a fault (hereinafter also called an “error”) has occurred, so that the other drive enclosures can continue to operate.
  • The FC loop is duplicated so that, when a fault occurs on one loop, communication can be maintained over the other loop while the drive enclosure where the error occurred is identified and isolated from the FC loop.
  • An interface connector, which converts FC loop data into data that the SATA drive can read and write, connects the controller of each drive enclosure (i.e., the drive controller) to the FC loop.
  • The drive controller may also be called a RAID controller.
  • The drive controller on the failed side or loop is powered off or reset.
  • A port bypass circuit is provided between each interface connector and the FC loop to guard against a failure in any drive controller.
  • The port bypass circuit makes it possible to isolate the failed component from the FC loop, so that either the failed drive enclosure together with all subsequent enclosures, or only the failed drive enclosure, is bypassed by the FC loop.
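  • As a rough illustration of why a single break disables an FC loop and how the two PBC bypass modes restore it, the following Python sketch models the loop as an ordered list of enclosures. It is a toy model, not the patent's implementation: the classes `Enclosure` and `FCLoop`, the `bypass()` method, and its `include_following` flag are hypothetical, and the rule that any failed, non-bypassed node breaks the whole ring is a simplification of real FC-AL behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Enclosure:
    name: str
    healthy: bool = True
    bypassed: bool = False  # True once its PBC routes the loop around it


@dataclass
class FCLoop:
    enclosures: List[Enclosure] = field(default_factory=list)

    def is_intact(self) -> bool:
        # Toy rule: the ring works only if every enclosure still inserted in
        # it (not bypassed) is healthy, so one bad node breaks the whole loop.
        return all(e.healthy for e in self.enclosures if not e.bypassed)

    def reachable(self) -> List[str]:
        # Names of enclosures the system controller can talk to right now.
        if not self.is_intact():
            return []
        return [e.name for e in self.enclosures if not e.bypassed]

    def bypass(self, name: str, include_following: bool = False) -> None:
        # include_following=False: bypass only the failed enclosure.
        # include_following=True: bypass it and every enclosure after it.
        found = False
        for e in self.enclosures:
            if e.name == name:
                found = True
                e.bypassed = True
            elif found and include_following:
                e.bypassed = True
        if not found:
            raise KeyError(name)


loop = FCLoop([Enclosure(f"DISK-ENC#{i}") for i in range(4)])
loop.enclosures[1].healthy = False
print(loop.reachable())  # []  (one failed enclosure breaks the loop)
loop.bypass("DISK-ENC#1")
print(loop.reachable())  # ['DISK-ENC#0', 'DISK-ENC#2', 'DISK-ENC#3']
```

  • Bypassing only the failed enclosure keeps the enclosures behind it in service; calling `bypass(..., include_following=True)` instead corresponds to bypassing the failed enclosure and every enclosure after it.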
  • The invention is directed to disk drives and drive controllers as constituent units of a disk array system (storage system).
  • Each drive enclosure has a dual (duplicated) drive controller.
  • The dual drive controllers are controlled by a master enclosure (which may be referred to as the RAID controller), that is, a controller enclosure housing two system controllers that control the dual drive controllers.
  • The drive enclosures and the system controllers are connected in two loops, or two sub-systems, by communication lines, which may be Fibre Channel, forming FC loops through which data is exchanged between the system controllers and the drive enclosures.
  • The port bypass circuit (PBC) merely bypasses a port according to the presence or absence of a signal; the control itself is actually performed by …
  • The disk array system comprises at least one controller enclosure housing two system controllers, a plurality of drive enclosures, and a plurality of FC loops that connect the at least one controller enclosure to the plurality of drive enclosures.
  • The controller enclosure comprises at least: a communication controller that is connected to a higher-level device such as a host computer and receives data from it; a cache memory that is connected to the communication controller and holds data exchanged between the communication controller and the higher-level device; and a control portion that is connected to the communication controller and the cache memory and controls the transfer of that data to and from the communication controller through an FC loop.
  • Each disk drive enclosure houses a SATA drive and comprises: a plurality of port bypass circuits (PBCs) that are connected to the FC loops and switch the connection to the controller enclosure; a plurality of interface connectors that are connected to the controller enclosure through the FC loops, each bridging the Fibre Channel interface used by the FC loops and the interface of the SATA drive; and a plurality of dual port switch devices (dual port switch circuits) that are connected to the respective interface connectors and switch the path to the SATA drive between the interface connectors. The SATA drive receives and stores the data transferred from the controller enclosure via the FC loops, the port bypass circuits, the interface connectors, and the dual port switch circuits.
  • Each of the two drive controllers housed in each dual-structured drive enclosure has an enclosure management processor that monitors the operation of that drive enclosure.
  • Each enclosure management processor is assigned a Fibre Channel address, namely an FC-AL address or AL_PA (Arbitrated Loop Physical Address).
  • One enclosure management processor (the first processor) communicates with the other enclosure management processor (the second processor) housed in the same enclosure. When the first processor recognizes that an error has occurred in the drive controller monitored by the second processor, it notifies the system controller, which then shuts down operation of the failed drive enclosure in response to the notification.
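  • A minimal sketch of this peer-monitoring arrangement, assuming each enclosure management processor can read its peer's health flag over the link between them. `EnclosureManagementProcessor`, `SystemController`, `check_peer()`, and `notify_error()` are hypothetical names, and the shutdown is reduced to recording the failed side in a set.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SystemController:
    shut_down: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def notify_error(self, reporter: str, failed: str) -> None:
        # React to an EMP's report by shutting down the failed side.
        self.log.append(f"{reporter} reported an error on {failed}")
        self.shut_down.add(failed)


@dataclass
class EnclosureManagementProcessor:
    name: str
    controller_ok: bool = True  # health of the drive controller it monitors
    peer: Optional["EnclosureManagementProcessor"] = None

    def check_peer(self, sysctl: SystemController) -> None:
        # Inspect the peer's status (over the line between the two EMPs)
        # and report a failure on the peer's side to the system controller.
        if self.peer is not None and not self.peer.controller_ok:
            sysctl.notify_error(self.name, self.peer.name)


sysctl = SystemController()
emp0 = EnclosureManagementProcessor("EMP-0")
emp1 = EnclosureManagementProcessor("EMP-1")
emp0.peer, emp1.peer = emp1, emp0
emp1.controller_ok = False
for emp in (emp0, emp1):
    emp.check_peer(sysctl)
print(sysctl.shut_down)  # {'EMP-1'}
```

  • Only the healthy processor reports, because a processor whose own drive controller has failed may not be able to report reliably about itself.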
  • Thus, the invention can provide a disk array system that uses SATA drives.
  • FIG. 1 is an external view of the entire disk array system according to the invention;
  • FIG. 2 is an explanatory view showing the structure of a master enclosure indicated in FIG. 1;
  • FIG. 3 is an explanatory view showing the structure of an expansion enclosure indicated in FIG. 1;
  • FIG. 4 is an explanatory view illustrating an example of the disk drive unit shown in FIGS. 2 and 3;
  • FIG. 5 is a schematic diagram illustrating the basic concept of a fault-tolerant control method for a disk array system according to a first embodiment of the invention;
  • FIG. 6 is a functional block diagram illustrating the structure of the disk array system according to the first embodiment;
  • FIG. 7 is a functional block diagram illustrating an example of the internal structure of a system controller indicated in FIG. 6;
  • FIG. 8 shows control programs stored in a RAID controller of the master enclosure;
  • FIG. 9 is a functional block diagram illustrating in further detail the disk array system according to the first embodiment shown in FIG. 6;
  • FIG. 10 is a flow chart illustrating operation of the system shown in FIG. 9;
  • FIG. 11 is a functional block diagram illustrating a disk array system according to a second embodiment of the invention;
  • FIG. 12 is a view showing in detail the structure of an expansion enclosure indicated in FIG. 11;
  • FIG. 13 is a functional block diagram illustrating a disk array system according to a third embodiment of the invention;
  • FIG. 14 is a flow chart illustrating operation of the system shown in FIG. 13;
  • FIG. 15 is a functional block diagram illustrating a disk array system according to a fourth embodiment of the invention; and
  • FIG. 16 is a flow chart illustrating operation of the system shown in FIG. 15.
  • FIG. 1 is an overall view of the disk array system 10.
  • FIG. 1A is a front view, while FIG. 1B is a back view thereof.
  • FIGS. 2A and 2B illustrate the structure of the master enclosure indicated in FIG. 1; they are, respectively, perspective front and back views of the master enclosure with some of its components pulled out.
  • FIGS. 3A and 3B illustrate the structure of an expansion enclosure indicated in FIG. 1; they are, respectively, perspective front and back views of the expansion enclosure with some of its components pulled out.
  • FIG. 4 illustrates an example of the structure of the disk drive unit indicated in FIGS. 2 and 3.
  • The disk array system 10 has a plurality of additional units 12 installed on a rack frame 11.
  • A master enclosure 20 and expansion enclosures 30 are mounted on the additional units 12.
  • Reference numerals 52, 53, 54, and 55 respectively denote a disk drive unit on which the disk drive 51 shown in FIG. 4 is loaded, a battery unit serving as a backup power source, a display panel including a display device such as LED lamps for indicating the operating condition of the disk drive 51 and other components, and a flexible disk drive that may be used, for instance, when loading a maintenance program.
  • On the back face of the rack frame 11 are disposed the power controller boards 56 of the respective enclosures.
  • Each power controller board 56 carries a PBC (port bypass circuit) that forms an FC loop between a plurality of the disk drives 51, a circuit for monitoring the status of an AC/DC power source 57 and the temperatures of parts of the master and expansion enclosures, and a circuit for controlling the power supply to the disk drives 51, the operation of a cooling fan 66 (shown in FIGS. 2 and 3), and the display device on the display panel 54.
  • Reference numerals 48, 49, 58, 59, 63, and 92 respectively denote a control line, a power supply line, a cooling fan unit, a controller board, a connector, and a communication cable connected to one or more higher-level devices, i.e., host computers 100.
  • Each disk drive unit 52 can be pulled out like a drawer.
  • The battery unit 53 and the flexible disk drive 55 are also accommodated, and the display panel 54 is mounted.
  • The power controller board 56 controls the power supply to the plurality of disk drives 51.
  • A connector 67 is provided, to which the FC cable forming part of an FC loop is connected.
  • The AC/DC power source 57 supplies electricity to parts of the master enclosure.
  • The AC/DC power source 57 is connected to the power controller board 56.
  • Reference numeral 64 denotes a breaker switch.
  • The cooling fan unit 58 has the cooling fan 66.
  • An interface board 61 is also mounted on the controller board 59.
  • The controller board 59 has a cache memory 62 and the connector 63 for the communication cable connected to the higher-level device (host computer) 100. Only a single higher-level device is shown in FIG. 1.
  • The connector 63 of the interface board 61 complies with an interface standard, for example, a SAN (Storage Area Network) using protocols such as Fibre Channel (FC) and Ethernet (registered trademark), a LAN (Local Area Network), or SCSI (Small Computer System Interface).
  • In the expansion enclosure 30, each disk drive unit 52 can likewise be pulled out like a drawer, as shown in FIG. 3A.
  • On the back side of the expansion enclosure 30 are disposed a power controller board 56, an AC/DC power source 57, and a cooling fan unit 58, similar to those shown in FIG. 2.
  • The disk drive 51, which together with other members constitutes the disk drive unit 52 in each of the master and expansion enclosures 20, 30, has a housing 70 incorporating a magnetic disk (hard disk) 73, an actuator 71, a spindle motor 72, a magnetic head 74 that performs read/write operations, a control circuit 75 that controls elements including the magnetic head 74, a signal processing circuit 76 that handles data read/write signals, a communication interface circuit 77, an interface connector 79 through which various commands and data are input and output, and a power connector 80.
  • The disk drive 51 may be, for example, a 3.5-inch magnetic disk drive of the CSS (contact start/stop) type or a 2.5-inch drive of the load/unload type, and has a communication interface such as serial ATA (SATA).
  • A disk array system employing SATA drives as its disk drive units is illustrated.
  • FIG. 5 is a schematic diagram illustrating the basic concept of a fault-tolerant control method for the disk array system according to the first embodiment of the invention; it shows the fundamental method of switching an access path when a fault or error occurs in one of the two sub-systems of a dual-structured SATA drive enclosure.
  • FIG. 5A shows the access path during normal operation, while FIG. 5B shows the access path when a fault occurs.
  • Reference numerals 1A and 1B respectively denote a first system controller CTL#0 and a second system controller CTL#1.
  • Reference numerals 2 and 3 respectively denote a backend (which connects the system controllers and the drive controllers) and a SATA drive enclosure DISK-ENC#0 serving as an expansion component of the disk array.
  • DISK-ENC#1 denotes an expansion drive enclosure.
  • Reference numerals 4A and 4B respectively denote a first interface connector I/F-0 and a second interface connector I/F-1, while reference numerals 5A and 5B denote a first drive enclosure management processor EMP-0 and a second drive enclosure management processor EMP-1.
  • Reference numerals 6A and 6B each denote a port bypass circuit (PBC), and reference numerals 7 and 8 denote a dual port circuit (dual port device, DPD) and a SATA disk, respectively.
  • L-#0 and L-#1 denote a first backend FC loop and a second backend FC loop.
  • The first system controller (CTL#0) 1A is connected through the backend FC loop L-#0 to the SATA drive enclosure (DISK-ENC#0), the expansion enclosure DISK-ENC#1, and the subsequent expansion enclosures.
  • The second system controller (CTL#1) 1B is connected through the backend FC loop L-#1 to the SATA drive enclosure (DISK-ENC#0), the expansion enclosure DISK-ENC#1, and the subsequent expansion enclosures.
  • The first drive enclosure management processor (EMP-0) 5A (hereinafter simply the “first management processor”) is connected to the first interface connector (I/F-0) 4A via the port bypass circuit (PBC) 6A.
  • The second drive enclosure management processor (EMP-1) 5B (hereinafter simply the “second management processor”) is connected to the second interface connector (I/F-1) 4B via the port bypass circuit (PBC) 6B.
  • The SATA disk 8 is connected to the first and second interface connectors (I/F-0, I/F-1) 4A, 4B via the dual port circuit (DPD) 7.
  • The management processors are also assigned respective Fibre Channel Arbitrated Loop Physical Addresses (AL_PA), which devices such as the system controllers use to access the management processors.
  • During normal operation, the first and second system controllers (CTL#0, CTL#1) 1A, 1B communicate with, and read and write data from and to, the disk 8 as a component of the disk array through the backend 2, via the port bypass circuits 6A, 6B, the first and second interface connectors (I/F-0, I/F-1) 4A, 4B, and the dual port circuit 7 of the SATA drive enclosure (DISK-ENC#0) 3.
  • This state is indicated by a bold arrow in FIG. 5A.
  • When an error occurs on the second backend FC loop L-#1 (FIG. 5B), the second management processor (EMP-1) 5B switches the path of the path controller connected to the disk drive. Further, the second system controller (CTL#1) 1B disconnects itself from the second backend FC loop L-#1, on which the error has occurred, and switches to the normally operating first backend FC loop L-#0 to access the disk drive. Thus, even when an error occurs on a backend FC loop, access to the disk drive continues.
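  • The system-controller side of the switchover in FIG. 5B can be sketched as follows, assuming each controller keeps one active backend loop and is told which loop failed. `LoopFailoverController`, `on_loop_error()`, and the loop-health dictionary are illustrative inventions; a real switchover also involves the port bypass circuits and the path controller, which are left out here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LoopFailoverController:
    name: str
    home_loop: str
    active_loop: str = field(init=False)

    def __post_init__(self) -> None:
        self.active_loop = self.home_loop

    def on_loop_error(self, failed_loop: str, loop_ok: Dict[str, bool]) -> str:
        # Only a controller currently using the failed loop needs to move.
        if failed_loop == self.active_loop:
            for loop, ok in loop_ok.items():
                if ok and loop != failed_loop:
                    self.active_loop = loop  # reconnect via a healthy loop
                    break
            else:
                raise RuntimeError(f"{self.name}: no healthy backend loop")
        return self.active_loop


loops = {"L-#0": True, "L-#1": False}  # error on the second loop
ctl0 = LoopFailoverController("CTL#0", "L-#0")
ctl1 = LoopFailoverController("CTL#1", "L-#1")
print(ctl0.on_loop_error("L-#1", loops))  # L-#0 (unchanged)
print(ctl1.on_loop_error("L-#1", loops))  # L-#0 (switched over)
```

  • After the switch both controllers share L-#0, which matches the description: redundancy is reduced, but access to the disk drive continues.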
  • FIG. 6 is a functional block diagram illustrating the structure of a disk array system according to the first embodiment.
  • FIG. 7 is a functional block diagram illustrating the internal structure of a system controller indicated in FIG. 6.
  • The disk array system has a master enclosure 110 and an expansion enclosure 140.
  • The master enclosure 110 has two system controllers 120A, 120B for controlling the disk array system, or storage system.
  • Each of the system controllers 120A, 120B is a so-called RAID (Redundant Arrays of Inexpensive Disks) controller.
  • The master enclosure 110 is connected through a SAN (Storage Area Network) 130 to higher-level devices in the form of host computers 10A, 10B, which may be PC servers.
  • Each host computer 100A, 100B has an FC/SCSI interface board in the form of a host adapter 102A, 102B.
  • The master enclosure 110 and the expansion enclosure 140 are connected by a backend FC loop 160.
  • The expansion enclosure 140 is a drive enclosure; in practice a plurality of such enclosures are installed, as shown in FIG. 1, although only one is shown in FIG. 6.
  • The expansion enclosure 140 may also be referred to as a “drive enclosure”.
  • The expansion enclosure 140 has disk drives 171, 173, each constituted by a SATA drive.
  • The drive controller for the disk drives 171, 173 is dual-structured, consisting of two drive controllers, namely, a first drive controller 150A and a second drive controller 150B.
  • The first drive controller 150A controls the drive enclosure and has a first port bypass circuit 151A, a first interface connector 152A, and a first enclosure management processor 153A.
  • An intelligent semiconductor chip (processor) is mounted on the drive controller of the expansion enclosure.
  • The system has two command channels, one originating from the system controller and the other from the drive controller. Because the chip is intelligent, an error at the interface connector of the drive controller may cause the chip to crash unexpectedly or malfunction owing to a latent bug. To keep such a malfunction from affecting the whole system, the failed interface connector is reset or powered off so that it is completely shut down, which prevents the possible malfunction.
  • The first interface connector 152A of the drive controller 150A converts FC-format data transmitted on the FC loop into SATA format.
  • The first and second enclosure management processors 153A, 153B monitor and manage the status of the drive enclosure 140 (e.g., power failure, abnormal temperature, and path abnormality).
  • The first and second enclosure management processors 153A, 153B are connected via a dedicated line 180, over which they send their management information to each other.
  • The first and second enclosure management processors 153A, 153B are connected to the first and second port bypass circuits 151A, 151B, respectively, and are assigned respective FC-AL addresses.
  • The SATA drives comprise disks 171, 173 and dual port circuits (DPDs) 170, 172.
  • The dual port circuits 170, 172 switch the access path to the disks 171, 173 between the first interface connector 152A of the first drive controller 150A and the second interface connector 152B of the second drive controller 150B.
  • That is, each dual port circuit 170, 172 connects either the data line from the first interface connector 152A of the first drive controller 150A or the data line from the second interface connector 152B of the second drive controller 150B to the disk 171, 173.
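  • The dual port circuit's role can be pictured as a two-input selector in front of a single-ported SATA disk. This is a hypothetical model: `DualPortDevice`, `switch_to()`, and `forward()` are invented names, and the command strings stand in for real SATA traffic.

```python
class DualPortDevice:
    """Toy DPD: two upstream data lines, exactly one connected to the disk."""

    PORTS = ("I/F-0", "I/F-1")

    def __init__(self, active: str = "I/F-0") -> None:
        self._check(active)
        self.active = active

    def _check(self, port: str) -> None:
        if port not in self.PORTS:
            raise ValueError(f"unknown port {port!r}")

    def switch_to(self, port: str) -> None:
        # Connect the other interface connector's data line to the disk.
        self._check(port)
        self.active = port

    def forward(self, port: str, command: str) -> str:
        # Only the currently connected data line reaches the disk.
        self._check(port)
        if port != self.active:
            raise PermissionError(f"{port} is not connected to the disk")
        return f"disk executed {command} via {port}"


dpd = DualPortDevice()                # normally fed from I/F-0
print(dpd.forward("I/F-0", "READ"))   # disk executed READ via I/F-0
dpd.switch_to("I/F-1")                # e.g. after the I/F-0 side fails
print(dpd.forward("I/F-1", "WRITE"))  # disk executed WRITE via I/F-1
```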
  • The first and second port bypass circuits 151A, 151B switch the path (data line). However, they do not switch the path on their own initiative; they do so in response to an instruction from the system controller 120A, 120B.
  • The system controller 120A in FIG. 6 is constituted as shown in FIG. 7. That is, the system controller 120A includes a communication controller 121A having an interface that handles communication with the host computers 10A, 10B, and a cache memory 122A that temporarily holds data exchanged between the communication controller 121A and a control portion 123A.
  • Reference numeral 124A denotes a data bus.
  • The control portion 123A performs write and read operations to and from the drive (disk) via the cache memory 122A in response to data input/output requests. The same applies to the system controller 120B.
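  • One way to picture I/O performed "via the cache memory" is the sketch below. It assumes a write-back policy with an explicit destage step, which the text does not specify; `ControlPortion`, `destage()`, and the dict standing in for the drive behind the FC loop are all hypothetical.

```python
from typing import Dict, Set


class ControlPortion:
    """Hypothetical sketch: host I/O staged through a cache before the drive."""

    def __init__(self, drive: Dict[int, bytes]) -> None:
        self.drive = drive                # stands in for the disk on the FC loop
        self.cache: Dict[int, bytes] = {}  # block address -> data
        self.dirty: Set[int] = set()      # cached blocks not yet on the drive

    def write(self, lba: int, data: bytes) -> None:
        # Accept the write into cache; the drive is updated on destage.
        self.cache[lba] = data
        self.dirty.add(lba)

    def read(self, lba: int) -> bytes:
        # Serve from cache on a hit; otherwise stage the block from the drive.
        if lba not in self.cache:
            self.cache[lba] = self.drive[lba]
        return self.cache[lba]

    def destage(self) -> int:
        # Write dirty blocks to the drive and return how many were written.
        for lba in sorted(self.dirty):
            self.drive[lba] = self.cache[lba]
        count = len(self.dirty)
        self.dirty.clear()
        return count


drive = {0: b"old"}
cp = ControlPortion(drive)
cp.write(0, b"new")
print(cp.read(0), drive[0])    # b'new' b'old'  (still only in cache)
print(cp.destage(), drive[0])  # 1 b'new'
```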
  • FIG. 8 is an explanatory view showing the control programs in a RAID controller 400 included in the master enclosure 110.
  • The RAID controller (RAID CTL) 400 has a RAID control program 401, which is the base program for controlling the whole system; a fault detection program 402 for detecting a fault or error anywhere in the system; and a non-response instruction program 403 for shutting down the drive controller (250A or 250B) in which the fault has occurred and confirming that the failed drive controller has actually been shut down.
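  • The division of labor among programs 401, 402, and 403 can be sketched as three functions, assuming fault detection yields a per-controller health map and that a shutdown is confirmed by checking that the controller no longer responds. The function names and the `power_off`/`responds` callbacks are hypothetical stand-ins for hardware operations.

```python
from typing import Callable, Dict, List


def fault_detection(status: Dict[str, bool]) -> List[str]:
    """Stand-in for program 402: list the drive controllers reported as failed."""
    return [name for name, ok in status.items() if not ok]


def non_response_instruction(name: str,
                             power_off: Callable[[str], None],
                             responds: Callable[[str], bool]) -> bool:
    """Stand-in for program 403: shut a controller down, then confirm it is silent."""
    power_off(name)
    return not responds(name)


def raid_control_cycle(status: Dict[str, bool],
                       power_off: Callable[[str], None],
                       responds: Callable[[str], bool]) -> Dict[str, bool]:
    """Stand-in for program 401: map each failed controller to 'shutdown confirmed'."""
    return {name: non_response_instruction(name, power_off, responds)
            for name in fault_detection(status)}


powered = {"250A": True, "250B": True}


def power_off(name: str) -> None:
    powered[name] = False


def responds(name: str) -> bool:
    return powered[name]


print(raid_control_cycle({"250A": True, "250B": False}, power_off, responds))
# {'250B': True}
```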
  • Referring to FIGS. 9 and 10, the following describes the operation of the disk array system constructed as described above when an abnormality occurs in one of the two drive controllers of any of the expansion enclosures and it is thus determined that an error has occurred in the system.
  • an abnormality occurs in one of the two drive controllers of any of the expansion enclosures and thus it is determined that an error has occurred in the system.
  • To avoid redundancy, only one side of each pair of dual elements, or only one sub-system of the dual system, is illustrated, except for elements that particularly require illustration.
  • The other side or sub-system operates in a similar manner.
  • FIG. 9 is a functional block diagram illustrating in further detail the structure of the disk array system according to the first embodiment shown in FIG. 6 .
  • FIG. 10 is a flow chart illustrating operation of the disk array system of FIG. 9 .
  • The structure shown in FIG. 9 is basically the same as that of FIG. 6, but FIG. 9 additionally shows the functional elements essential to this embodiment.
  • Elements corresponding to those denoted by reference numerals in the 100 range in FIG. 6 are denoted by reference numerals in the 200 range.
  • As for a master enclosure 210, FIG. 9 shows that a system controller 220A has a host interface 221A and a path switch 222A.
  • A first drive controller 250A has a reset circuit 254A for resetting or powering off the interface connector and the enclosure management processor of a failed drive controller.
  • Reference numeral 282 denotes a signal line for transferring a signal from a reset circuit 254B for resetting or powering off an enclosure management processor 253A of the first drive controller 250A.
  • The first and second drive controllers 250A, 250B have memories 255A, 255B, respectively, each storing an enclosure-management-processor control program 256A, 256B.
  • The enclosure management processors 253A, 253B are connected to a first port bypass circuit 251A and a second port bypass circuit 251B, respectively, and are each assigned an FC-AL address.
  • FIG. 10 is a flow chart illustrating operation of the system shown in FIG. 9 .
  • The enclosure management processors 253A, 253B are directly connected to the port bypass circuits 251A, 251B, respectively.
  • The port bypass circuit 251B on the normal side or loop passes a RESET/POWEROFF command for resetting or powering off the enclosure management processor 253A directly to the enclosure management processor 253B. The enclosure management processor 253B then causes the reset circuit 254A to transmit a reset/poweroff signal based on this command.
  • There will now be described the flow of processing performed when a fault occurs in the system constructed as shown in FIG. 9.
  • In the case described here, an abnormality (error) occurs in the first drive controller 250A, and a shut-down instruction is issued by the first drive controller 250A.
  • Similar processing may be performed in another case where the shut-down instruction is issued by the second drive controller 250B upon occurrence of such an error. It may also be performed in still another case where the abnormality occurs in the second drive controller 250B.
  • FIG. 10 shows, from left to right, the operations of the following functional elements: the system controller 220A (220B); the failed-side port bypass circuit 251A; the failed-side interface connector 252A; the failed-side enclosure management processor 253A; the failed-side reset circuit 254A; the normal-side port bypass circuit 251B; and the normal-side enclosure management processor 253B. Arrows indicate the order of the processing performed by these functional elements.
  • The system controller 220A executes the fault detection program 402 shown in FIG. 8. Under this program, it periodically issues a RECEIVE DIAG command to the enclosure management processors 253A, 253B to collect their logs and thereby monitor whether any fault occurs.
  • The system controller 220B operates similarly.
  • The failed-side port bypass circuit 251A receives the RECEIVE DIAG command directed to the enclosure management processor 253A.
  • A fault occurs at the interface connector 252A, which becomes unable to communicate.
  • The failed-side enclosure management processor 253A detects via the port bypass circuit 251A that the fault has occurred at the interface connector 252A. It then returns fault information indicating this to the system controller 220A.
  • Suppose the system controller 220A obtains the fault information from the enclosure management processor 253A, or receives no response to the RECEIVE DIAG command. It then switches the path to the normally operating drive controller 250B and executes the non-response instruction program 403. Under this program, it issues a reset/poweroff instruction (a SEND DIAG command) to the normal-side enclosure management processor 253B to stop or shut down the failed-side interface connector 252A and enclosure management processor 253A.
  • The normal-side port bypass circuit 251B receives the RESET/POWEROFF command for shutting down the failed-side interface connector 252A and enclosure management processor 253A.
  • The normal-side enclosure management processor 253B receives the RESET/POWEROFF command for shutting down the failed-side interface connector 252A and enclosure management processor 253A, and issues the RESET/POWEROFF command to the reset circuit 254A.
  • The failed-side reset circuit 254A receives the RESET/POWEROFF command and transmits a reset/poweroff signal to the failed-side interface connector 252A and enclosure management processor 253A.
  • The failed-side enclosure management processor 253A receives the reset/poweroff signal and is shut down.
  • The failed-side interface connector 252A receives the reset/poweroff signal and is shut down.
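The FIG. 10 sequence can be sketched as a small Python model. This is an illustration under stated assumptions, not the patented firmware:

- Each drive controller side is reduced to a connector-health flag and a power flag.
- RECEIVE DIAG and SEND DIAG are modeled as plain method calls.
- The normal-side port bypass circuit 251B is omitted, since it only relays the command.
- The names `DriveController`, `SystemController`, `poll_once` and `send_diag_reset_peer` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class DriveController:
    """One side (A or B) of a dual drive controller; a hypothetical model."""
    name: str
    connector_ok: bool = True   # state of interface connector 252x
    powered: bool = True        # connector 252x and EMP 253x powered on
    peer: Optional["DriveController"] = None

    def receive_diag(self) -> Optional[str]:
        """EMP reply to RECEIVE DIAG; None models 'no response'."""
        if not self.powered:
            return None
        return "OK" if self.connector_ok else "CONNECTOR_FAULT"

    def send_diag_reset_peer(self) -> None:
        """Normal-side EMP relays RESET/POWEROFF to the failed side's reset circuit."""
        if self.peer is None:
            raise RuntimeError("no peer drive controller")
        self.peer.reset_circuit()

    def reset_circuit(self) -> None:
        """Reset circuit 254x: reset or power off the connector and the EMP."""
        self.powered = False


class SystemController:
    def __init__(self, a: DriveController, b: DriveController) -> None:
        a.peer, b.peer = b, a
        self.controllers: Dict[str, DriveController] = {a.name: a, b.name: b}
        self.active_path = a.name
        self.isolated: Set[str] = set()

    def poll_once(self) -> Optional[Tuple[str, bool]]:
        """One monitoring cycle, loosely following programs 402 and 403."""
        for name, dc in self.controllers.items():
            if name in self.isolated or dc.receive_diag() == "OK":
                continue
            # Fault information or no response: act on the failed side.
            normal = dc.peer
            assert normal is not None
            self.active_path = normal.name          # switch path to the normal side
            normal.send_diag_reset_peer()           # SEND DIAG with RESET/POWEROFF
            self.isolated.add(name)
            confirmed = dc.receive_diag() is None   # verify the shutdown took effect
            return name, confirmed
        return None
```

A real implementation would also need timeouts and a policy for the case where both sides report faults. Both are outside this sketch.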
  • As a result, the drive controller (SATA drive) on the failed side or loop is reliably isolated or bypassed. Communication between the system controller and each drive enclosure through the FC loop can therefore be maintained without bringing the system down.
  • The provision of the enclosure management processor makes it possible to deal quickly with a fault occurring in the SATA expansion enclosure. Further, the system controller can read and write data from and to the SATA drive via the FC loop, PBC, and interface connector, so a large volume of data can be controlled in the disk array system. Still further, the plurality of system controllers are connected to the SATA expansion enclosure through the plurality of FC loops, so a highly fault-tolerant disk array system can be provided.
  • A second embodiment is arranged such that the path bypasses the drive controller of an expansion enclosure where a fault has occurred and proceeds to the normal expansion enclosures. Data communication with the higher-level device can thus be continued.
  • FIG. 11 is a functional block diagram illustrating the structure of a disk array system according to the second embodiment. It shows the disk array system in its normal operating state; the system comprises a master enclosure 710 and a plurality of expansion enclosures 740, 760, 780.
  • The master enclosure 710 comprises a dual-structured RAID controller, i.e., two system controllers 720A and 720B, which communicate with higher-level devices (e.g., host computers) 700A, 700B via host interfaces 721A, 721B, respectively.
  • The system controllers 720A, 720B have path switches 722A, 722B, respectively.
  • The expansion enclosure 740 comprises a first drive controller 750A and a second drive controller 750B, which have port bypass circuits 751A, 751B, respectively.
  • The first and second drive controllers 750A, 750B are connected through FC loops, via the port bypass circuits 751A, 751B, to the path switches 722A, 722B of the system controllers 720A, 720B.
  • In FIG. 11, the path of the FC loop is indicated by a bold arrow.
  • FIG. 12 is a view showing the detailed structure of the expansion enclosure 740 shown in FIG. 11. Each of the other expansion enclosures 760, 780 shown in FIG. 11 is structured similarly to the expansion enclosure 740 shown in FIG. 12.
  • The expansion enclosure 740 has disk drives 771, 773, which are SATA drives.
  • The drive controller is dual-structured, i.e., it comprises a first drive controller 750A and a second drive controller 750B.
  • The first drive controller 750A is a controller for controlling the drive enclosure, i.e., a drive controller. It has a first port bypass circuit 751A, a first interface connector 752A, and a first enclosure management processor 753A.
  • The interface connector is an intelligent semiconductor chip (processor).
  • The system has two command channels: one originates from the system controller of the master enclosure, and the other from the drive controller of the expansion enclosure. Accordingly, if an error occurs at the interface connector of a drive controller, the chip, being intelligent, may crash unexpectedly or malfunction because of a latent bug. To prevent such a malfunction from affecting the whole system, the failed drive controller is reset or powered off so that it is completely shut down. The possible malfunction is thus prevented.
  • The first interface connector 752A of the first drive controller 750A converts FC-format data transmitted on the FC loop into SATA format.
  • Each of the first and second enclosure management processors 753A, 753B monitors and manages the status of the drive enclosure 740 (e.g., power failure, abnormal temperature, and abnormal path).
  • The first and second enclosure management processors 753A, 753B are connected via a dedicated line 780 so that they can exchange management information.
  • The first and second enclosure management processors 753A, 753B are connected to the first and second port bypass circuits 751A, 751B, respectively, and are each assigned an FC-AL address.
  • The SATA drives comprise the disks 771, 773 and dual port circuits (dual port devices, or DPDs) 770, 772.
  • The dual port circuits (DPDs) 770, 772 switch the access path to the disks 771, 773 between the first interface connector 752A of the first drive controller 750A and the second interface connector 752B of the second drive controller 750B.
  • That is, the dual port circuit 770, 772 connects either a data line from the first interface connector 752A of the first drive controller 750A or a data line from the second interface connector 752B of the second drive controller 750B to the disk 771, 773.
  • The first and second port bypass circuits 751A, 751B are circuits for switching the path (or data line). However, these circuits do not switch the path on their own; they do so in response to an instruction from the system controller 720A, 720B.
  • The system controller 720A has a structure as shown in FIG. 7, for instance.
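The path selection performed by a dual port circuit can be modeled as follows. The port labels reuse the reference numerals 752A and 752B. The `DualPortDevice` class and its fail-over rule are assumptions: the rule moves the disk to the remaining healthy port when the active port is marked down. In the text, path switching is ultimately directed by the system controller.

```python
from typing import Dict, Optional, Tuple


class DualPortDevice:
    """Sketch of a dual port circuit (DPD 770/772) in front of one SATA disk."""

    def __init__(self, ports: Tuple[str, str] = ("752A", "752B")) -> None:
        self.port_up: Dict[str, bool] = {p: True for p in ports}
        # Only one interface connector's data line reaches the disk at a time.
        self.active: Optional[str] = ports[0]

    def select(self, port: str) -> None:
        """Connect the given interface connector's data line to the disk."""
        if not self.port_up[port]:
            raise ValueError(f"port {port} is down")
        self.active = port

    def mark_down(self, port: str) -> None:
        """Hypothetical fail-over rule: move to any remaining healthy port."""
        self.port_up[port] = False
        if self.active == port:
            self.active = next((p for p, up in self.port_up.items() if up), None)

    def reaches_disk(self, port: str) -> bool:
        return self.active == port
```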
  • The path of the FC loop shown in FIG. 11 indicates the access path between the master enclosure and the expansion enclosures 740, 760, 780 when the system is free of faults.
  • The number of expansion enclosures is not limited to three, but the case where three are connected is described here to simplify the illustration.
  • The FC cable turns back at the port bypass circuit 791A, as indicated by arrow A in FIG. 11.
  • Although FIG. 11 shows only the FC loop on the side of the system controller 720A of the master enclosure 710, the other loop, on the side of the system controller 720B, operates similarly.
  • The first drive controllers 750A, 770A, 790A of the respective expansion enclosures 740, 760, 780 are connected by the FC loop as shown in FIG. 11.
  • The path originates at the path switch 722A of the system controller 720A of the master enclosure 710. It enters the first interface connector 752A via the port bypass circuit 751A of the first drive controller 750A of the expansion enclosure 740. Data communicated from and to the host computer 700A or 700B is thus read from and written to the disk drive. That drive is a SATA drive connected to the first interface connector 752A via a dual port circuit (not shown).
  • The expansion enclosures 760, 780 are similarly connected through the FC loop.
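The normal-state loop path of FIG. 11 can be sketched as follows, together with the reason a failed port must be bypassed rather than merely left failed. The model makes three simplifying assumptions:

- There is a single loop.
- Each drive controller port is either bypassed by its PBC or on the loop.
- One non-bypassed faulty port disrupts communication with every enclosure, which is the FC loop property stated in the text.

`LoopPort` and `loop_path` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoopPort:
    enclosure: str          # expansion enclosure, e.g. "740"
    controller: str         # drive controller on this loop, e.g. "750A"
    bypassed: bool = False  # PBC state: True = port isolated from the loop
    healthy: bool = True    # False = port on the loop but not relaying frames


def loop_path(ports: List[LoopPort]) -> Optional[List[str]]:
    """Drive controllers reached from path switch 722A, in loop order.

    Bypassed ports are skipped. A single non-bypassed faulty port breaks
    the whole loop, in which case None is returned.
    """
    path: List[str] = []
    for p in ports:
        if p.bypassed:
            continue
        if not p.healthy:
            return None      # loop disrupted: no enclosure is reachable
        path.append(p.controller)
    return path              # frames turn back after the last port (arrow A)
```

With the sample configuration below, the normal state reaches 750A, 770A and 790A in order. Marking 770A faulty makes the whole loop unusable until its PBC bypasses it. After that, the path skips directly to 790A.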
  • FIG. 13 is a functional block diagram illustrating the structure of a disk array system according to a third embodiment of the invention. Elements corresponding to those shown in FIG. 11 are denoted by reference numerals in the 800 range.
  • In the third embodiment, there will be described a case where a fault occurs in the disk array system comprising a master enclosure 810 and a plurality of expansion enclosures 840, 860, 880.
  • The detailed structure of each expansion enclosure 840, 860, 880, ... is similar to that of the enclosure 740 shown in FIG. 12, with the reference numerals in the 700 range replaced by reference numerals in the 800 range in the third embodiment shown in FIG. 13.
  • FIG. 14 is a flow chart illustrating operation of the system shown in FIG. 13 .
  • FIG. 14 shows, from left to right, the operations of: the system controller 820A (820B); the failed-side port bypass circuit 871A; the failed-side interface connector 872A; the failed-side enclosure management processor 873A; the failed-side reset circuit 874A; the normal-side port bypass circuit 871B; and the normal-side enclosure management processor 873B.
  • The system controller 820A monitors whether any fault occurs by executing the fault detection program 402. Under this program, it periodically issues a RECEIVE DIAG command to each drive controller 850A, 850B, 870A, 870B, 890A, 890B of the expansion enclosures 840, 860, 880 to collect the log of each drive controller's enclosure management processor.
  • The system controller 820B operates similarly.
  • The port bypass circuit 871A receives the RECEIVE DIAG command from the system controller 820A.
  • The system controller 820A obtains, from the enclosure management processor 873A, fault information indicating that a fault has occurred at the interface connector 872A. The system controller 820A then executes the non-response instruction program 403. The path is switched to the normally operating drive controller 870B. A reset/poweroff instruction is issued to the normal-side enclosure management processor 873B to stop or shut down the failed-side interface connector 872A and enclosure management processor 873A.
  • The normal-side port bypass circuit 871B receives a RESET/POWEROFF command for stopping or shutting down the failed-side interface connector 872A and enclosure management processor 873A.
  • The normal-side enclosure management processor 873B receives the RESET/POWEROFF command for stopping or shutting down the failed-side interface connector 872A and enclosure management processor 873A, and issues a reset/poweroff signal to the reset circuit 874A.
  • The failed-side reset circuit 874A receives the reset/poweroff signal and transfers it to the failed-side interface connector 872A and enclosure management processor 873A.
  • The failed-side enclosure management processor 873A receives the reset/poweroff signal and is shut down.
  • The failed-side interface connector 872A also receives the reset/poweroff signal and is shut down.
  • As a result, the interface connector can no longer be recognized through the port bypass circuit (PBC) 871A. The path therefore bypasses the drive controller 870A and proceeds to the following expansion enclosure 880.
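The PBC bypasses a port according to the presence or absence of a signal. The sketch below uses that behavior to show why the failed interface connector must be reset or powered off before the path can skip it. One assumption is ours, inferred from the need for the reset step: a powered but malfunctioning connector keeps driving a signal, so the PBC keeps it on the loop. `InterfaceConnector`, `PortBypassCircuit` and `loop_intact` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class InterfaceConnector:
    powered: bool = True   # cleared by reset circuit 874A
    working: bool = True   # False models the fault at connector 872A

    @property
    def drives_signal(self) -> bool:
        # Assumption: a powered connector keeps driving the loop signal
        # even while it malfunctions.
        return self.powered


@dataclass
class PortBypassCircuit:
    connector: InterfaceConnector

    @property
    def bypassing(self) -> bool:
        """The PBC keeps the port on the loop only while a signal is present."""
        return not self.connector.drives_signal


def loop_intact(pbc: PortBypassCircuit) -> bool:
    # The loop survives if the port is bypassed or its connector works.
    return pbc.bypassing or pbc.connector.working
```

Used in sequence, the model shows that the fault alone breaks the loop. Powering off the connector makes the PBC bypass it and restores the loop for the following enclosures.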
  • According to this embodiment, a fault occurring in a SATA expansion enclosure can be quickly dealt with.
  • Since the system controller can read and write data from and to the SATA drive via the FC loop, PBC, and interface connector, a large volume of data can be controlled in the disk array system.
  • Since the plurality of system controllers are connected to the SATA expansion enclosure via the plurality of FC loops, a highly fault-tolerant disk array system can be provided.
  • FIG. 15 is a functional block diagram illustrating the structure of a disk array system according to a fourth embodiment of the invention. Elements corresponding to those shown in FIG. 13 are denoted by reference numerals in the 900 range.
  • FIG. 15 shows a state where an abnormality has occurred in the disk array system comprising a master enclosure 910 and a plurality of expansion enclosures 940, 960, 980.
  • The detailed structure of each expansion enclosure 940, 960, 980, ... is similar to that of the expansion enclosure 740 shown in FIG. 12, with the reference numerals in the 700 range in FIG. 12 replaced by reference numerals in the 900 range in FIG. 15.
  • FIG. 16 is a flow chart illustrating operation of the system shown in FIG. 15 .
  • The fourth embodiment relates to a method of controlling the isolation of a failed component, which may be called a “latter-part bypass” method. It is used, when a fault occurs in the disk array system, to identify the expansion enclosure where the fault has occurred. The way of identifying a failed enclosure by the latter-part bypass method according to the fourth embodiment will be described with reference to FIG. 16.
  • The controller 920A of the master enclosure 910 cannot identify the location of the error. The error is therefore located at the enclosure level by sequentially isolating parts of the loop.
  • First, the latter half of the expansion enclosures on the loop is bypassed. If the system then operates normally, the fault lies in the bypassed latter half. The latter half of that portion is in turn bypassed, that is, isolated from the FC loop. In this example, the expansion enclosure 980 is bypassed.
  • The error is thus searched for by the bisection method.
  • The FC loop is further bypassed in sequence according to the bisection method. Each time bypassing is performed, it is determined whether any problem is detected in the operation of the FC loop in the current bypassed state.
  • If a problem persists while the latter half is bypassed, the fault lies in the first half. The first half of the expansion enclosures is then subjected to latter-half bypassing according to the bisection method to locate the error.
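The latter-part bypass search amounts to a binary search over the loop order of the enclosures. The sketch makes three assumptions:

- Exactly one enclosure is faulty.
- The loop operates normally if and only if that enclosure is not connected.
- A hypothetical probe `loop_ok(k)` keeps the first `k` enclosures on the loop, bypasses the rest, and reports whether the loop works.

How an odd-sized range is split is an implementation choice, so the order of probes may differ from the FIG. 16 example.

```python
from typing import Callable


def latter_part_bypass_search(n: int, loop_ok: Callable[[int], bool]) -> int:
    """Return the 0-based loop position of the faulty enclosure.

    n       -- number of expansion enclosures on the loop, in loop order
    loop_ok -- hypothetical probe: loop_ok(k) connects only the first k
               enclosures (latter part bypassed) and reports whether the
               FC loop operates normally
    """
    lo, hi = 0, n  # invariant: loop_ok(lo) is True, loop_ok(hi) is False
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if loop_ok(mid):
            lo = mid  # normal with the latter part bypassed: fault in [mid, hi)
        else:
            hi = mid  # still failing: fault in [lo, mid)
    return hi - 1
```

Take three enclosures (940, 960, 980) with the fault in 980. The probes keep first one and then two enclosures on the loop. Both probes succeed, and index 2 is returned. In general, about log2(n) probes are needed.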
  • Since the enclosure management processors are provided, a fault occurring in the SATA expansion enclosure can be quickly dealt with. Further, the system controller can read and write data from and to the SATA drive via the FC loop, PBC, and interface connector, so a large volume of data can be controlled in the disk array system. Still further, the plurality of system controllers are connected to the SATA expansion enclosure via the plurality of FC loops, so a highly fault-tolerant disk array system can be provided.
  • As described above, the invention provides a disk array system with improved reliability. The system is equipped with looped communication means and need not stop read/write operations when an error occurs. Further, in an information processing system equipped with FC looped communication means according to the invention, an expansion enclosure where a fault occurs is isolated from the loop. This enables a quick and accurate recovery operation.

Abstract

A disk array system in which a plurality of SATA drive enclosures are connected through an FC loop is made capable of continuing to process data even in the event of a fault. During normal operation, a first system controller and a second system controller execute read/write operations from and to the disks of a SATA drive enclosure of the disk array through the FC loop, via a first interface connector and a second interface connector, respectively. When an error occurs on a second backend FC loop, the second system controller disconnects itself from the failed second backend FC loop. It then switches the path to a first backend FC loop, which is functioning normally, to access the disk drive.

Description

    CROSS-REFERENCE TO PRIOR APPLICATION
  • The present application is a continuation of application Ser. No. 10/835,074, filed Apr. 30, 2004, which claims priority from Japanese Patent Application No. 2004-30792, filed on Feb. 6, 2004, the entire disclosures of which are incorporated herein by reference.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to a disk array system, and particularly to a technique effectively applied to a disk array system where a plurality of disk components are connected by looped communication means such as a Fibre Channel loop and to a fault-tolerant control method for such a disk array system.
  • 2. Description of Related Art
  • Fibre Channel, standardized by ANSI NCITS T11 (formerly ANSI X3 T11), is known as an ultra-high-speed gigabit network technology. Using a Fibre Channel (FC) loop (hereinafter referred to as “FC loop”) makes it possible to build a large-scale storage system, i.e., a disk array system, in which a plurality of hard disk units are connected.
  • On the other hand, a disk drive unit that performs read/write operations via a serial interface, i.e., the SATA (Serial Advanced Technology Attachment) drive, has recently been devised.
  • Patent Document 1 discloses an information processing system employing SATA drives.
  • Patent Document 1: U.S. Patent Application Publication No. 2003/0135577
  • SUMMARY OF THE INVENTION
  • However, it is not easy to apply SATA drives to a disk array system. Applying them requires addressing many issues. Examples include the management of a plurality of enclosures, each housing SATA drives. Another is the connection between a SATA drive and the controller that controls read and write operations to and from it.
  • An object of this invention is to solve the problems of the conventional art and to provide a large-scale storage system, in the form of a disk array system, that employs SATA drives.
  • The invention provides a disk array system in which SATA drives are used as the hard disk units constituting a drive enclosure (i.e., disk drive enclosure). Such a unit may also be referred to as a “disk drive unit”, “disk drive”, or simply “drive”. A plurality of such drive enclosures are connected via a dual FC loop. Suppose an error or fault occurs in a drive enclosure or SATA drive on one of the two FC loops. The failed drive enclosure or SATA drive is identified, and the failed drive enclosure is isolated from the FC loop. The disk array system can therefore continue accessing drive enclosures on the normal side or loop. This avoids discrepancy between the control exercised by a controller (hereinafter referred to as the system controller) and that exercised by the drive controllers housed in the respective drive enclosures.
  • Consider a disk array system employing SATA drives with a dual FC loop. When the disk drive of a failed drive enclosure is isolated from or bypassed by the FC loop, a malfunction such as a firmware crash or a latent bug may occur at an interface connector. This can happen unless the drive controller of the failed drive enclosure is also shut down. Further, the drive controller of the drive enclosure and the system controller housed in a system controller enclosure may detect the error differently. Because of this discrepancy, the system controller may shut down the drive controller on the normal side or loop just as the path is about to be switched to it from the drive controller on the failed side or loop. The result is a system down.
  • Further, the drive controller of the drive enclosure on the failed loop may not be isolatable, that is, a loop failure may take place. In that case the failed drive enclosure is not disconnected from the system. Communication between the system controller and each drive enclosure through the FC loop therefore cannot be maintained.
  • Under the FC loop specification, communication between a system controller and each disk drive on an FC loop is disabled if the FC loop is broken at even a single point.
  • To deal with this situation, the invention provides a storage system (hereinafter sometimes simply “system”) constituted by a disk array of SATA drives built using an FC loop. The system is controlled such that, when disk drives are individually isolated from the FC loop, a device such as a port bypass circuit (PBC) switches the path so that the FC loop is not disrupted.
  • A disk array system is built by connecting two kinds of enclosures through an FC loop. The first kind is a plurality of drive enclosures, each housing a disk drive and a controller that controls it (hereinafter referred to as a “drive controller”). The second is a system controller enclosure accommodating a system controller that controls the plurality of drive enclosures as a whole. A drive enclosure that is added may be referred to as an expansion enclosure. In such a disk array system, when a fault (hereinafter also referred to as an “error”) occurs in a particular drive enclosure, the PBC isolates that drive enclosure so that the other drive enclosures continue to operate. However, if the FC loop is disrupted or communication through it is interrupted for some reason, all drive enclosures connected to the FC loop are disabled.
  • According to the invention, the system controller enclosure and the plurality of drive enclosures of a large-capacity storage system are connected through an FC loop, and this loop is made dual to deal with faults. When a fault occurs on one loop, communication can be maintained using the other loop. The drive enclosure where the error occurs is identified and isolated from the FC loop.
  • Further, according to the invention, an interface connector is provided to connect the controller of each drive enclosure (i.e., the drive controller) to the FC loop. The connector converts FC loop data into data that the SATA drive can read and write. Suppose an error occurs in one of the plurality of drive enclosures. The drive controller on the failed side or loop is then powered off or reset. This is done in response to an instruction from the drive controller (which may be called RAID controller) on the failed side or loop, or from the drive controller on the normal side or loop of the failed drive enclosure.
  • Further, according to the invention, a port bypass circuit (PBC) is provided between each interface connector and an FC loop to guard against failure of any drive controller. The port bypass circuit allows the failed component to be isolated from the FC loop. Either the failed drive enclosure and the enclosures following it are bypassed by the FC loop, or only the failed drive enclosure is bypassed.
  • In other words, the invention is directed to disk drives and drive controllers as constituent units of a disk array system (storage system). Each drive enclosure has a dualized drive controller. It is controlled by a master enclosure (which may be referred to as RAID controller), which is a controller enclosure housing two system controllers. The drive enclosures and the system controllers are connected in two loops, or two sub-systems, by a communication line such as Fibre Channel. This forms an FC loop through which data is communicated between the system controllers and the drive enclosures. The above-described port bypass circuit (PBC) individually connects each drive enclosure to, and disconnects it from, the system controllers on the FC loop. The PBC merely bypasses a port according to the presence or absence of a signal; the control is actually performed by the system controllers.
  • As described above, the disk array system according to the invention comprises at least one controller enclosure housing two system controllers, a plurality of drive enclosures, and a plurality of FC loops connecting the at least one controller enclosure and the plurality of drive enclosures.
  • The controller enclosure comprises at least the following:
    • a communication controller that is connected to a higher-level device such as a host computer and receives data from it;
    • a cache memory that is connected to the communication controller and holds data communicated between the communication controller and the higher-level device;
    • a control portion that is connected to the communication controller and the cache memory, and that transfers the data communicated with the higher-level device to and from the communication controller through an FC loop.
  • Each disk drive enclosure houses a SATA drive, which is connected to the following:
    • a plurality of port bypass circuits (PBCs) that are connected to the FC loops and switch the connection to the controller enclosure;
    • a plurality of interface connectors, connected to the controller enclosure through the FC loops, each of which connects the Fibre Channel interface used by the FC loops to an interface for the SATA drive;
    • a plurality of dual port switch devices, connected to the respective interface connectors, that switch the path to the SATA drive among the interface connectors;
    • a plurality of dual port switch circuits.
  • The SATA drive receives and stores the data transferred from the controller enclosure via the FC loops, the port bypass circuits, the interface connecting circuits, and the dual port switch circuits.
  • Each of the two drive controllers housed in each dual-structured drive enclosure has an enclosure management processor that monitors the operation of that drive enclosure. The enclosure management processor is assigned a Fibre Channel address, namely an FC-AL address or ALPA (Arbitrated Loop Physical Address).
  • In this structure, an enclosure management processor (first processor) communicates with the other enclosure management processor (second processor) housed in the same enclosure. When the first processor recognizes that an error has occurred in the drive controller monitored by the second processor, it notifies the system controller of this fact. In response to the notification, the system controller shuts down the operation of the failed drive enclosure.
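The peer monitoring between the two enclosure management processors of one drive enclosure can be sketched as follows. All names are hypothetical. Treating a silent peer the same as a reported error is an assumption rather than something stated in the text. The exchange over the dedicated line is reduced to a method call.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SystemController:
    shut_down: List[str] = field(default_factory=list)

    def on_peer_error(self, failed_controller: str) -> None:
        # In response to the notification, shut down the failed side once.
        if failed_controller not in self.shut_down:
            self.shut_down.append(failed_controller)


@dataclass
class EnclosureManagementProcessor:
    controller: str            # drive controller it monitors, e.g. "750B"
    responsive: bool = True
    controller_ok: bool = True
    peer: Optional["EnclosureManagementProcessor"] = None

    def status(self) -> Optional[bool]:
        """Management information sent to the peer; None if the EMP is silent."""
        return self.controller_ok if self.responsive else None

    def check_peer(self, sc: SystemController) -> bool:
        """Notify the system controller if the peer reports an error or is silent."""
        assert self.peer is not None
        if self.peer.status() is not True:
            sc.on_peer_error(self.peer.controller)
            return True
        return False
```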
  • The invention can provide a disk array system to which a SATA drive is applied.
  • It is to be understood that the present invention is not limited to the details described above or to the embodiments described below. It may be embodied with various modifications without departing from the scope and spirit of the invention as defined in the attached claims.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is an external view of an entirety of a disk array system according to the invention;
  • FIG. 2 is an explanatory view showing structure of a master enclosure indicated in FIG. 1;
  • FIG. 3 is an explanatory view showing structure of an expansion enclosure indicated in FIG. 1;
  • FIG. 4 is an explanatory view illustrating an example of a disk drive unit shown in FIGS. 2 and 3;
  • FIG. 5 is a pattern diagram illustrating a basic concept of a fault-tolerant control method for a disk array system according to a first embodiment of the invention;
  • FIG. 6 is a functional block diagram illustrating structure of the disk array system according to the first embodiment;
  • FIG. 7 is a functional block diagram illustrating an example of internal structure of a system controller as indicated in FIG. 6;
  • FIG. 8 indicates control programs stored in a RAID controller of the master enclosure;
  • FIG. 9 is a functional block diagram illustrating in further detail the disk array system according to the first embodiment shown in FIG. 6;
  • FIG. 10 is a flow chart illustrating operation of the system shown in FIG. 9;
  • FIG. 11 is a functional block diagram illustrating a disk array system according to a second embodiment of the invention;
  • FIG. 12 is a view showing in detail structure of an expansion enclosure indicated in FIG. 11;
  • FIG. 13 is a functional block diagram illustrating a disk array system according to a third embodiment of the invention;
  • FIG. 14 is a flow chart illustrating operation of the system shown in FIG. 13;
  • FIG. 15 is a functional block diagram illustrating a disk array system according to a fourth embodiment of the invention; and
  • FIG. 16 is a flow chart illustrating operation of the system shown in FIG. 15.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • Presently preferred embodiments of the invention will now be described in detail with reference to the accompanying drawings. First, the structure of a disk array system 10 according to the invention will be described.
  • FIG. 1 is an overall view of the disk array system 10; FIG. 1A is a front view and FIG. 1B is a back view. FIGS. 2A and 2B illustrate the structure of the master enclosure indicated in FIG. 1 and are, respectively, front and back perspective views of the master enclosure with some of its components pulled out. FIGS. 3A and 3B illustrate the structure of the expansion enclosure indicated in FIG. 1 and are, respectively, front and back perspective views of the expansion enclosure with some of its components pulled out. FIG. 4 illustrates an example structure of the disk drive unit indicated in FIGS. 2 and 3.
  • As shown in FIG. 1A, the disk array system 10 has a plurality of additional units 12 installed on a rack frame 11. A master enclosure 20 and expansion enclosures 30 are mounted on the additional units 12 so that each can be pulled out like a drawer. In FIG. 1, reference numerals 52, 53, 54, and 55 respectively denote a disk drive unit carrying a disk drive 51 as shown in FIG. 4, a battery unit serving as a backup power source, a display panel including display devices such as LED lamps for indicating the operating condition of the disk drives 51 and other components, and a flexible disk drive which may be used, for instance, to load a maintenance program.
  • As shown in FIG. 1B, on the back face of the rack frame 11 are disposed power controller boards 56 of the respective enclosures. On each board is mounted, for example, a PBC (port bypass circuit) for controlling an FC loop between a plurality of the disk drives 51, a circuit for monitoring status of an AC/DC power source 57 and temperatures of parts of the master enclosure and expansion enclosures, and a circuit for controlling power supply to the disk drive 51, operation of a cooling fan 66 (shown in FIGS. 2 and 3), and the display device on the display panel 54.
  • On the power controller board 56 is further provided a connector 67 for an FC cable 91. In FIG. 1, reference numerals 48, 49, 58, 59, 63 and 92 respectively denote a control line, power supply line, cooling fan unit, controller board, connector, and communication cable connected to one or more higher-level devices, i.e., host computers 100.
  • As shown in FIG. 2A, a multiplicity of the disk drive units 52 are loaded in the master enclosure 20 so that each disk drive unit 52 can be pulled out like a drawer. Below the disk drive units 52, the battery unit 53 and the flexible disk drive 55 are accommodated and the display panel 54 is mounted.
  • As shown in FIG. 2B, the power controller board 56 for controlling power supply to the plurality of disk drives 51 is disposed on the back face of the master enclosure 20. The power controller board 56 has the connector 67, to which the FC cable forming part of an FC loop is connected. The AC/DC power source 57, which supplies electricity to the parts of the master enclosure, is also mounted on the back face of the master enclosure 20 and is connected to the power controller board 56. Reference numeral 64 denotes a breaker switch. The cooling fan unit 58 having the cooling fan 66 is disposed under the AC/DC power source 57. The controller board 59, on which an interface board 61 is mounted, is also provided. The controller board 59 has a cache memory 62 and the connector 63 for the communication cable connected to the higher-level device (host computer) 100. Only a single higher-level device is shown in FIG. 1.
  • For connection to the host computer 100, the connector 63 of the interface board 61 complies with an interface standard such as SAN (Storage Area Network), using protocols such as Fibre Channel (FC) and Ethernet (registered trademark), LAN (Local Area Network), or SCSI.
  • As shown in FIG. 3A, multiple disk drive units 52 are loaded on the front side of the expansion enclosure 30 so that each unit 52 can be pulled out like a drawer. On the back side of the expansion enclosure 30, a power controller board 56, an AC/DC power source 57, and a cooling fan unit 58, similar to those shown in FIG. 2, are disposed.
  • As shown in FIG. 4, the disk drive 51, which together with other members constitutes the disk drive unit 52 in each of the master and expansion enclosures 20, 30, has a housing 70 incorporating a magnetic disk (hard disk) 73, an actuator 71, a spindle motor 72, a magnetic head 74 for performing read/write operations, a control circuit 75 for controlling elements including the magnetic head 74, a signal processing circuit 76 for controlling data read/write signals, a communication interface circuit 77, an interface connector 79 through which various commands and data are input and output, and a power connector 80.
  • The disk drive 51 may be, for example, a magnetic disk drive of the CSS (contact start stop) type with a nominal size of 3.5 inches, or of the load/unload type with a nominal size of 2.5 inches, and has a serial ATA (SATA) communication interface. The following description illustrates a disk array system employing SATA drives as the disk drive units.
  • FIG. 5 is a pattern diagram illustrating the basic concept of a fault-tolerant control method for the disk array system according to the first embodiment of the invention. It shows a fundamental control method for switching an access path in the event of a fault or error in one of the two sub-systems of a dual-structured SATA drive enclosure. FIG. 5A shows the access path during normal operation, and FIG. 5B shows the access path when a fault occurs. In FIGS. 5A and 5B, reference numerals 1A and 1B respectively denote a first system controller CTL#0 and a second system controller CTL#1.
  • Reference numerals 2 and 3 respectively denote a backend (which connects the system controllers and the drive controllers) and a SATA drive enclosure DISK-ENC#0 serving as an expansion component of the disk array. DISK-ENC#1 denotes a further expansion drive enclosure. Reference numerals 4A and 4B denote a first interface connector I/F-0 and a second interface connector I/F-1, respectively, while reference numerals 5A and 5B denote a first drive enclosure management processor EMP-0 and a second drive enclosure management processor EMP-1. Reference numerals 6A and 6B denote port bypass circuits (PBCs), and reference numerals 7 and 8 denote a dual port circuit (dual port device, DPD) and a SATA disk, respectively. L-#0 and L-#1 denote a first backend FC loop and a second backend FC loop.
  • The first system controller (CTL#0) 1A is connected through the backend FC loop L-#0 to the SATA drive enclosure (DISK-ENC#0), the expansion enclosure DISK-ENC#1, and subsequent expansion enclosures, while the second system controller (CTL#1) 1B is connected through the backend FC loop L-#1 to the SATA drive enclosure (DISK-ENC#0), the expansion enclosure DISK-ENC#1, and subsequent expansion enclosures. The first drive enclosure management processor (EMP-0) 5A (hereinafter simply the “first management processor”) is connected to the first interface connector (I/F-0) 4A via the port bypass circuit (PBC) 6A, while the second drive enclosure management processor (EMP-1) 5B (hereinafter simply the “second management processor”) is connected to the second interface connector (I/F-1) 4B via the port bypass circuit (PBC) 6B. The SATA disk 8 is connected to the first and second interface connectors (I/F-0, I/F-1) 4A, 4B via the dual port circuit (DPD) 7. Note that, like the disk drives, the management processors are also assigned their own Fibre Channel Arbitrated Loop Physical Addresses (AL_PA). These addresses are used when devices such as the system controllers access the management processors.
  • During normal operation of the whole system, as shown in FIG. 5A, the first and second system controllers (CTL#0, CTL#1) 1A, 1B communicate with the disk 8, a component of the disk array, and read/write data from and to it through the backend 2 and then via the port bypass circuits 6A, 6B, the first and second interface connectors (I/F-0, I/F-1) 4A, 4B, and the dual port circuit 7 of the SATA drive enclosure (DISK-ENC#0) 3. This state is indicated by a bold arrow in FIG. 5A.
  • Consider now the specific case where an error has occurred on the second backend FC loop L-#1. In this case, the second management processor (EMP-1) 5B switches the path at the path controller connected to the disk drive. In addition, the second system controller (CTL#1) 1B disconnects itself from the failed second backend FC loop L-#1 and switches to the normally operating first backend FC loop L-#0 to access the disk drive. Thus, even when an error has occurred on a backend FC loop, access to the disk drive continues.
  • Embodiments of the invention based on the above-described fundamental fault-tolerant control method will now be described in further detail.
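  • The failover of FIG. 5B can be sketched as two small decisions: which backend loop a system controller should use, and which interface connector the dual port circuit should connect to the disk. This Python sketch assumes that loop L-#0 reaches the disk through I/F-0 and L-#1 through I/F-1, so the same index serves for both; `choose_loop` and `DualPortDevice` are hypothetical names, not part of the described system.

```python
from typing import Sequence


class DualPortDevice:
    """Hypothetical model of DPD 7: the single-port SATA disk is connected to
    exactly one of the two interface connectors (0 = I/F-0, 1 = I/F-1)."""

    def __init__(self, active: int = 0) -> None:
        self.active = active

    def switch_to(self, connector: int) -> None:
        if connector not in (0, 1):
            raise ValueError("connector must be 0 or 1")
        self.active = connector


def choose_loop(preferred: int, loop_ok: Sequence[bool]) -> int:
    """Pick the backend FC loop a system controller should use.

    preferred: the controller's own loop (CTL#0 -> L-#0, CTL#1 -> L-#1).
    loop_ok:   health flags for L-#0 and L-#1.
    """
    if loop_ok[preferred]:
        return preferred
    alternate = 1 - preferred
    if loop_ok[alternate]:
        return alternate  # leave the failed loop, reach the disk over the healthy one
    raise RuntimeError("no healthy backend FC loop")


# L-#1 fails: CTL#1 moves to L-#0 and the DPD is pointed at I/F-0.
loop_ok = [True, False]
dpd = DualPortDevice(active=1)
ctl1_loop = choose_loop(1, loop_ok)
dpd.switch_to(ctl1_loop)
```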
  • First Embodiment
  • FIG. 6 is a functional block diagram illustrating the structure of a disk array system according to the first embodiment, and FIG. 7 is a functional block diagram illustrating the internal structure of a system controller indicated in FIG. 6. The disk array system has a master enclosure 110 and an expansion enclosure 140. The master enclosure 110 has two system controllers 120A, 120B for controlling the disk array system (storage system). Each system controller 120A, 120B is a so-called RAID (Redundant Arrays of Inexpensive Disks) controller. The master enclosure 110 is connected through a SAN (Storage Area Network) 130 to higher-level devices in the form of host computers 100A, 100B, which may be PC servers. Each host computer 100A, 100B has an FC/SCSI interface board in the form of a host adapter 102A, 102B. The master enclosure 110 and the expansion enclosure 140 are connected by a backend FC loop 160.
  • The expansion enclosure 140 is a drive enclosure; a plurality of such enclosures is actually provided, as shown in FIG. 1, although only one is shown in FIG. 6. The expansion enclosure 140 is also referred to as a “drive enclosure.” The expansion enclosure 140 has disk drives 171, 173, each of which is a SATA drive. The drive controller for the disk drives 171, 173 is dual-structured, consisting of a first drive controller 150A and a second drive controller 150B. The first drive controller 150A, which controls the drive enclosure, has a first port bypass circuit 151A, a first interface connector 152A, and a first enclosure management processor 153A.
  • An intelligent semiconductor chip (processor) is mounted on the drive controller of the expansion enclosure, so the system has two command channels: one originating from the system controller and the other from the drive controller. Because the chip is intelligent, an error at the interface connector of the drive controller may cause an unexpected crash of the chip or a malfunction due to a latent bug. To prevent such a malfunction from affecting the whole system, the failed interface connector is reset or powered off so that it is completely shut down. Thus the possible malfunction is prevented.
  • The first interface connector 152A of the drive controller 150A converts data in the FC format, transmitted on the FC loop, into the SATA format. The same applies to the second drive controller 150B. The first and second enclosure management processors 153A, 153B monitor and manage the status of the drive enclosure 140 (e.g., power failure, abnormal temperature, and abnormal path). They are connected via an exclusive (dedicated) line 180 so that each can send its management information to the other. The first and second enclosure management processors 153A, 153B are connected to the first and second port bypass circuits 151A, 151B, respectively, and are assigned respective FC-AL addresses.
  • The SATA drive comprises disks 171, 173 and dual port circuits (DPDs) 170, 172. Each dual port circuit 170, 172 switches the access path to the disk 171, 173 between the first interface connector 152A of the first drive controller 150A and the second interface connector 152B of the second drive controller 150B. In other words, the dual port circuit 170, 172 connects to the disk 171, 173 either the data line from the first interface connector 152A of the first drive controller 150A or the data line from the second interface connector 152B of the second drive controller 150B.
  • The first and second port bypass circuits 151A, 151B switch the path (data line). However, they do not switch the path on their own initiative; they do so only in response to an instruction from the system controller 120A, 120B.
  • The system controller 120A in FIG. 6 is constituted, for instance, as shown in FIG. 7. That is, the system controller 120A includes a communication controller 121A having an interface that handles communication with the host computers 100A, 100B, and a cache memory 122A that temporarily holds data exchanged between the communication controller 121A and a control portion 123A. Reference numeral 124A denotes a data bus. The control portion 123A writes data to and reads data from the drive (disk) via the cache memory 122A in accordance with data input/output requests. The same applies to the system controller 120B.
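  • The instruction-driven behavior of the port bypass circuits can be modeled as follows. This is a simplified Python sketch: the loop is an ordered list of PBCs, each fronting one hypothetical node label, and `apply_instruction` stands in for whatever command a system controller actually sends.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortBypassCircuit:
    node: str              # device behind this PBC (hypothetical label)
    bypassed: bool = False

    def apply_instruction(self, bypass: bool) -> None:
        # A PBC never switches on its own initiative; it only carries out
        # the insert/bypass instruction received from a system controller.
        self.bypassed = bypass


def loop_members(pbcs: List[PortBypassCircuit]) -> List[str]:
    """Nodes that frames actually pass through on their way around the loop."""
    return [p.node for p in pbcs if not p.bypassed]


loop = [PortBypassCircuit("ENC-1"), PortBypassCircuit("ENC-2"), PortBypassCircuit("ENC-3")]
loop[1].apply_instruction(bypass=True)   # instruction from a system controller
print(loop_members(loop))                # ['ENC-1', 'ENC-3']
```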
  • FIG. 8 is an explanatory view indicating control programs in a RAID controller 400 included in the master enclosure 110. The RAID controller (RAID CTL) 400 has a RAID control program 401 as a base program for controlling the whole system, a fault detection program 402 for detecting a fault or error in the whole system, and a non-response instruction program 403 for shutting down the drive controller (250A or 250B) where the fault has occurred and confirming whether the failed drive controller is actually shut down.
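  • The division of labor between the fault detection program 402 and the non-response instruction program 403 can be sketched as a classifier for the reply to each periodic poll plus the resulting recovery steps. The reply format (a dictionary with a `faults` list, or `None` for a timeout) and all function names are assumptions for illustration; the source does not specify them.

```python
from enum import Enum
from typing import List, Optional


class DiagResult(Enum):
    OK = "ok"
    FAULT_REPORTED = "fault reported"
    NO_RESPONSE = "no response"


def classify(reply: Optional[dict]) -> DiagResult:
    """Fault-detection role: classify one RECEIVE DIAG reply (None = timeout)."""
    if reply is None:
        return DiagResult.NO_RESPONSE
    if reply.get("faults"):
        return DiagResult.FAULT_REPORTED
    return DiagResult.OK


def recovery_actions(result: DiagResult, failed: str, normal: str) -> List[str]:
    """Non-response-instruction role: ordered recovery steps (hypothetical strings)."""
    if result is DiagResult.OK:
        return []
    return [
        f"switch path to drive controller {normal}",
        f"SEND DIAG RESET/POWEROFF for {failed} via the EMP of {normal}",
        f"confirm that {failed} no longer responds",
    ]


actions = recovery_actions(classify(None), failed="250A", normal="250B")
```

Either a reported fault or a missing reply triggers the same recovery, which matches the text: a failed side may be unable to report its own fault.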
  • With reference to FIGS. 9 and 10, the operation of the disk array system constructed as described above will be described for the case where an abnormality occurs in one of the two drive controllers of one of the expansion enclosures, so that it is determined that an error has occurred in the system. To avoid redundancy, the following description illustrates only one side of each dual element, or only one sub-system of the dual system, except for elements that particularly require illustration. It is to be understood, however, that the other side or sub-system operates in a similar manner. The same applies to the description of the second through fourth embodiments of the invention.
  • FIG. 9 is a functional block diagram illustrating in further detail the structure of the disk array system according to the first embodiment shown in FIG. 6. FIG. 10 is a flow chart illustrating operation of the disk array system of FIG. 9. The structure shown in FIG. 9 is basically the same as that of FIG. 6, but FIG. 9 additionally shows the functional elements essential to this embodiment. In FIG. 9, elements corresponding to those denoted by reference numerals in the 100 range in FIG. 6 are denoted by reference numerals in the 200 range. Regarding a master enclosure 210, FIG. 9 shows that a system controller 220A has a host interface 221A and a path switch 222A. A first drive controller 250A has a reset circuit 254A for resetting or powering off the interface connector and the enclosure management processor of a failed drive controller.
  • Reference numeral 282 denotes a signal line that carries a signal from a reset circuit 254B for resetting or powering off an enclosure management processor 253A of the first drive controller 250A. The first and second drive controllers 250A, 250B have memories 255A, 255B, respectively, each storing an enclosure-management-processor control program 256A, 256B. The enclosure management processors 253A, 253B are connected to a first port bypass circuit 251A and a second port bypass circuit 251B, respectively, and are assigned respective FC-AL addresses.
  • FIG. 10 is a flow chart illustrating operation of the system shown in FIG. 9. In the structure shown in FIG. 9, the enclosure management processors 253A, 253B are directly connected to the port bypass circuits 251A, 251B, respectively. FIG. 10 shows that the port bypass circuit 251B on the normal side (loop) passes a RESET/POWEROFF command for resetting/powering off the enclosure management processor 253A directly to the enclosure management processor 253B, which causes the reset circuit 254A to transmit a reset/poweroff signal based on this command.
  • The processing flow performed when a fault occurs in the system shown in FIG. 9 will now be described. For illustrative purposes, a specific case is described where an abnormality (error) occurs in the first drive controller 250A of the expansion enclosure 240 and a shut-down instruction is issued by the first drive controller 250A. Similar processing may be performed where the shut-down instruction is issued by the second drive controller 250B upon occurrence of such an error, and also where the abnormality occurs in the second drive controller 250B. FIG. 10 shows, from left to right, the operations of: the system controller 220A (220B); the failed-side port bypass circuit 251A; the failed-side interface connector 252A; the failed-side enclosure management processor 253A; the failed-side reset circuit 254A; the normal-side port bypass circuit 251B; and the normal-side enclosure management processor 253B. The order of the processing performed by these functional elements is indicated by arrows.
  • First, the system controller 220A periodically issues a RECEIVE DIAG command to the enclosure management processors 253A, 253B to collect logs thereof, to monitor whether any fault occurs, by executing the fault detection program 402 shown in FIG. 8. The system controller 220B operates similarly.
  • Consider, as an example, the case where a fault occurs at the interface connector 252A.
  • The failed-side port bypass circuit 251A receives the RECEIVE DIAG command directed to the enclosure management processor 253A.
  • The interface connector 252A becomes unable to communicate. The failed-side enclosure management processor 253A, which has detected via the port bypass circuit 251A that the fault has occurred at the interface connector 252A, returns fault information indicative of this fact to the system controller 220A.
  • When the system controller 220A obtains the fault information from the enclosure management processor 253A, or, does not receive a response to the RECEIVE DIAG command, the system controller 220A switches the path to a normally operating controller 250B, and issues a reset/poweroff instruction to the normal-side enclosure management processor 253B to stop or shut down the failed-side interface connector 252A and enclosure management processor 253A, by executing the non-response instruction program 403 (issuance of a SEND DIAG command).
  • The normal-side port bypass circuit 251B receives the RESET/POWEROFF command for shutting down the failed-side interface connector 252A and enclosure management processor 253A.
  • The normal-side enclosure management processor 253B receives the RESET/POWEROFF command for shutting down the failed-side interface connector 252A and enclosure management processor 253A, and issues the RESET/POWEROFF command to the reset circuit 254A.
  • The failed-side reset circuit 254A receives the RESET/POWEROFF command, and transmits a reset/poweroff signal to the failed-side interface connector 252A and enclosure management processor 253A.
  • The failed-side enclosure management processor 253A receives the reset/poweroff signal and is shut down.
  • Similarly, the failed-side interface connector 252A receives the reset/poweroff signal and is shut down.
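  • The shutdown sequence above can be traced with a small Python model. The property it illustrates is that the RESET/POWEROFF command travels only through the healthy side (PBC 251B, EMP 253B) and reaches the failed side solely through the reset circuit 254A. The classes and the log format are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    powered: bool = True


@dataclass
class DriveControllerSide:
    pbc: str
    interface: Component
    emp: Component
    reset_circuit: str


def shut_down_failed_side(failed: DriveControllerSide,
                          normal: DriveControllerSide) -> List[str]:
    """Deliver RESET/POWEROFF through the healthy side; return the hop log."""
    log = [
        f"system controller -> {normal.pbc}: SEND DIAG RESET/POWEROFF",
        f"{normal.pbc} -> {normal.emp.name}: RESET/POWEROFF",
        f"{normal.emp.name} -> {failed.reset_circuit}: RESET/POWEROFF",
    ]
    # The failed-side reset circuit signals both the EMP and the interface connector.
    for part in (failed.emp, failed.interface):
        part.powered = False
        log.append(f"{failed.reset_circuit} -> {part.name}: reset/poweroff signal")
    return log


side_a = DriveControllerSide("PBC 251A", Component("I/F 252A"),
                             Component("EMP 253A"), "reset circuit 254A")
side_b = DriveControllerSide("PBC 251B", Component("I/F 252B"),
                             Component("EMP 253B"), "reset circuit 254B")
for hop in shut_down_failed_side(failed=side_a, normal=side_b):
    print(hop)
```

Routing the command through the normal side is what makes the sequence robust: the failed-side interface connector can no longer communicate, so a command sent through PBC 251A might never arrive.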
  • Through the processing sequence described above, the drive controller (SATA drive) on the failed side or loop is reliably isolated or bypassed. Communication between the system controller and each drive enclosure through the FC loop can therefore be maintained without bringing the system down.
  • According to the present embodiment, the enclosure management processors make it possible to deal quickly with a fault occurring in the SATA expansion enclosure. Further, since the system controller can read and write data from and to the SATA drive via the FC loop, PBC, and interface connector, the disk array system can handle a large volume of data. Still further, since the plurality of system controllers is connected to the SATA expansion enclosure through the plurality of FC loops, a highly fault-tolerant disk array system can be provided.
  • Second Embodiment
  • A second embodiment is arranged such that the path bypasses a drive controller of an expansion enclosure where a fault has occurred, and goes to the normal expansion enclosure, so that the data communication with the higher-level device can be continued.
  • FIG. 11 is a functional block diagram illustrating structure of a disk array system according to the second embodiment, and shows a normally operated state of the disk array system comprising a master enclosure 710 and a plurality of expansion enclosures 740, 760, 780. Elements corresponding to the same elements in the above-described first embodiment are referred to by reference numerals in the 700 range. The master enclosure 710 comprises a dual-structured RAID controller, i.e., two system controllers 720A and 720B, which communicate with higher-level devices (e.g., host computers) 700A, 700B via host interfaces 721A, 721B, respectively. Further, the system controllers 720A, 720B have a path switch 722A, 722B, respectively.
  • The expansion enclosure 740 comprises a first drive controller 750A and a second drive controller 750B, which have port bypass circuits 751A, 751B, respectively. The first and second drive controllers 750A, 750B are connected through FC loops to the path switches 722A, 722B of the system controllers 720A, 720B via the port bypass circuits 751A, 751B. In FIG. 11, the path of the FC loop is indicated by a bold arrow.
  • FIG. 12 shows the detailed structure of the expansion enclosure shown in FIG. 11. Each of the other expansion enclosures 760, 780 shown in FIG. 11 is structured similarly to the expansion enclosure 740 shown in FIG. 12. The expansion enclosure 740 has disk drives 771, 773, which are SATA drives. The drive controller is dual-structured, comprising a first drive controller 750A and a second drive controller 750B. The first drive controller 750A, which controls the drive enclosure, has a first port bypass circuit 751A, a first interface connector 752A, and a first enclosure management processor 753A.
  • An intelligent semiconductor chip (processor) is mounted on the drive controller of the expansion enclosure 740, so the system has two command channels: one originating from the system controller of the master enclosure and the other from the drive controller of the expansion enclosure. Because the chip is intelligent, an error at the interface connector of the drive controller may cause an unexpected crash of the chip or a malfunction due to a latent bug. To prevent such a malfunction from affecting the whole system, the failed drive is reset or powered off so that it is completely shut down. Thus the possible malfunction is prevented.
  • The first interface connector 752A of the first drive controller 750A converts data in the FC format, transmitted on the FC loop, into the SATA format. The same applies to the second drive controller 750B. The first and second enclosure management processors 753A, 753B monitor and manage the status of the drive enclosure 740 (e.g., power failure, abnormal temperature, and abnormal path). They are connected via an exclusive (dedicated) line 780 so that each can send its management information to the other. The first and second enclosure management processors 753A, 753B are connected to the first and second port bypass circuits 751A, 751B, respectively, and are assigned respective FC-AL addresses.
  • The SATA drive comprises disks 771, 773 and dual port circuits (dual port devices, DPDs) 770, 772. Each dual port circuit 770, 772 switches the access path to the disk 771, 773 between the first interface connector 752A of the first drive controller 750A and the second interface connector 752B of the second drive controller 750B. In other words, the dual port circuit 770, 772 connects to the disk 771, 773 either the data line from the first interface connector 752A of the first drive controller 750A or the data line from the second interface connector 752B of the second drive controller 750B.
  • The first and second port bypass circuits 751A, 751B switch the path (data line). However, they do not switch the path on their own initiative; they do so only in response to an instruction from the system controller 720A, 720B. The system controller 720A has a structure as shown in FIG. 7, for instance.
  • The FC loop path shown in FIG. 11 is the access path between the master enclosure and the expansion enclosures 740, 760, 780 when the system has no fault. The number of expansion enclosures is not limited to three; three are shown here for simplicity. At the last expansion enclosure, which is not connected to any further expansion enclosure, the FC cable turns back at the port bypass circuit 791A, as indicated by arrow A in FIG. 11. Although FIG. 11 shows only the FC loop on the side of the system controller 720A of the master enclosure 710, the other loop, on the side of the system controller 720B, operates similarly.
  • In the normal state, the first drive controllers 750A, 770A, 790A of the respective expansion enclosures 740, 760, 780 are connected by the FC loop as shown in FIG. 11. The path originates at the path switch 722A of the system controller 720A of the master enclosure 710 and enters the first interface connector 752A via the port bypass circuit 751A of the first drive controller 750A of the expansion enclosure 740. Data exchanged with the host computer 700A or 700B is thereby read from and written to the disk drive, a SATA drive connected to the first interface connector 752A via a dual port circuit (not shown). The expansion enclosures 760, 780 are connected through the FC loop in the same way. The result, as a whole, is a SATA disk array system whose components are connected by a dual FC loop.
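  • The normal-state loop order of FIG. 11 can be listed programmatically. This sketch assumes, as a simplification, that the return leg retraces the same enclosures in reverse after the turn-back at the last enclosure (arrow A); `loop_path` is a hypothetical helper, and the labels are the reference numerals of FIG. 11.

```python
from typing import List


def loop_path(path_switch: str, drive_controllers: List[str]) -> List[str]:
    """Hops of one backend FC loop in the normal state (FIG. 11).

    path_switch:       path switch of the originating system controller
    drive_controllers: drive controllers on this loop, nearest enclosure first
    """
    outbound = [path_switch] + drive_controllers
    # Turn back at the last enclosure's PBC (arrow A) and retrace toward the
    # master enclosure; retracing is a simplifying assumption.
    inbound = list(reversed(outbound[:-1]))
    return outbound + inbound


print(" -> ".join(loop_path("722A", ["750A", "770A", "790A"])))
# 722A -> 750A -> 770A -> 790A -> 770A -> 750A -> 722A
```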
  • Third Embodiment
  • FIG. 13 is a functional block diagram illustrating structure of a disk array system according to a third embodiment of the invention. Elements corresponding to the same elements shown in FIG. 11 are referred to by reference numerals in the 800 range. In the description of the third embodiment, there will be described a case where a fault occurs in the disk array system comprising a master enclosure 810 and a plurality of expansion enclosures 840, 860, 880. Detailed structure of the expansion enclosure 840, 860, 880 . . . is similar to that of the enclosure 740 shown in FIG. 12; the reference numerals in the 700 range are to be replaced by reference numerals in the 800 range in the third embodiment shown in FIG. 13.
  • FIG. 14 is a flow chart illustrating operation of the system shown in FIG. 13. In FIG. 14 is indicated from left to right respective operations of: the system controller 820A (820B); failed-side port bypass circuit 871A; failed-side interface connector 872A; failed-side enclosure management processor 873A; failed-side reset circuit 874A; normal-side port bypass circuit 871B; and normal-side enclosure management processor 873B.
  • With reference to the flow chart of FIG. 14, the control for bypassing the failed drive controller is described for the case where a fault occurs on the side of the first drive controller 870A of the expansion enclosure 860 in the system shown in FIG. 13.
  • (1) The system controller 820A periodically issues a RECEIVE DIAG command to each drive controller 850A, 850B, 870A, 870B, 890A, 890B of the expansion enclosures 840, 860, 880 to collect a log of the enclosure management processor of the each drive controller, according to the fault detection program 402, for thereby monitoring whether any fault occurs. The controller 820B operates similarly.
  • (2) A fault occurs at the first interface connector 872A of the expansion enclosure 860.
  • (3) The port bypass circuit 871A receives the RECEIVE DIAG command from the system controller 820A.
  • When the enclosure management processor 873A on the failed-side has detected the occurrence of the fault at the interface connector 872A via the port bypass circuit 871A, this fault information is returned to the system controller 820A in response to the RECEIVE DIAG command.
  • The system controller 820A obtains the fault information indicative of the occurrence of the fault at the interface connector 872A from the enclosure management processor 873A. The path is switched to the normally operating controller 850B. A reset/poweroff instruction is issued to the normal-side enclosure management processor 873B to stop or shut down the failed-side interface connector 872A and enclosure management processor 873A, according to the non-response instruction program 403.
  • The normal-side port bypass circuit 871B receives a RESET/POWEROFF command for stopping or shutting down the failed-side interface connector 872A and enclosure management processor 873A.
  • The normal-side enclosure management processor 873B receives the RESET/POWEROFF command for stopping or shutting down the failed-side interface connector 872A and enclosure management processor 873A, and issues a reset/poweroff signal to the reset circuit 874A.
  • The failed-side reset circuit 874A receives the reset/poweroff signal and transfers the reset/poweroff signal to the failed-side interface connector 872A and enclosure management processor 873A.
  • The failed-side enclosure management processor 873A receives the reset/poweroff signal and is shut down.
  • The failed-side interface connector 872A likewise receives the reset/poweroff signal and is shut down.
  • The interface connector 872A can then no longer be recognized through the port bypass circuit 871A, so the path bypasses the drive controller 870A and continues to the following expansion enclosure 880.
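The isolation sequence above can be sketched as a small simulation. This is a minimal, hypothetical model, not the patent's firmware: each expansion enclosure has an A side and a B side, `receive_diag()` stands in for the enclosure management processor's reply to the RECEIVE DIAG poll, `reset_peer()` stands in for the normal-side processor driving the failed side's reset circuit, and a side that has been shut down is assumed to be skipped by its port bypass circuit. All class and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Side:
    """One drive-controller side (A or B) of an expansion enclosure (hypothetical model)."""
    connector_ok: bool = True   # health of the interface connector, e.g. 872A
    powered: bool = True        # cleared once the reset circuit shuts this side down

    def receive_diag(self) -> str:
        # Stand-in for the enclosure management processor's reply to RECEIVE DIAG.
        return "OK" if self.connector_ok else "CONNECTOR_FAULT"


@dataclass
class Enclosure:
    name: str
    a: Side
    b: Side

    def reset_peer(self, failed: str) -> None:
        # The normal-side processor drives the failed side's reset circuit
        # (e.g. 873B -> 874A), shutting down its connector and processor.
        getattr(self, failed).powered = False


def loop_path(enclosures: List[Enclosure], side: str) -> List[str]:
    """Enclosures traversed on one FC loop; a shut-down side is bypassed by its PBC."""
    return [e.name for e in enclosures if getattr(e, side).powered]


def poll_and_isolate(enclosures: List[Enclosure]) -> Dict[str, str]:
    """One polling pass over loop A; returns the loop used to reach each enclosure."""
    routes = {}
    for e in enclosures:
        if e.a.powered and e.a.receive_diag() != "OK":
            e.reset_peer("a")   # instruction is delivered through the B-side processor
        routes[e.name] = "A" if e.a.powered else "B"   # fail over to loop B
    return routes


# Hypothetical run: fault at the A-side interface connector of enclosure 860.
encl = [Enclosure(n, Side(), Side()) for n in ("840", "860", "880")]
encl[1].a.connector_ok = False
print(poll_and_isolate(encl))   # {'840': 'A', '860': 'B', '880': 'A'}
print(loop_path(encl, "a"))     # ['840', '880']
```

Modeling the bypass as a filter over powered sides mirrors arrow B in FIG. 13: under these assumptions loop A keeps serving enclosures 840 and 880, while enclosure 860 stays reachable through its B side.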
  • In the third embodiment, a port bypass circuit (PBC) is provided at the port of each enclosure and is controlled so that the path bypasses the failed drive controller 870A and continues to the following expansion enclosure 880, as indicated by arrow B in FIG. 13. The system as a whole can therefore continue operating without closing the FC loop, minimizing the influence of the fault on the system.
  • Further, because the third embodiment employs enclosure management processors, a fault occurring in a SATA expansion enclosure can be dealt with quickly. Still further, since the system controller can read and write data from and to the SATA drives via the FC loop, PBC and interface connector, the disk array system can handle large volumes of data. Still further, since the plurality of system controllers is connected to the SATA expansion enclosure via a plurality of FC loops, a highly fault-tolerant disk array system can be provided.
  • Fourth Embodiment
  • FIG. 15 is a functional block diagram illustrating the structure of a disk array system according to a fourth embodiment of the invention. Elements corresponding to those shown in FIG. 13 are given reference numerals in the 900 range. FIG. 15 shows a state where an abnormality has occurred in the disk array system comprising a master enclosure 910 and a plurality of expansion enclosures 940, 960, 980. The detailed structure of each expansion enclosure 940, 960, 980 . . . is similar to that of the expansion enclosure 740 shown in FIG. 12, with the reference numerals in the 700 range in FIG. 12 replaced by reference numerals in the 900 range. For simplicity, the case illustrated here has only three expansion enclosures, and control instructions are issued by the system controller 920A of the master enclosure 910 (the system controller 920B operates similarly). FIG. 16 is a flow chart illustrating operation of the system shown in FIG. 15. The fourth embodiment relates to a method of controlling the isolation of a failed component, which may be called the “latter-part bypass” method; it is used, when a fault occurs in the disk array system, to identify the expansion enclosure where the fault has occurred. The way a failed enclosure is identified by the latter-part bypass method of the fourth embodiment is described below with reference to FIG. 16.
  • (1) When an error occurs on an FC loop but its location is unknown (it could be anywhere), the system controller 920A of the master enclosure 910 cannot identify where the error is. The error is therefore located at the enclosure level by sequentially isolating parts of the loop. First, the latter half of the expansion enclosures on the loop is bypassed. If the system then operates normally, the fault lies in the bypassed latter half, so the latter half of that portion is bypassed next, that is, isolated from the FC loop, and so on. In this example, the expansion enclosure 980 is bypassed.
  • (2) The expansion enclosures following the expansion enclosure 960 are bypassed at the port bypass circuit 971A of the expansion enclosure 960, as indicated by arrow C. This bypassing forms an FC loop on which only the master enclosure 910 and the expansion enclosures 940 and 960 operate.
  • (3) It is determined whether any problem is detected in the operation of the FC loop (i.e., whether an error is present).
  • (4) If no problem is detected, the error is searched for by the bisection method: the expansion enclosures of the latter half, i.e. the expansion enclosure 980 in this example, are bypassed. Although FIG. 15 shows only three expansion enclosures, in practice a greater number of expansion enclosures are disposed in the system; the FC loop is then bypassed further, step by step, according to the bisection method, and after each bypass it is determined whether any problem is detected in the operation of the FC loop.
  • (5) If, on the other hand, a problem with the operation of the FC loop is detected in step (3), the first half of the expansion enclosures is searched for the error location by applying the same latter-half bypassing according to the bisection method. As in step (4), when more expansion enclosures than the three shown are disposed, the FC loop is bypassed further, step by step, and after each bypass it is determined whether any problem is detected in the operation of the FC loop.
  • (6) When the search in the latter half in step (4) or in the first half in step (5) identifies the enclosure or drive controller where the error has occurred, that failed element is bypassed.
  • Bypassing the failed enclosure identified by the latter-half bypass search allows the disk array system as a whole to continue transmitting and receiving data to and from the higher-level device.
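The latter-half bypass search in steps (1) to (6) amounts to a binary search over how many expansion enclosures remain attached to the loop. The sketch below is a minimal illustration under stated assumptions: at most one faulty enclosure, the master enclosure alone operates correctly, and a hypothetical probe `loop_ok(k)` keeps the first `k` expansion enclosures on the loop, bypasses the rest, and reports whether the loop runs without error. The split rounds up so that, with three enclosures, only the last one (980 in FIG. 15) is bypassed first.

```python
from typing import Callable, List, Optional


def find_failed_enclosure(n: int, loop_ok: Callable[[int], bool]) -> Optional[int]:
    """Return the 0-based position of the failed enclosure, or None if the loop is clean.

    loop_ok(k) is a hypothetical probe: keep enclosures 0..k-1 on the FC loop,
    bypass positions k..n-1, and report whether the loop operates without error.
    Assumes at most one faulty enclosure, so the loop fails exactly when it is attached.
    """
    if loop_ok(n):
        return None                     # no error with every enclosure attached
    lo, hi = 0, n                       # invariant: loop_ok(lo) is True, loop_ok(hi) is False
    while hi - lo > 1:
        mid = (lo + hi + 1) // 2        # bypass the latter half of the suspect range
        if loop_ok(mid):
            lo = mid                    # error gone: fault is in the bypassed part
        else:
            hi = mid                    # error persists: fault is in the part still attached
    return hi - 1                       # attaching this enclosure is what breaks the loop


# Hypothetical run: eight expansion enclosures, the one at position 5 is faulty.
probes: List[int] = []


def probe(k: int) -> bool:
    probes.append(k)
    return k <= 5                       # clean only while position 5 is bypassed


print(find_failed_enclosure(8, probe))  # 5
print(probes)                           # [8, 4, 6, 5]
```

Each probe halves the suspect range, so after the initial failure the fault is located in about log2(n) loop checks rather than one check per enclosure.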
  • According to the fourth embodiment, since the enclosure management processors are provided, a fault occurring in the SATA expansion enclosure can be dealt with quickly. Further, since the system controller can read and write data from and to the SATA drives via the FC loop, PBC and interface connector, the disk array system can handle large volumes of data. Still further, since the plurality of system controllers is connected to the SATA expansion enclosure via a plurality of FC loops, a highly fault-tolerant disk array system can be provided.
  • Although the present invention has been specifically described based on its embodiments, the invention is not limited to the details of those embodiments and may be embodied with various modifications without departing from its scope and spirit. For instance, although a storage system has been described for illustrative purposes as one example of an information processing system to which the invention is applied, the invention is widely applicable: it may be applied to any general information processing system whose components are connected by looped communication means such as a Fibre Channel loop.
  • As described above, the invention provides a disk array system with improved reliability that is equipped with looped communication means and need not stop read/write operations when an error occurs. Further, in an information processing system equipped with FC looped communication means according to the invention, an expansion enclosure where a fault occurs is isolated from the loop, enabling a quick and accurate recovery operation.

Claims (1)

1. A disk array system comprising a controller enclosure, one or more serial disk drive enclosures, and a plurality of fibre channel loops respectively connecting the controller enclosure and the plurality of serial disk drive enclosures, wherein the controller enclosure comprises:
a communication controller which is connected to a higher-level device and receives data from the higher-level device;
a cache memory which is connected to the communication controller and which holds the data communicated between the communication controller and the higher-level device; and
a plurality of system controllers which is connected to the higher-level device and the cache memory and which controls to transfer or receive the data communicated between the higher-level device and the communication controller, to and from the communication controller,
wherein each of the one or more serial disk drive enclosures comprises:
a plurality of port bypass circuits each of which is connected to the fibre channel loops and used to switch a connection of its own serial disk drive enclosure to the controller enclosure;
a plurality of interface connectors which is connected to the plurality of system controllers through the plurality of fibre channel loops, and each of which connects a fibre channel interface used by the plurality of fibre channel loops to an interface for serial disk drives;
a plurality of dual port circuits which is connected to the plurality of interface connectors, respectively, and controls switching of a path to the serial disk drives from the plurality of interface connectors;
a plurality of serial disk drives which is connected to the plurality of port bypass circuits, receives and stores the data transferred from the system controller via the fibre channel loops, the port bypass circuits, the interface connectors, and the dual port circuits; and
an enclosure management processor which is connected to the plurality of interface connectors via the port bypass circuits, and monitors operation of the interface connectors.

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/798,063 US20070214318A1 (en) 2004-02-06 2007-05-10 Disk array system and fault-tolerant control method for the same

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2004030792A JP2005222379A (en) 2004-02-06 2004-02-06 Disk-array device and control method for avoiding failure thereof
JP2004-030792 2004-02-06
US10/835,074 US7234023B2 (en) 2004-02-06 2004-04-30 Disk array system and fault-tolerant control method for the same
US11/798,063 US20070214318A1 (en) 2004-02-06 2007-05-10 Disk array system and fault-tolerant control method for the same

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/835,074 Continuation US7234023B2 (en) 2004-02-06 2004-04-30 Disk array system and fault-tolerant control method for the same

Publications (1)

Publication Number Publication Date
US20070214318A1 true US20070214318A1 (en) 2007-09-13

Family

ID=34857633

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/835,074 Expired - Fee Related US7234023B2 (en) 2004-02-06 2004-04-30 Disk array system and fault-tolerant control method for the same
US11/798,063 Abandoned US20070214318A1 (en) 2004-02-06 2007-05-10 Disk array system and fault-tolerant control method for the same

Country Status (2)

Country Link
US (2) US7234023B2 (en)
JP (1) JP2005222379A (en)

Also Published As

Publication number Publication date
JP2005222379A (en) 2005-08-18
US7234023B2 (en) 2007-06-19
US20050188247A1 (en) 2005-08-25

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION