WO2005006174A1

WO2005006174A1 - Disc system

Info

Publication number: WO2005006174A1
Application number: PCT/JP2003/008769
Authority: WO
Inventors: Hidejiro Daikokuya; Mikio Ito; Kazuhiko Ikeuchi
Original assignee: Fujitsu Limited
Priority date: 2003-07-10
Filing date: 2003-07-10
Publication date: 2005-01-20
Also published as: WO2005006176A1

Abstract

An access data string transmitted from a host (10) is sent to a center module (CM) via a channel adapter (CA). There, it is decided whether to perform striping management. If striping management is performed and the data is divided into stripes, the stripes are sent to a plurality of device adapters (DA). In each device adapter (DA), the type of control can be set: RAID 5 control, mirroring, or nothing. If RAID 5 control is performed, parity is added to the data that has been divided into stripes, and the data is stored on the physical discs connected to the device adapter (DA).

Description

Disc system

Technical field

The present invention relates to a disk system.

Background art

At present, various disk systems are in practical use, and among them, RAID 5 is common.

FIG. 1 is a diagram showing a configuration of a disk system of the RAID 5.

The host 10 sends the host access data string to the channel adapter CA of the RAID 5 system. In the figure, the host access data string is shown as D1 to D4 arranged in a row. This string is sent to the center module CM, temporarily stored in the cache, and then sent to the device adapter DA. The device adapter DA performs RAID 5 control on the received host access data string. That is, the data string is divided into stripes D1 to D4, and parity (represented as P in the figure) is added. The data string processed by the RAID 5 control is then stored on the four physical disks connected to the device adapter DA. As shown in the lower part of FIG. 1, the data is stored so that D1 to D4, which form one data string, are evenly distributed across the four physical disks. The parity is likewise distributed across the physical disks.

FIGS. 2 to 4 are diagrams for explaining the problems of the conventional RAID 5.

As shown in FIG. 2, in a conventional RAID 5 system, when one disk fails, a loop-down may occur as a result, and all the disks on the FC (Fibre Channel) loop that includes the failed disk become invisible. In other words, with the conventional way of assembling RAID 5 disks, a loop-down makes multiple disks in the RAID group inaccessible, and as a result normal I/O processing such as reads and writes cannot be performed.

Therefore, a system as shown in FIG. 3 exists as a countermeasure. That is, the disks are connected one to each loop. In this case, a total of four disks connected to different loops make up one RAID group, so the four disks in FIG. 3 form a set. If one of the four disks fails, the other three disks are still accessible, and the data stored on the failed disk can be restored using the parity.

FIG. 4 is a diagram explaining the problem with rebuilding when many disks are connected.

In the configuration of FIG. 4, RAID 5 is configured using 16 physical disks. In rebuilding (redundancy restoration processing), data is read from 15 disks, the XOR of that data is computed to regenerate the lost data or parity, and the result is written to the spare disk HS. Therefore, as more disks are connected, the number of disks that must be read during rebuilding increases, and rebuilding becomes slower.
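The rebuild cost described above can be illustrated with a minimal Python sketch. It assumes byte-wise XOR parity over equal-length blocks, which is the usual RAID 5 arrangement; the function names `xor_blocks` and `rebuild_block` and the block contents are hypothetical and are not taken from the specification.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving_blocks):
    """Recover the missing block of one stripe from all surviving blocks
    (data and parity). Every surviving disk has to be read once."""
    return xor_blocks(surviving_blocks)


# Illustrative 16-disk group: 15 data blocks plus 1 parity block per stripe.
data = [bytes([i, i + 1, i + 2, i + 3]) for i in range(15)]
parity = xor_blocks(data)

# Assume the disk holding data[6] fails: read the other 15 blocks and XOR them.
survivors = data[:6] + data[7:] + [parity]
recovered = rebuild_block(survivors)
reads_needed = len(survivors)
```

In this sketch the number of reads grows with the size of the parity group, which is the slowdown the text attributes to large RAID 5 configurations.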

A similar problem is that a processing delay occurs at write time. For example, in RAID 5, consider a write-back performed while the RAID group has lost redundancy due to a disk failure or the like. If the disk to which the data is to be written has failed, the data read from all disks other than the parity disk is XORed with the data to be written back to create a new parity, and this parity is written to the parity disk. Because all disks are either read or written, processing is liable to be delayed.
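The degraded write described above can be sketched as follows. This is only an illustration under the assumption of byte-wise XOR parity; `degraded_write_parity` is a hypothetical helper name, and the block values are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def degraded_write_parity(other_data_blocks, new_block):
    """Parity for a write whose target disk has failed: XOR the data read
    from every other data disk with the block being written back."""
    return xor_blocks(list(other_data_blocks) + [new_block])


# Illustrative stripe with four data disks, one of which has failed.
other_data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]  # read from healthy data disks
new_block = b"\xaa\x55"                               # target disk is unavailable
new_parity = degraded_write_parity(other_data, new_block)

# A later read of the failed disk's block reconstructs it from the survivors.
read_back = xor_blocks(other_data + [new_parity])
```

Every healthy data disk is read and the parity disk is written, so the cost of a single write again scales with the size of the parity group.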

Conventional disk systems include those described in Patent Document 1, Patent Document 2, Patent Document 3, and Patent Document 4.

Patent Document 1 discloses a technique in which accesses go through a newly provided built-in cache when a processor of a channel adapter and a processor of a disk adapter access a shared memory. Patent Document 2 discloses a technique in which a host adapter, a disk adapter, and a cache memory for temporary storage are connected to a common path. Patent Document 3 discloses a technique for performing a spindle / block write-back restoration in an array disk processing device. Patent Document 4 discloses a technique in which a disk array is provided with a cache memory that functions as a virtual disk holding the contents of a disk drive.

Patent Document 1

JP 2001-306265 A

Patent Document 2

JP-A-7-20994

Patent Document 3

JP-A-11-66693

Patent Document 4

JP 7-200190A

Disclosure of the Invention

An object of the present invention is to provide a disk system which is excellent in scalability and can efficiently realize redundancy when a failure occurs.

The disk system of the present invention is a disk system adopting a redundancy configuration, and comprises: stripe processing management means for performing stripe management of a data string according to a setting as to whether the data string from a host is divided into stripes; and redundancy management means for providing data redundancy and storing the data on a plurality of physical disks based on a setting as to whether or not to perform redundancy management on the data from the stripe processing management means. By separating the functions of stripe management and redundancy management, disk systems of various configurations can be configured.

According to the present invention, the function of managing striping and the function of managing redundancy, both of which are conventionally performed by the device adapter, are assigned to the center module and the device adapter, respectively. Various disk systems can be easily constructed by combining them. Also, by providing a plurality of device adapters DA as redundancy management means, it is possible to easily expand the capacity of the disk system and to improve performance by distributed processing. In addition, since effects such as the write penalty that occurs in the conventional RAID 5 are confined within one device adapter (redundancy management means), they can be prevented from affecting the entire disk system.

Brief Description of Drawings

FIG. 1 is a diagram showing a configuration of a disk system of the RAID 5.

FIGS. 2 to 4 are diagrams for explaining the problems of the conventional RAID 5.

FIG. 5 is a configuration diagram of the disk system according to the embodiment of the present invention.

FIG. 6 is a diagram showing a state of mapping between a RAID group and a physical disk in the embodiment of the present invention.

FIG. 7 is a diagram showing a configuration of the RLU information table.

FIG. 8 is a configuration example of the DLU information table.

FIG. 9 is a flowchart showing the operation of the center module CM.

FIG. 10 is a flowchart showing the operation of the device adapter DA.

FIG. 11 is a diagram for explaining the configuration of RAID 1 when the disk system according to the embodiment of the present invention is used.

FIG. 12 is a diagram for explaining the configuration of RAID 0+1 when the disk system according to the embodiment of the present invention is used.

FIG. 13 is a diagram for explaining the configuration of RAID 5 when the disk system according to the embodiment of the present invention is used.

FIG. 14 is a diagram illustrating the configuration of RAID 0+5 when the disk system according to the embodiment of the present invention is used.

FIG. 15 is a diagram for explaining RAID 0+5.

FIG. 16 is a diagram for explaining rebuilding in the disk system according to the embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

FIG. 5 is a configuration diagram of the disk system according to the embodiment of the present invention.

First, the host 10 sends a host access data string. The data string is received by the channel adapter CA and transferred to the center module CM. The center module CM performs only the stripe control and distributes the data string to the two device adapters DA. Each device adapter DA performs RAID 5 control, generates parity, and stores each stripe on the four physical disks connected to that device adapter DA.

As is clear from FIG. 5, in the embodiment of the present invention, the center module CM performs stripe control by treating the four physical disks connected to each device adapter DA as one virtual disk, and RAID 5 control, which is the control for redundancy, is then performed by the device adapter DA corresponding to each virtual disk.

The above embodiment of the present invention has the following advantages.

1) The roles are divided between the center module CM, which performs stripe control, and the device adapter DA, which performs RAID 5 control, so the degree of freedom in configuration is further increased.

2) Performance improvement by distributed processing can be expected by using multiple device adapters DA that perform RAID 5 control.

3) In general, in RAID 5, as the number of configured disks increases, the number of disks that are accessed because of the write penalty in the event of a disk failure increases. In the embodiment of the present invention, however, the effect of the write penalty is limited to a single virtual disk, so the range of influence remains small.

FIG. 6 is a diagram showing a state of mapping between a RAID group and a physical disk in the embodiment of the present invention.

First, the uppermost layer represents the entire disk system and is called the RAID group. At the next stage, stripe management is performed and data is distributed to virtual logical units. Next, redundancy management is performed, and each data stripe is mapped to a physical disk.

Here, stripe management means managing simple striping without redundancy. Redundancy management refers to management of mirroring in RAID 1 and of stripes including parity in RAID 5. The function of performing only stripe management, that is, simple striping without redundancy, is referred to as RAID 0. In the disk system according to the embodiment of the present invention, various types of disk systems can be configured according to the presence or absence of stripe management, the presence or absence of redundancy management, and, within redundancy management, the choice between mirroring and stripe processing including parity.

Configuration   Stripe management   Redundancy management
RAID 0          Yes                 No
RAID 1          No                  Mirroring
RAID 0+1        Yes                 Mirroring
RAID 5          No                  Stripe including parity
RAID 0+5        Yes                 Stripe including parity

The terms used in the following description are defined as follows. First, an RLU represents one RAID group. RLBA is the LBA (Logical Block Address) of the RLU. PLU means a single physical disk. PLBA is the LBA of the PLU. DLU is logically an LU (logical unit) intermediate between the RLU and the PLU. It is a unit obtained by dividing the RLU into stripes, and it takes the redundancy of the PLUs into account. DLBA is the LBA of the DLU.
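The combinations of stripe management and redundancy management listed in the table above can be written down as a small lookup structure. The following Python sketch is only one possible encoding; the names `Redundancy`, `RaidConfig`, and `RAID_CONFIGS` are hypothetical and do not appear in the specification.

```python
from enum import Enum
from typing import NamedTuple


class Redundancy(Enum):
    NONE = "none"
    MIRRORING = "mirroring"
    PARITY_STRIPE = "stripe including parity"


class RaidConfig(NamedTuple):
    stripe_management: bool   # is the data string divided into stripes?
    redundancy: Redundancy    # which redundancy management is applied?


# One entry per row of the configuration table.
RAID_CONFIGS = {
    "RAID 0": RaidConfig(True, Redundancy.NONE),
    "RAID 1": RaidConfig(False, Redundancy.MIRRORING),
    "RAID 0+1": RaidConfig(True, Redundancy.MIRRORING),
    "RAID 5": RaidConfig(False, Redundancy.PARITY_STRIPE),
    "RAID 0+5": RaidConfig(True, Redundancy.PARITY_STRIPE),
}

# Configurations that use stripe management.
striped_levels = sorted(
    name for name, cfg in RAID_CONFIGS.items() if cfg.stripe_management
)
```

Treating the two settings as independent fields mirrors the separation of stripe management from redundancy management that the embodiment relies on.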

FIG. 7 is a diagram showing a configuration of the RLU information table.

The RLU information table is a table that holds information in RLU units. This table is mainly held by the center module CM, and the necessary parts of it are copied by the channel adapter CA and the device adapter DA. RLUN is the RLU number. Status is the status of the RLU. Define CM Module ID is the module ID of the CM that is responsible for the RLU in the defined configuration. Current CM Module ID is the module ID of the CM currently in charge of the RLU. Raid Level indicates the RAID level of the RLU; a value such as RAID 0 or RAID 5 is set. RLBAcount is the number of LBAs of the RLU. Stripe Depth is the length of one stripe when striping. Stripe Size is, for example, the total length of the four stripes when one stripe is allocated to each of four physical disks, that is, the total length of the stripes across the physical disks. The other fields are not closely related to the description of the embodiment of the present invention, and their description is omitted.

FIG. 8 is a configuration example of the DLU information table.

The DLU information table is a table that holds information in DLU units. This table is mainly held by the center module CM, and the necessary parts of it are copied by the channel adapter CA and the device adapter DA. DLUN is the DLU number. Status is the status of the DLU. RLUN is the number of the RLU to which the DLU belongs. Member Disk Number is a number indicating the position of the DLU among the DLUs constituting the stripe. Start DLBA is the first LBA of the DLU. Block Count is the total number of blocks of the DLU. Stripe Depth and Stripe Size are as described above. Member Disk Count is the number of disks used in the DLU. DA ID xx is the module ID of a DA connected to the DLU. PLU in Use xx is the number of a PLU that currently makes up the DLU. Defined PLU xx is the number of a PLU that makes up the DLU in the defined configuration.
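The two information tables described above could be modeled as plain records. The sketch below is hypothetical: the field names follow the text, but the Python types, the status strings, and all example values are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RLUInfo:
    rlun: int                  # RLU number
    status: str
    define_cm_module_id: int   # CM responsible in the defined configuration
    current_cm_module_id: int  # CM currently in charge
    raid_level: str
    rlba_count: int            # number of LBAs of the RLU
    stripe_depth: int          # length of one stripe
    stripe_size: int           # total stripe length across members


@dataclass
class DLUInfo:
    dlun: int                  # DLU number
    status: str
    rlun: int                  # RLU to which this DLU belongs
    member_disk_number: int    # position of the DLU within the stripe
    start_dlba: int
    block_count: int
    stripe_depth: int
    stripe_size: int
    member_disk_count: int
    da_ids: List[int] = field(default_factory=list)       # "DA ID xx"
    plu_in_use: List[int] = field(default_factory=list)   # "PLU in Use xx"
    defined_plu: List[int] = field(default_factory=list)  # "Defined PLU xx"


# Hypothetical RLU striped over two DLUs of four disks each.
dlus = [
    DLUInfo(dlun=n, status="available", rlun=0, member_disk_number=n,
            start_dlba=0, block_count=4096, stripe_depth=128, stripe_size=512,
            member_disk_count=4, da_ids=[n],
            plu_in_use=[4 * n + i for i in range(4)],
            defined_plu=[4 * n + i for i in range(4)])
    for n in range(2)
]
rlu = RLUInfo(rlun=0, status="available", define_cm_module_id=0,
              current_cm_module_id=0, raid_level="RAID 0+5", rlba_count=8192,
              stripe_depth=dlus[0].stripe_size, stripe_size=1024)
```

Separate lists for the PLUs in use and the defined PLUs leave room for a record in which a spare disk temporarily stands in for a failed member.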

FIG. 9 is a flowchart showing the operation of the center module CM.

First, in step S10, the presence or absence of stripe management is determined. If it is determined in step S10 that there is stripe management, stripe division processing is performed in step S11: using the RLU information table, the pair of RLUN and RLBA is converted into at least one corresponding pair of DLUN and DLBA. If it is determined in step S10 that there is no stripe management, then in step S12 the pair of RLUN and RLBA is converted one-to-one into the corresponding pair of DLUN and DLBA using the RLU information table.

In step S13, the type of redundancy management is determined. If it is determined in step S13 that there is no redundancy management, then in step S16 the pair of DLUN and DLBA is converted one-to-one into the corresponding pair of PLUN and PLBA using the DLU information table, and the process proceeds to step S17.

If it is determined in step S13 that the redundancy management is mirroring, mirroring is taken into account in step S15: using the DLU information table, the pair of DLUN and DLBA is converted into the two corresponding pairs of PLUN and PLBA, and the process proceeds to step S17.

If it is determined in step S13 that the redundancy management is a stripe including parity, then in step S14 the device adapter DA is activated with the pair of DLUN and DLBA designated, and the processing ends.

In step S17, the device adapter DA is activated with the pair or pairs of PLUN and PLBA designated, and the processing ends.
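The flow of steps S10 to S17 can be sketched in Python as follows. This is a minimal illustration, not the specification's implementation: it assumes round-robin striping over the DLUs, assumes that the one-to-one conversion applies when no striping is configured, and assumes PLBA equals DLBA for the non-parity cases. The function names, the string values for the redundancy type, and the `plus_of_dlu` mapping are all hypothetical.

```python
def rlba_to_dlu(rlba, striped, dlu_count, stripe_depth):
    """Steps S10 to S12: map an RLBA to a (DLU index, DLBA) pair."""
    if not striped:
        return 0, rlba  # one-to-one conversion, single DLU
    stripe_no, offset = divmod(rlba, stripe_depth)
    row, member = divmod(stripe_no, dlu_count)  # assumed round-robin placement
    return member, row * stripe_depth + offset


def cm_dispatch(rlba, striped, redundancy, dlu_count, stripe_depth, plus_of_dlu):
    """Steps S13 to S17: decide which address pairs the CM passes to the DA."""
    dlu, dlba = rlba_to_dlu(rlba, striped, dlu_count, stripe_depth)
    plus = plus_of_dlu[dlu]
    if redundancy == "none":       # S16: one (PLUN, PLBA) pair
        return "PLU", [(plus[0], dlba)]
    if redundancy == "mirroring":  # S15: two (PLUN, PLBA) pairs
        return "PLU", [(plus[0], dlba), (plus[1], dlba)]
    if redundancy == "parity":     # S14: the DA resolves the DLU itself
        return "DLU", [(dlu, dlba)]
    raise ValueError(f"unknown redundancy type: {redundancy}")


# RAID 0+1 example: four DLUs, each mirrored on two PLUs, stripe depth 1.
mirror_map = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
raid01_result = cm_dispatch(5, True, "mirroring", 4, 1, mirror_map)

# RAID 5 example: no striping in the CM, one DLU backed by four PLUs.
raid5_result = cm_dispatch(7, False, "parity", 1, 1, {0: [0, 1, 2, 3]})
```

Returning a tag together with the address pairs reflects that the device adapter later needs to tell a physical address from a DLU address.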

FIG. 10 is a flowchart showing the operation of the device adapter DA.

In step S20, the type of the designated LUN is determined. If it is determined in step S20 that the type is a pair of PLUN and PLBA, then in step S21 disk processing (read, write, etc.) is performed according to the designated pair of PLUN and PLBA, and the processing ends. If it is determined in step S20 that the type is a pair of DLUN and DLBA, then in step S22 conventional RAID 5 control is performed. In the RAID 5 control, the pair of DLUN and DLBA is converted, using the DLU information table, into at least one corresponding pair of PLUN and PLBA, reads and writes to the disks are executed, and the processing ends.

FIG. 11 is a diagram for explaining the configuration of RAID 1 when the disk system according to the embodiment of the present invention is used.

Since the LUN has no stripe, there is only one DLUN corresponding to the RLUN. In mirroring, two PLUNs are provided because two disks are used. As for the LBA, since there is no stripe, RLBA = DLBA. Since mirroring only writes the same information to both disks, DLBA = PLBA. In FIG. 11, the RAID group RLU has a one-to-one relationship with the virtual logical unit DLU, and the virtual logical unit DLU stores data by mirroring it across two physical disks PLU.

FIG. 12 is a diagram illustrating the configuration of RAID 0+1 when the disk system according to the embodiment of the present invention is used.

Four virtual logical units DLU are mapped to the RAID group RLU. This mapping is realized by striping. Each of the virtual logical units DLU is provided with two physical disks PLU and stores data by mirroring. The lower part of FIG. 12 shows how data is stored. There are four DLUs, #0 to #3, each of which is associated with two PLUs. The data is divided by striping into D0, D1, ..., D15 from the top. These are stored one by one in the horizontal direction, that is, across the different virtual logical units.
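The placement of D0 to D15 in this RAID 0+1 configuration can be reproduced with a short Python sketch. The PLU numbering used here (PLUs 2n and 2n+1 belonging to DLU #n) is an assumption for illustration, as are the variable names.

```python
DLU_COUNT = 4
MIRROR_COPIES = 2

# Striping: consecutive stripes go to consecutive DLUs (the horizontal direction).
dlu_layout = {dlu: [] for dlu in range(DLU_COUNT)}
for i in range(16):
    dlu_layout[i % DLU_COUNT].append(f"D{i}")

# Mirroring: each DLU writes the same stripes to both of its PLUs.
plu_layout = {}
for dlu, stripes in dlu_layout.items():
    for copy in range(MIRROR_COPIES):
        plu_layout[MIRROR_COPIES * dlu + copy] = list(stripes)

for plu in sorted(plu_layout):
    print(f"PLU#{plu} (DLU#{plu // MIRROR_COPIES}):", " ".join(plu_layout[plu]))
```

Each DLU ends up with every fourth stripe, and the two PLUs under a DLU hold identical contents.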

FIG. 13 is a diagram for explaining the configuration of RAID 5 when the disk system according to the embodiment of the present invention is used.

As shown on the left side of FIG. 13, the correspondence between the RAID group RLU and the virtual logical unit DLU is one-to-one. The correspondence between the DLU and the physical disks PLU is one-to-many (in this case, one-to-four), and RAID 5 control is performed at the DLU. Then, as shown on the right side of FIG. 13, redundancy control is performed on a data string divided into stripes, and the data is stored on each physical disk along with parity.
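The one-to-four mapping from a DLU to its physical disks can be sketched as an address conversion of the kind performed under RAID 5 control. The parity rotation used below (parity starting on the last disk and moving one disk to the left per row) is an assumption, since the exact layout of FIG. 13 is not reproduced in the text; `raid5_map` is a hypothetical name.

```python
def raid5_map(dlba, disk_count=4, stripe_depth=1):
    """Convert a DLBA into (data disk index, PLBA, parity disk index).

    Assumes each row holds disk_count - 1 data stripes plus one parity
    stripe, with the parity position rotating from row to row.
    """
    data_disks = disk_count - 1
    stripe_no, offset = divmod(dlba, stripe_depth)
    row, idx = divmod(stripe_no, data_disks)
    parity_disk = (disk_count - 1 - row) % disk_count  # assumed rotation
    data_disk = idx if idx < parity_disk else idx + 1  # skip the parity slot
    plba = row * stripe_depth + offset
    return data_disk, plba, parity_disk


# First four rows of a four-disk group with stripe depth 1.
rows = {}
for dlba in range(12):
    disk, plba, parity = raid5_map(dlba)
    rows.setdefault(plba, {"parity_disk": parity, "data_disks": []})
    rows[plba]["data_disks"].append(disk)
```

With this rotation, no single disk holds all of the parity, which matches the statement that parity is stored in a distributed manner.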

As shown in the upper part of FIG. 14, two virtual logical units DLU correspond to the RAID group RLU, and the mapping between them is performed by striping. Between the DLU and the physical disks PLU, four PLUs correspond to one DLU and are mapped by RAID 5 control. As shown in the lower part of FIG. 14, data is first divided between the two DLUs by striping, and each part is then distributed over four physical disks by RAID 5 control. At this time, parity is generated because the RAID 5 control performs redundancy management. Also, the Stripe Size of the DLU information table and the Stripe Depth of the RLU information table are equal.

FIG. 15 is a diagram for explaining RAID 0+5.

The CM performs stripe control and maps data to four DAs. Each DA performs redundancy management and maps data to four physical disks. In the earlier example, the CM maps to two DAs; however, capacity can be added under the CM in units of a DA together with the physical disks under that DA, so that performance can be improved by distributed processing. Further, as described above, in the disk system according to the embodiment of the present invention, a disk system having various configurations other than RAID 0+5 can be configured. And DA that performs RAID 5 control.

FIG. 16 is a diagram for explaining rebuilding in the disk system according to the embodiment of the present invention. If the DA has four FC ports, RAID 0+5 is formed by combining four physical disks connected to different FC ports, so that when any DLU is rebuilt, reads are limited to at most three disks, and reads and writes remain possible even when a loop-down occurs on one FC loop.

Industrial applicability

The present invention can provide a disk system, used as a storage device of a computer system, that has excellent scalability and a large degree of freedom in configuration, and that can confine the influence of performance degradation so as to maintain the performance of the entire system.

Claims


1. A disk system with a redundant configuration, comprising:

stripe processing management means for performing stripe management of a data string according to a setting as to whether the data string from a host is divided into stripes; and

redundancy management means for giving redundancy to the data and storing the data on a plurality of physical disks based on a setting as to whether or not to perform redundancy management on the data from the stripe processing management means.

2. The disk system according to claim 1, wherein a plurality of said redundancy management means are provided.

3. The disk system according to claim 1, wherein the capacity of the disk system is increased by increasing the number of the redundancy management means and the number of physical disks connected to the redundancy management means.

4. The disk system according to claim 1, wherein the redundancy management means executes a RAID 5 process as a redundancy management method.

5. The disk system according to claim 1, wherein the stripe processing management means notifies the redundancy management means of the redundancy management method by setting the redundancy management method.

6. The disk system according to claim 5, wherein the redundancy management method includes control of a stripe including parity and mirroring.

7. The disk system according to claim 1, wherein the stripe management is performed by designating a size and a depth of the stripe.

8. A method of configuring a disk system employing a redundancy configuration, comprising:

providing stripe processing management means for performing stripe management of a data string according to a setting as to whether or not to divide the data string from a host into stripes; and

providing redundancy management means for giving redundancy to the data and storing the data on a plurality of physical disks based on a setting as to whether or not to perform redundancy management on the data from the stripe processing management means.

9. The disk system configuration method according to claim 8, wherein a plurality of the redundancy management means are provided.

10. The disk system configuration method according to claim 8, wherein the capacity of the disk system is increased by increasing the number of the redundancy management means and the number of physical disks connected to the redundancy management means.

11. The disk system configuration method according to claim 8, wherein said redundancy management means executes a RAID 5 process as a redundancy management method.

12. The disk system configuration method according to claim 8, wherein the stripe processing management means notifies the redundancy management means of the redundancy management method by setting the redundancy management method.

13. The disk system configuration method according to claim 12, wherein the redundancy management method includes control of a stripe including parity and mirroring.

14. The method according to claim 8, wherein the stripe management is performed by designating a size and a depth of the stripe.