WO2018016041A1

WO2018016041A1 - Storage system

Info

Publication number: WO2018016041A1
Application number: PCT/JP2016/071350
Authority: WO
Inventors: 前田　徹; 正範藤井
Original assignee: 株式会社日立製作所
Priority date: 2016-07-21
Filing date: 2016-07-21
Publication date: 2018-01-25

Abstract

A first storage device includes a first volume. A second storage device includes a second volume that forms a synchronous copy pair with the first volume. A third storage device includes a third volume that forms an asynchronous copy pair with the first volume. The first volume, the second volume, and the third volume are associated with a single virtual volume. The first storage device transmits an asynchronous write request to the third storage device in response to a request to write to the virtual volume. Then if the result of a health check performed after the third storage device has received the asynchronous write request indicates that the third storage device can perform normal communication with the first storage device, then the third storage device requests the first storage device to resynchronize the third volume and the first volume.

Description

Storage system

The present invention relates to a storage system.

In a storage system, a high availability (HA) function based on a cluster configuration of storage devices is required. The HA function realizes high availability of the storage system. The HA configuration has a multiplexed system, and when a failure occurs, it automatically disconnects the failed system and continues operation using only a normal system. Furthermore, the HA function realizes effective use of resources and load distribution by operating a plurality of systems as active systems.

A background technology for realizing high availability of a storage system is disclosed in Patent Document 1. Patent Document 1 states that “The storage system of the present invention generates a virtual volume based on a remote copy pair and provides the virtual volume to the host. The first storage device and the second storage device are the third storage device. The lock disk stores information for controlling the use of the virtual volume, which is generated based on the remote copy pair consisting of the primary volume and the secondary volume. The user can create and delete a virtual volume and create and delete a lock disk by issuing an instruction from the management server ”(see summary).

Patent Document 1: US Patent Application Publication No. 2011/0066801

The conventional storage system synchronizes multiplexed volumes in order to maintain the identity of the data in the volume in the HA configuration. The storage system returns a write completion response to the host after the data is written to the multiplexed volume. Therefore, overhead increases as the number of multiplexed storage devices increases, and the response performance to the host greatly decreases.

A representative example of the present invention is a storage system including a first storage device including a first volume, a second storage device including a second volume that constitutes a copy pair with the first volume, a third storage device including a third volume that constitutes a copy pair with the first volume, and a quorum disk. The first volume, the second volume, and the third volume are associated with one virtual volume that is accessed from outside. The first storage device, the second storage device, and the third storage device each accept access from outside to the virtual volume. In response to a write request to the virtual volume, the first storage device writes the write data to the first volume. The second storage device writes the write data to the second volume after the write data is written to the first volume. The storage system returns a write completion response for the write request to the host after the write data is written to the second volume. The first storage device transmits an asynchronous write request for the third volume to the third storage device in response to the write request. The third storage device accesses the quorum disk at a preset timing and executes a health check of the first storage device and the second storage device. When the result of the health check after receiving the asynchronous write request indicates that normal communication with the first storage device is possible, the third storage device requests the first storage device to resynchronize the third volume with the first volume.

According to one aspect of the present invention, in a storage system having an HA configuration in which three or more storage apparatuses are multiplexed, it is possible to suppress a decrease in response performance with respect to write access.

An outline of the present embodiment is shown. An outline of the present embodiment is shown. A configuration example of a computer system is shown. An example of a hardware configuration of a host computer and a storage device is schematically shown. The information stored in the shared memory of the CMPK is shown. A configuration example of a volume pair management table is shown. A configuration example of an asynchronous write request management table is shown. A configuration example of a quorum update time management table is shown. The information stored in the quorum disk is shown. A configuration example of a first DKC management table is shown. A configuration example of the last update time management table is shown. A ladder chart of the processing of a write request transmitted from the host computer to the MDKC (PVOL) is shown. A flowchart of the processing by the MDKC that has received a write request from the host computer is shown. A ladder chart of the processing of a write request transmitted from the host computer to the RDKC [1] (SVOL [1]) is shown. A flowchart of the processing by the RDKC [1] that has received a write request from the host computer is shown. A flowchart of the data resynchronization processing between RDKCs is shown. A flowchart of the health check by the RDKC is shown. A flowchart of the health check by the MDKC is shown. A flowchart of the forced resynchronization processing is shown. A flowchart of the read processing by the RDKC is shown.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be noted that this embodiment is merely an example for realizing the present invention, and does not limit the technical scope of the present invention. In each figure, the same reference numerals are given to common configurations.

FIGS. 1A and 1B are diagrams showing an outline of the present embodiment. The configuration disclosed in FIGS. 1A and 1B includes a host computer 180, a first storage device 10A, a second storage device 10B, a third storage device 10C, and a fourth storage device 10D. The storage device 10A, the storage device 10B, and the storage device 10C provide the virtual storage device 200 to the host computer 180.

The fourth storage device 10D holds the quorum disk 201D. When a failure occurs in a path or a storage device, the storage device that detects the failure uses the quorum disk 201D to notify the other storage devices that the failure has occurred. Each storage device periodically accesses the quorum disk 201D in order to detect the occurrence of a failure.

The volume 201A of the storage apparatus 10A, the volume 201B of the storage apparatus 10B, and the volume 201C of the storage apparatus 10C are provided as one virtual volume 210 of the virtual storage apparatus 200. These volumes store the same data and present the same ID to the host.

The host computer 180 transmits a read/write instruction to one virtual volume 210, but in reality, the instruction may be transmitted to any of the volumes 201A, 201B, and 201C. Such a group of volumes is called a volume pair group or simply a pair group. The volume pair group is composed of a plurality of volume pairs, and is composed of two volume pairs in the examples of FIGS. 1A and 1B.

One volume pair is composed of a primary volume (PVOL) and a secondary volume (SVOL). In the volume pair group, there is one PVOL. The PVOL forms a volume pair with each SVOL. In the example of FIGS. 1A and 1B, the volume 201A is a PVOL, and the volumes 201B and 201C are SVOLs. As will be described later, the volume 201B is the first SVOL [1] of the synchronous copy destination, and the volume 201C is the second SVOL [2] of the asynchronous copy destination.

As in this example, a volume pair in which either volume can accept access from the host is referred to as an Active-Active type High Availability (HA) volume pair, or HA pair. A configuration including an HA pair is referred to as an HA configuration.

A storage device that provides a PVOL of a volume pair group is called an MDKC (Main DisK Controller) of the volume pair group. A storage device that provides an SVOL of the volume pair group is referred to as an RDKC (Remote DisK Controller) of the volume pair group.

Note that one storage device can include PVOL and SVOL of different volume pair groups. That is, one storage device functions as MDKC or RDKC according to the volume type in each HA volume pair group.

FIG. 1A shows an outline of the processing flow when a host write request is issued from the host computer 180 to the volume 201A which is a PVOL. The MDKC (first storage device) 10A receives a host write request and data from the host computer 180 (S101).

The MDKC 10A acquires the exclusion of the write destination address of the PVOL 201A and writes the data to the PVOL 201A (S102). The MDKC 10A transmits a write request and data to the RDKC [1] (second storage device) 10B in order to store data in the SVOL [1] 201B (S103).

The RDKC [1] 10B that has received the write request and data writes the data to the SVOL [1] 201B (S104). The RDKC [1] 10B transmits a write completion response to the MDKC 10A (S105). The MDKC 10A releases the exclusion acquired in step S102, and sends a host write completion response to the host computer 180 (S106).

The MDKC 10A transmits an asynchronous write request to the RDKC [2] (third storage device) 10C (S104). Write data is not transmitted to RDKC [2] 10C, and only an asynchronous write request is transmitted. As will be described later, the SVOL [2] 201C is resynchronized with the PVOL 201A at a predetermined timing.

The write data is written synchronously to PVOL 201A and SVOL [1] 201B, and further asynchronously written to SVOL [2] 201C. Specifically, the host write completion response is returned to the host computer 180 after data write to PVOL 201A and SVOL [1] 201B and before data write to SVOL [2] 201C. Therefore, it is possible to suppress a decrease in response performance to the host write while ensuring reliability.
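The ordering described for FIG. 1A can be illustrated with a minimal sketch. It assumes a hypothetical in-memory model in which each volume is a dictionary; the class name `WriteFlow`, its fields, and the event labels are illustrative assumptions and do not appear in the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class WriteFlow:
    """Hypothetical in-memory model of one volume pair group (illustration only)."""
    pair_group_id: int
    pvol: dict = field(default_factory=dict)            # address -> data on the PVOL
    svol1: dict = field(default_factory=dict)           # synchronous copy destination
    svol2: dict = field(default_factory=dict)           # asynchronous copy destination
    async_requests: list = field(default_factory=list)  # requests received by RDKC[2]
    locked: set = field(default_factory=set)            # addresses under exclusion

    def host_write(self, address: int, data: bytes) -> list:
        """Process one host write to the PVOL and return the ordered events."""
        events = []
        self.locked.add(address)          # acquire exclusion for the write address
        self.pvol[address] = data         # write to the PVOL
        events.append("pvol_written")
        self.svol1[address] = data        # synchronous write to SVOL[1]
        events.append("svol1_written")
        self.locked.discard(address)      # release the exclusion
        events.append("host_completion")  # completion is returned to the host here
        # Only a request is sent to RDKC[2]; no write data accompanies it.
        self.async_requests.append(self.pair_group_id)
        events.append("async_request_sent")
        return events
```

In this sketch the host completion is recorded before SVOL [2] holds the data, which is the property that keeps the host-visible latency independent of the third storage device.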

As will be described later, when the RDKC (RDKC [1] 10B or RDKC [2] 10C) receives a host write request and write data from the host computer 180, the RDKC transmits the write request and write data to the MDKC 10A. Write data is written to SVOL [1] 201B after being written to PVOL 201A. This prevents old data from being read after new data is read, and appropriately maintains the identity of PVOL and SVOL in response to an access request from the host.

The MDKC 10A transmits an asynchronous write request to the RDKC [2] 10C after the data write to the PVOL 201A. The RDKC that has received the host write request returns a host write completion response to the host computer 180 after data write to the PVOL 201A and SVOL [1] 201B and before data write to the SVOL [2] 201C.

FIG. 1B shows an outline of resynchronization between PVOL 201A and SVOL [2] 201C. Resynchronization copies data that has not yet been reflected in the SVOL [2] 201C (hereinafter, unupdated data) from the PVOL 201A to the SVOL [2] 201C. The RDKC [2] 10C starts resynchronization between the PVOL 201A and the SVOL [2] 201C in synchronization with the health check using the quorum disk 201D. This makes processing more efficient.

The RDKC [2] 10C performs a health check using the quorum disk 201D at a predetermined timing. The RDKC [2] 10C reads the management information of the storage apparatuses 10A to 10C from the quorum disk 201D in the health check (S121). The RDKC [2] 10C refers to the management information and checks the status of other storage devices and the status of the path.

The RDKC [2] 10C manages the time of the previous health check and the time of the asynchronous write request received from the MDKC 10A. When the most recent asynchronous write request is later than the previous health check, the RDKC [2] 10C writes unupdated data to the SVOL [2] 201C. According to the result of the health check, the RDKC [2] 10C determines the storage device from which the unupdated data is read, that is, the volume with which the SVOL [2] 201C is resynchronized.

When normal communication with the MDKC 10A is possible, the RDKC [2] 10C reads unupdated data from the MDKC 10A (S122). When normal communication with the MDKC 10A is impossible, the RDKC [2] 10C reads unupdated data from the RDKC [1] 10B. Thereafter, the RDKC [2] 10C updates its own management information in the quorum disk 201D (S123), and the health check by the RDKC [2] 10C ends.

RDKC [2] 10C selects a storage device from which unupdated data is read for data resynchronization of SVOL [2] 201C. By performing data resynchronization of SVOL [2] 201C during normal health check, data resynchronization of SVOL [2] 201C can be performed efficiently.
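The decision made by the RDKC [2] during the health check can be sketched as follows. This is a minimal illustration under assumptions: the function name `plan_resync`, its parameters, and the string labels for the resynchronization source are hypothetical.

```python
from typing import Optional


def plan_resync(last_async_request_time: Optional[float],
                previous_health_check_time: float,
                mdkc_reachable: bool) -> Optional[str]:
    """Return the resynchronization source for SVOL[2], or None if none is needed.

    Resynchronization is needed only when an asynchronous write request
    arrived after the previous health check. The source is the MDKC when
    normal communication with it is possible, and RDKC[1] otherwise.
    """
    if last_async_request_time is None:
        return None  # no asynchronous write request has been received
    if last_async_request_time <= previous_health_check_time:
        return None  # the request was already covered by an earlier check
    return "MDKC" if mdkc_reachable else "RDKC[1]"
```

For example, `plan_resync(120.0, 100.0, False)` selects RDKC [1], because the request is newer than the previous health check and the MDKC cannot be reached.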

FIG. 2 shows a configuration example of a computer system. The computer system of FIG. 2 includes a host computer 180, storage apparatuses 10A to 10D, and a management computer 40. The storage apparatuses 10A to 10D are included in the storage system, the host computer 180 accesses the storage system, and the management computer 40 manages the storage system.

The number of various devices (systems) included in the computer system depends on the design. The storage apparatuses 10A, 10B, and 10C constitute an ALL Active type HA storage group, and further provide a virtual storage apparatus. Each of the storage apparatuses 10A to 10C behaves as the same virtual storage apparatus with respect to the host computer 180. In the example described below, the storage apparatuses 10A, 10B, and 10C constitute one virtual storage apparatus.

The storage device 10D has a quorum disk. The quorum disk provides a function for determining which of the storage devices 10A, 10B, and 10C in the HA configuration is to continue operating and which is to be stopped when communication between the storage devices 10A, 10B, and 10C in the HA configuration becomes impossible.

Specifically, each of the storage apparatuses 10A and 10B writes, to the quorum disk, its own state and the state of its communication with the other storage apparatuses as seen from itself. Each storage apparatus refers to the quorum disk periodically or in synchronization with an I/O response, and determines which apparatus is to continue operating and which is to be stopped based on the information written in the quorum disk.

The host computer 180, the management computer 40, and the storage apparatuses 10A to 10D are communicably connected via a management network configured by a LAN 195. For example, the management network 195 is an IP network. The management network 195 may be any type of network as long as it is a network for management data communication.

The host computer 180 and the storage devices 10A to 10D are connected by a data network configured by a SAN (Storage Area Network) 190. The host computer 180 accesses the volumes of the storage apparatuses 10A, 10B, and 10C via the SAN 190. The storage apparatuses 10A to 10D communicate with each other via the SAN 190.

The data network 190 may be any type of network as long as it is a data communication network. The data network 190 and the management network 195 may be the same network.

FIG. 3 schematically shows a hardware configuration example of the host computer 180 and the storage apparatus 10A. The other storage apparatuses 10B, 10C, and 10D can have the same configuration as the storage apparatus 10A.

The host computer 180 includes a secondary storage device 181, a CPU 182 as a processor, a memory 183 as a main storage device, an input device 184, a display device 185 as an output device, an I/F 186, and a port 187. These are connected to each other by an internal network. The management computer 40 can also have a similar hardware configuration.

The CPU 182 performs various processes by executing a program stored in the memory 183. For example, the memory 183 holds an OS, an alternate path program, and an application program. The application program reads and writes data from and to the volume provided by the storage apparatus 10. The alternate path program selects a path for the access destination logical volume from the paths assigned to the virtual volume.

The port 187 is a network interface connected to the SAN 190, and transmits and receives data and requests to and from the storage apparatus 10 via the SAN 190. The I/F 186 is a network interface connected to the LAN 195, and transmits and receives management data and control commands to and from the management computer 40 and the storage apparatuses 10A to 10C via the LAN 195.

The storage apparatus 10A accommodates a plurality of storage drives 170. The storage drive 170 is, for example, an HDD having a nonvolatile magnetic disk or an SSD mounted with a nonvolatile semiconductor memory (for example, a flash memory). A volume is configured based on the storage drive 170.

The storage drive 170 stores data (user data) sent from the host computer 180 or another storage device. Since the plurality of storage drives 170 perform data redundancy by RAID operation, data loss when a failure occurs in one storage drive 170 can be prevented.

The storage apparatus 10A includes a front-end package (FEPK) 100 for connecting to an external apparatus via the SAN 190. In the computer system of this example, the external device is a host computer or a storage device.

The storage apparatus 10A further includes a back-end package (BEPK) 140 for connection to the storage drive 170, a cache memory package (CMPK) 130 for mounting a cache memory, a microprocessor package (MPPK) 120 for mounting a microprocessor that performs internal processing, and an internal network 150 connecting them. The storage apparatus 10A includes a plurality of FEPKs 100, a plurality of BEPKs 140, a plurality of CMPKs 130, and a plurality of MPPKs 120.

Each FEPK 100 has an interface 101 for connecting to an external device and a transfer circuit 112 for transferring data in the storage device 10 on the substrate. The interface 101 can include a plurality of ports, and each port can be connected to an external device. The interface 101 includes a buffer 113. The buffer is an area for temporarily storing data received from the host computer 180, and is composed of a storage medium such as a DRAM.

Each BEPK 140 has an interface 141 for connecting to the storage drive 170 and a transfer circuit 142 for transferring data in the storage apparatus 10 on the substrate.

Each CMPK 130 has a cache memory (CM) 131 that temporarily stores user data and a shared memory (SM) 132 that stores control information handled by one or more MPPKs 120 on the substrate. A plurality of MPPKs 120 (microprocessors) in charge of different volumes can access the shared memory 132. Data and programs handled by the MPPK 120 are loaded from a nonvolatile memory (not shown) or the storage drive 170 in the storage apparatus 10A.

Each MPPK 120 has one or more microprocessors 121, a local memory (LM) 122, and a bus 123 connecting them. A plurality of microprocessors 121 are mounted on the MPPK 120 in this example. The local memory 122 stores programs executed by the microprocessor 121 and control information used by the microprocessor 121.

As described above, one shared memory 132 stores control information handled by a plurality of MPPKs 120. The MPPK 120 loads the control information that it requires from the shared memory 132 to its own local memory 122. Each MPPK 120 (the microprocessor 121 thereof) is assigned responsibility for volumes that the storage apparatus 10A provides to the host computer 180. The MPPK 120 performs processing for the volumes assigned to it.

FIG. 4 shows information stored in the shared memory 132 of the CMPK 130 of each of the storage apparatuses 10A to 10C. The shared memory 132 stores a volume pair management table 220, an asynchronous write request management table 230, and a quorum update time management table 240. The shared memory 132 can be accessed from a plurality of MPPKs 120.

FIG. 5 shows a configuration example of the volume pair management table 220. The volume pair management table 220 holds volume pair management information. Each entry in the volume pair management table 220 indicates information on each volume pair group.

The volume pair management table 220 has a volume pair group ID column 221, a PVOL ID column 222 common to the volume pair groups, and an MDKC ID column 223 that manages the PVOL.

The volume pair management table 220 further includes an ID column 224 for the first SVOL [1] that forms the synchronous copy pair with the PVOL, an ID column 225 for the first RDKC [1] that manages the first SVOL [1], and a column 226 indicating the status of the synchronous copy pair.

The volume pair management table 220 further includes an ID column 227 for the second SVOL [2] that forms the asynchronous copy pair with the PVOL, an ID column 228 for the second RDKC [2] that manages the second SVOL [2], and a column 229 indicating the status of the asynchronous copy pair.

The copy pair status includes, for example, “PAIR”, “SMPL (simplex)”, “PSUS (suspend: PVOL single operation)”, “SSWS (swap suspend: SVOL single operation)”, and the like.

“PAIR” indicates that the PVOL and SVOL form a copy pair, and the PVOL data is copied to the SVOL synchronously or asynchronously. “SMPL” indicates that each copy pair volume is a normal logical volume.

“PSUS” indicates a state in which the copy pair is in the suspended state and only the PVOL accepts I/O from the host computer. “SSWS” indicates a state in which the copy pair is in the suspended state and only the SVOL accepts I/O from the host computer.
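One entry of the volume pair management table 220 and the copy pair statuses can be modeled as in the sketch below. The field names, the `accepts_host_io` helper, and the treatment of “PAIR” and “SMPL” as accepting host I/O on either volume are assumptions made for illustration; only the column numbers and the status names come from the description.

```python
from dataclasses import dataclass
from enum import Enum


class PairStatus(Enum):
    PAIR = "PAIR"  # PVOL data is copied to the SVOL
    SMPL = "SMPL"  # simplex: each volume is a normal logical volume
    PSUS = "PSUS"  # suspended: only the PVOL accepts host I/O
    SSWS = "SSWS"  # swap suspended: only the SVOL accepts host I/O


@dataclass
class VolumePairEntry:
    """One volume pair group; comments give the column numbers of table 220."""
    pair_group_id: int        # column 221
    pvol_id: str              # column 222
    mdkc_id: str              # column 223
    svol1_id: str             # column 224
    rdkc1_id: str             # column 225
    sync_status: PairStatus   # column 226
    svol2_id: str             # column 227
    rdkc2_id: str             # column 228
    async_status: PairStatus  # column 229


def accepts_host_io(status: PairStatus, role: str) -> bool:
    """Return True if a volume with the given role ('PVOL' or 'SVOL') takes host I/O."""
    if status is PairStatus.PSUS:
        return role == "PVOL"
    if status is PairStatus.SSWS:
        return role == "SVOL"
    return True  # assumption: PAIR and SMPL volumes accept host I/O
```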

FIG. 6 shows a configuration example of the asynchronous write request management table 230. The asynchronous write request management table 230 manages asynchronous write requests received by the own device. The asynchronous write request management table 230 includes a write request time column 231 and a volume pair group ID column 232. The write request time is included in, for example, an asynchronous write request. The MDKC 10A includes the reception time of the host write request in the asynchronous write request. The write request time may be the reception time of an asynchronous write request by RDKC.

FIG. 7 shows a configuration example of the quorum update time management table 240. The quorum update time management table 240 indicates the previous reference time of the quorum disk 201D and the update time of the management information by the own device.

FIG. 8 shows information stored in the quorum disk 201D. The quorum disk 201D stores a first DKC management table 260A, a second DKC management table 260B, a third DKC management table 260C, and a last update time management table 280.

The first DKC management table 260A, the second DKC management table 260B, and the third DKC management table 260C hold the management information of the first DKC (first storage device) 10A, the second DKC (second storage device) 10B, and the third DKC (third storage device) 10C, respectively. In the health check, each DKC reads all the DKC management tables and updates the DKC management table of its own device.

FIG. 9 shows a configuration example of the first DKC management table 260A. Other DKC management tables have the same format. The first DKC management table 260A includes the ID (for example, serial number) 261 of the DKC, an update generation 262, a communication state [2] 263, a communication state [3] 264, a previous generation [2] 265, a previous time [2] 266, a previous generation [3] 267, a previous time [3] 268, a response state [2] 269, and a response state [3] 270.

The update generation 262 indicates the number of times the first DKC 10A has updated the first DKC management table 260A in the health check. The value of the update generation 262 is incremented for each health check.

Communication state [2] 263 indicates whether or not the first DKC 10A can communicate with the second DKC 10B. The first DKC 10A determines whether it can communicate with the second DKC 10B, and the communication state [2] indicates the determination result. For example, when an error is detected in data communication with the second DKC 10B, the first DKC 10A changes the value of the communication state [2] to a value indicating that communication is not possible. The communication state [3] 264 indicates whether or not the first DKC 10A can communicate with the third DKC 10C.

Communication state [k] indicates a determination result of a communication state with another DKC. Therefore, the second DKC management table 260B includes a communication state [1] and a communication state [3], and the third DKC management table 260C includes a communication state [1] and a communication state [2].

The previous generation [2] 265 indicates the value of the update generation 262 of the second DKC management table 260B in the previous health check by the first DKC 10A. The first DKC 10A refers to the second DKC management table 260B during the health check, and copies the value of the update generation 262 to the previous generation [2] 265.

The previous time [2] 266 indicates the time when it was detected that the value of the update generation 262 in the second DKC management table 260B has not changed. In the health check, the first DKC 10A compares the value of the update generation 262 of the second DKC management table 260B with the value of the previous generation [2] 265.

If these are different, the first DKC 10A maintains or changes the value of the previous time [2] 266 to 0. If these are the same and the value of the previous time [2] 266 is 0, the first DKC 10A changes the value of the previous time [2] 266 to the current time. If these values are the same and the value of the previous time [2] 266 is not 0, the value is maintained.
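The update rule for the previous generation [2] and the previous time [2] can be written as a small function. This is a sketch under assumptions: the function name and parameter names are hypothetical, and times are represented as floating-point seconds with 0 meaning "no stall detected".

```python
def update_previous(observed_generation: int,
                    previous_generation: int,
                    previous_time: float,
                    now: float) -> tuple:
    """Return the new (previous generation, previous time) for one peer DKC.

    observed_generation is the update generation read from the peer's
    DKC management table during the current health check.
    """
    if observed_generation != previous_generation:
        new_time = 0.0            # the peer has advanced: clear the stall time
    elif previous_time == 0:
        new_time = now            # first detection that the peer has not advanced
    else:
        new_time = previous_time  # still stalled: keep the first detection time
    # The observed generation is copied to the previous generation.
    return observed_generation, new_time
```

Keeping the first detection time, rather than overwriting it on every check, is what allows a later elapsed-time comparison against a timeout.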

If the value of the update generation 262 in the second DKC management table 260B is the same as the value of the previous generation [2] 265, this indicates that the second DKC 10B has not executed a health check since the previous health check by the first DKC 10A.

The previous generation [3] 267 and the previous time [3] 268 indicate information on the previous generation and the previous time for the third DKC management table 260C, respectively. The previous generation [k] and the previous time [k] each indicate information of another DKC. Therefore, the second DKC management table 260B includes the previous generation [1], the previous time [1], the previous generation [3], and the previous time [3]. The third DKC management table 260C includes the previous generation [1], the previous time [1], the previous generation [2], and the previous time [2].

The first DKC 10A compares the value of the update generation 262 in the second DKC management table 260B with the value of the previous generation [2] 265 in the health check. If they match, the first DKC 10A refers to the previous time [2] 266. When the value of the previous time [2] 266 is not 0, the first DKC 10A calculates the elapsed time from the time indicated by the previous time [2] 266 to the current time, and compares the elapsed time with a specified timeout value.

If the elapsed time exceeds the timeout value, the first DKC 10A refers to the communication state [2]. When the communication state [2] indicates that communication is not possible, the first DKC 10A determines that a failure has occurred in the second DKC 10B and that it cannot respond. The first DKC 10A makes the same determination for the third DKC 10C. Each of the second DKC 10B and the third DKC 10C also determines the state of the other DKC by the same method in the health check.
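The failure determination for a peer DKC combines the generation comparison, the timeout, and the communication state. The following sketch restates those conditions; the function name and parameters are hypothetical, and times are assumed to be floating-point seconds.

```python
def peer_has_failed(observed_generation: int,
                    previous_generation: int,
                    previous_time: float,
                    now: float,
                    timeout: float,
                    can_communicate: bool) -> bool:
    """Return True if the peer DKC is judged to have failed and be unable to respond."""
    if observed_generation != previous_generation:
        return False  # the peer updated its table, so it is running health checks
    if previous_time == 0:
        return False  # no stall has been recorded yet
    if now - previous_time <= timeout:
        return False  # stalled, but not longer than the timeout value
    # Stalled beyond the timeout: failed only if communication is also impossible.
    return not can_communicate
```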

Response state [k] indicates a determination result of another DKC response state (response possible / impossible). Therefore, the second DKC management table 260B includes response state [1] and response state [3], and the third DKC management table 260C includes response state [1] and response state [2].

The above is an example of a method for determining the state of a DKC in the health check, and other methods and other information may be used. For example, the information on the communication state with other DKCs need not be referred to, and other conditions may be added. Although an example of a configuration including three DKCs has been described, when four or more DKCs are included, the DKC management table includes information on all the other DKCs.

FIG. 10 shows a configuration example of the last update time management table 280. The last update time management table 280 manages the last update time of each volume constituting the volume pair group. The last update time management table 280 includes a volume pair group ID column 281, an MDKC ID column 282, and a column 283 indicating the last update time of each volume.

The MDKC of each volume pair group updates the corresponding entry information in the last update time management table 280. As a result, the last update time of each volume in the volume pair group can be centrally managed. Also, the quorum disk 201D stores the last update time management table 280, so that other DKCs can refer to the information of the last update time management table 280 when a failure occurs in the MDKC.

For example, the last update time of the MDKC is the reception time of the last write request received from the host computer 180 or another DKC. The last update time of an RDKC is, for example, the time at which the MDKC received the write request that it last transferred to the RDKC. Alternatively, the last update time of the RDKC is the transfer time of the last write request that the MDKC transferred to the RDKC.

The processing of the storage system will be described below. In the following description, it is assumed that each DKC (storage device) is either the MDKC or an RDKC for all the volumes that it holds. The RDKC can hold an SVOL (SVOL [1]) that constitutes a synchronous copy pair with the PVOL and an SVOL (SVOL [2]) that constitutes an asynchronous copy pair with the PVOL.

When one DKC (storage device) includes a PVOL and an SVOL, the DKC may execute the following MDKC process for the PVOL and execute the following RDKC process for the SVOL. In the following description, the PVOL, SVOL [1], and SVOL [2] targeted by the write request from the host computer 180 are held by DKC (storage device) 10A, DKC 10B, and DKC 10C, respectively.

FIG. 11 shows a ladder chart of processing of a write request transmitted from the host computer to MDKC (PVOL). The host computer 180 issues a write request to the area in the virtual volume 210 to the MDKC 10A. The MDKC 10A secures exclusion for the area specified by the write request in the PVOL 201A corresponding to the virtual volume 210.

When the MDKC 10A is ready to receive data, it returns a READY response for data transfer to the host computer 180 and receives write data from the host computer 180. Write data is stored in the buffer 113 of the FEPK 100.

The MDKC 10A writes the received write data to the address area in the PVOL 201A (S201). Writing to the PVOL 201A is writing to the cache memory if the write cache function is ON, and writing to a parity group (physical storage area) if the write cache function is OFF.

The MDKC 10A transmits a write request to the area in the SVOL [1] 201B constituting the synchronous copy pair with the PVOL 201A to the RDKC [1] 10B. The RDKC [1] 10B that has received the write request returns a READY response for data transfer to the MDKC 10A, and receives write data from the MDKC 10A. The RDKC [1] 10B writes the received write data to the address area in the SVOL [1] 201B (S202). Writing to the SVOL is the same as writing to the PVOL.

The RDKC [1] 10B returns a write completion response to the MDKC 10A when the data writing to the address area in the SVOL [1] 201B is completed. The MDKC 10A returns a write completion response to the host computer 180 after receiving the write completion response from the RDKC [1] 10B. The MDKC 10A releases the secured exclusion.

MDKC 10A transmits an asynchronous write request to RDKC [2] 10C. The asynchronous write request specifies the target pair group ID. The RDKC [2] 10C that has received the asynchronous write request updates the asynchronous write request management table 230. Specifically, the RDKC [2] 10C adds an entry indicating the value of the volume pair group ID indicated by the received asynchronous write request and the write request time to the asynchronous write request management table 230.

The MDKC 10A may transmit an asynchronous write request to the RDKC [2] 10C before receiving a write completion response from the RDKC [1] 10B. The MDKC 10A may return a write completion response to the host computer 180 before or after sending the asynchronous write request.
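The bookkeeping performed by the RDKC [2] 10C on receiving an asynchronous write request can be sketched as follows. This is a minimal illustration only: the names `AsyncWriteEntry`, `record_async_write`, and `latest_write_request_time` are hypothetical, and the list used here merely stands in for the asynchronous write request management table 230.

```python
from dataclasses import dataclass


@dataclass
class AsyncWriteEntry:
    """One row of the (assumed) asynchronous write request management table."""
    pair_group_id: str
    write_request_time: float


def record_async_write(table, pair_group_id, write_request_time):
    """Add one entry for each asynchronous write request that is received."""
    table.append(AsyncWriteEntry(pair_group_id, write_request_time))


def latest_write_request_time(table, pair_group_id):
    """Return the newest request time for a pair group, or None if absent."""
    times = [e.write_request_time for e in table
             if e.pair_group_id == pair_group_id]
    return max(times) if times else None
```

Only the pair group ID and the request time are recorded; the write data itself is not transferred at this point.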

FIG. 12 shows a flowchart of processing by the MDKC 10A that has received a write request from the host computer 180. The MDKC 10A writes the write data to the PVOL 201A according to the write request received from the host computer 180 (S201).

The MDKC 10A transmits a write request to the area in the SVOL [1] 201B constituting the synchronous copy pair with the PVOL 201A to the RDKC [1] 10B (S202). The MDKC 10A refers to the volume pair management table 220 and identifies SVOL [1] 201B and RDKC [1] 10B that form a pair with the access destination PVOL 201A.

The MDKC 10A determines whether or not the transmitted write request is successful (S203). When the write request is successful, the MDKC 10A receives a READY response for data transfer from the RDKC [1] 10B, transmits the write data, and then receives a write completion response from the RDKC [1] 10B.

For example, when the MDKC 10A does not receive the data transfer READY response from the RDKC [1] 10B or does not receive the write completion response after transmitting the write data, the MDKC 10A determines that the write request has failed (S203: NO).

When the write request transmitted to RDKC [1] 10B is successful (S203: YES), MDKC 10A transmits an asynchronous write request designating a pair group ID to RDKC [2] 10C. The MDKC 10A refers to the volume pair management table 220 and identifies the pair group ID of the access destination PVOL 201A.

The MDKC 10A returns a write completion response to the host computer 180 (S205). The MDKC 10A may return a write completion response to the host computer 180 before transmitting the asynchronous write request.

When the write request transmitted to RDKC [1] 10B fails (S203: NO), MDKC 10A determines that a failure has occurred either in the path to RDKC [1] 10B or in RDKC [1] 10B itself. In the volume pair management table 220, the MDKC 10A changes the pair status value of the pair of PVOL 201A and SVOL [1] 201B to suspend (PSUS) (S206). The MDKC 10A may update the first DKC management table 260A to indicate that communication with the RDKC [1] 10B is impossible. The MDKC 10A may notify the RDKC [2] 10C of the update of the volume pair management table 220.

The MDKC 10A requests the RDKC [2] 10C to write data to the SVOL [2] 201C that forms a pair with the PVOL 201A (S207). Specifically, the MDKC 10A refers to the last update time management table 280, and acquires the last update time of the pair group in the SVOL [2] 201C of the RDKC [2] 10C. The MDKC 10A retains log information, and identifies data not reflected in the SVOL [2] 201C from the log information and the acquired last update time. The MDKC 10A requests the RDKC [2] 10C to write unreflected data.

When the MDKC 10A determines from the last update time management table 280 that there is unreflected data in the SVOL [2] 201C, the MDKC 10A may copy all the data in the PVOL 201A to the SVOL [2] 201C. When the last update times are the same, the MDKC 10A transmits only the current write request to the RDKC [2] 10C. Redundancy is ensured by synchronizing the data of SVOL [2] 201C with the data of PVOL 201A.

The MDKC 10A may execute resynchronization with the PVOL for the SVOL of another asynchronous copy pair held by the RDKC [2] 10C. When the MDKC 10A receives a write request to the PVOL, the MDKC 10A may perform resynchronization between the PVOL and the SVOL of the RDKC [2] 10C constituting the copy pair with the PVOL.

When the write to RDKC [2] 10C is successful (S208: YES), the MDKC 10A returns a write completion response to the host computer 180 (S205). When the write to the RDKC [2] 10C has failed (S208: NO), the MDKC 10A changes the pair status value of the pair of PVOL 201A and SVOL [2] 201C to suspend (PSUS) in the volume pair management table 220 (S209). Further, the MDKC 10A may update the first DKC management table 260A. Thereafter, the MDKC 10A returns a write completion response to the host computer 180 (S205).
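The branching of FIG. 12 can be summarized in a short sketch. The class `Mdkc` and its fields are hypothetical stand-ins, the two boolean flags simulate the outcomes of S203 and S208, and the sketch assumes that S209 suspends the pair with SVOL [2]; it is not the implementation of the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Mdkc:
    """Hypothetical stand-in for MDKC 10A; names are illustrative only."""
    sync_ok: bool = True       # simulated result of the write to RDKC [1]
    fallback_ok: bool = True   # simulated result of the write to RDKC [2]
    pair_status: dict = field(
        default_factory=lambda: {"SVOL[1]": "PAIR", "SVOL[2]": "PAIR"})
    log: list = field(default_factory=list)

    def handle_host_write(self, data: bytes) -> str:
        self.log.append("S201: write data to PVOL")
        self.log.append("S202: send write request to RDKC [1]")
        if self.sync_ok:                              # S203: YES
            self.log.append("send asynchronous write request to RDKC [2]")
        else:                                         # S203: NO
            self.pair_status["SVOL[1]"] = "PSUS"      # S206
            self.log.append("S207: write unreflected data to RDKC [2]")
            if not self.fallback_ok:                  # S208: NO
                self.pair_status["SVOL[2]"] = "PSUS"  # S209 (assumed target)
        self.log.append("S205: return write completion to host")
        return "WRITE_COMPLETE"
```

In every branch the host receives a write completion response, because the write to the PVOL itself has already succeeded in S201.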

FIG. 13 shows a ladder chart of processing of a write request transmitted from the host computer 180 to RDKC [1] (SVOL [1]). The host computer 180 issues a write request to the area in the virtual volume 210 to RDKC [1] 10B. When the RDKC [1] 10B is ready to receive data, it returns a READY response for data transfer to the host computer 180 and receives write data from the host computer 180. Write data is stored in the buffer 113 of the FEPK100.

The RDKC [1] 10B transmits to the MDKC 10A a write request to the area in the PVOL 201A constituting the volume pair with the SVOL [1] 201B. The MDKC 10A secures exclusion for the area in the volume designated by the write request, and then returns a READY response for data transfer to the RDKC [1] 10B and receives the write data from the RDKC [1] 10B.

The MDKC 10A writes the received write data to the designated area in the PVOL 201A (S301). When the writing of data to the area in the PVOL 201A is completed, the MDKC 10A returns a write completion response to the RDKC [1] 10B.

After receiving the write completion response from the MDKC 10A, the RDKC [1] 10B writes the write data in the designated area in the SVOL [1] 201B (S302). The RDKC [1] 10B returns a write completion response to the host computer 180. Thereafter, the RDKC [1] 10B transmits an exclusion release request to the MDKC 10A. The MDKC 10A releases the secured exclusion.

The MDKC 10A transmits an asynchronous write request to the RDKC [2] 10C after writing data to the area in the PVOL 201A. The RDKC [2] 10C that has received the asynchronous write request updates the asynchronous write request management table 230 (S303).

When the RDKC [2] 10C receives a write request from the host computer 180, the RDKC [2] 10C transmits the write request to the MDKC 10A. After writing the data to the area in the PVOL 201A, the MDKC 10A transmits a write request to the RDKC [2] 10C. When the MDKC 10A receives a write completion response from the RDKC [2] 10C, the MDKC 10A transmits an asynchronous write request to the RDKC [1] 10B and returns a write completion response to the RDKC [2] 10C. The RDKC [2] 10C returns a write completion response to the host computer 180, and further updates the asynchronous write request management table 230.

FIG. 14 shows a flowchart of processing by RDKC [1] 10B that has received a write request from the host computer 180. The RDKC [1] 10B receives the write data from the host computer 180 and stores it in the buffer 113 of the FEPK100. The RDKC [1] 10B transmits to the MDKC 10A a write request to the area in the PVOL 201A constituting the volume pair with the SVOL [1] 201B (S401). The RDKC [1] 10B refers to the volume pair management table 220 and identifies the PVOL 201A and the MDKC 10A that form a pair with the access destination SVOL [1] 201B.

The RDKC [1] 10B determines whether or not the transmitted write request is successful (S402). The determination method is the same as the method described with reference to FIG. The MDKC 10A that has received the write request writes the write data to the PVOL 201A, and further transmits an asynchronous write request to the RDKC [2] 10C.

When the write request transmitted to the MDKC 10A is successful (S402: YES), the RDKC [1] 10B writes the write data stored in the buffer 113 to the SVOL [1] 201B (S403). The RDKC [1] 10B transmits an exclusive release request to the MDKC 10A. The RDKC [1] 10B transmits a write completion response to the host computer 180 (S409).

When the write request transmitted to the MDKC 10A fails (S402: NO), the RDKC [1] 10B confirms whether the MDKC 10A is alive (S404). The RDKC [1] 10B determines whether a failure has occurred in the path or in the MDKC 10A. Specifically, the RDKC [1] 10B acquires the first DKC management table 260A and the second DKC management table 260B from the quorum disk 201D.

As described with reference to FIG. 9, the RDKC [1] 10B determines whether or not there is a failure in the MDKC 10A. In this state, RDKC [1] 10B cannot communicate with MDKC 10A. If the first DKC management table 260A has not been updated for longer than the specified time, the RDKC [1] 10B determines that a failure has occurred in the MDKC 10A and that it cannot respond.
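The staleness test on the DKC management table can be sketched as below. The function name `mdkc_failed` and its parameters are hypothetical, and times are plain numbers in seconds; the source states only that a table not updated for longer than the specified time indicates a failure.

```python
def mdkc_failed(table_updated_at: float, now: float,
                specified_time: float) -> bool:
    """Return True when the MDKC's management table on the quorum disk
    has not been updated for longer than the specified time."""
    return (now - table_updated_at) > specified_time
```

If the function returns False while communication with the MDKC is impossible, the failure is attributed to the path rather than to the MDKC.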

When it is determined that no failure has occurred in the MDKC 10A (S405: NO), the RDKC [1] 10B changes the state of the volume pair to the suspended state due to the failure of the own device. Specifically, the RDKC [1] 10B changes the pair status of the pair of PVOL 201A and SVOL [1] 201B to PSUS in the volume pair management table 220. The RDKC [1] 10B may notify the RDKC [2] 10C of the update of the volume pair management table 220. The RDKC [1] 10B may update the second DKC management table 260B.

When it is determined that a failure has occurred in the MDKC 10A (S405: YES), the RDKC [1] 10B changes the state of the volume pair to a suspended state due to a failure of the MDKC (S407). Specifically, the RDKC [1] 10B changes the pair status of the pair of PVOL 201A and SVOL [1] 201B to SSWS in the volume pair management table 220. The RDKC [1] 10B may notify the RDKC [2] 10C of the update of the volume pair management table 220.

Next, the RDKC [1] 10B performs volume resynchronization with the RDKC [2] 10C (S408). RDKC [1] 10B changes its own device to MDKC and resynchronizes each volume pair of the same volume pair group with RDKC [2] 10C. Details will be described later.

In step S409, when the RDKC [1] 10B determines that a failure has occurred in the path (S405: NO, S406), it returns a write request failure response to the host computer 180. In other cases, RDKC [1] 10B returns a write completion response to the host computer 180.
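The outcomes of FIG. 14 can be condensed into a small decision function. This is a sketch under assumptions: the function name and return values are hypothetical, and "PAIR" is used for the unchanged pair status in the success case.

```python
def rdkc1_write_outcome(write_to_mdkc_ok: bool, mdkc_has_failed: bool):
    """Return (pair status, response to host) for a write received by RDKC [1]."""
    if write_to_mdkc_ok:
        # S402: YES -> write to SVOL [1] (S403), then respond (S409)
        return ("PAIR", "WRITE_COMPLETE")
    if mdkc_has_failed:
        # S405: YES -> suspend due to MDKC failure (S407), resynchronize (S408)
        return ("SSWS", "WRITE_COMPLETE")
    # S405: NO -> path failure: suspend and report the failed write
    return ("PSUS", "WRITE_FAILED")
```

Only the path-failure case reports a failure to the host; when the MDKC itself has failed, the RDKC [1] takes over and the write can still complete.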

RDKC [2] also performs substantially the same processing in response to a write request from the host computer 180. In response to the write request from RDKC [2], MDKC transmits an asynchronous write request to RDKC [1].

FIG. 15 shows a flowchart of data resynchronization processing between RDKCs. The RDKC that detects the failure of the MDKC executes the flow of FIG. A predetermined RDKC may execute this flow. An example of processing by the DKC 10B is described below. The DKC 10C may operate similarly.

The DKC 10B reads the management information of the quorum disk 201D (S501). The DKC 10B changes the state of its own device from RDKC to MDKC (S502). The DKC 10B becomes the MDKC, and the DKC 10C operates as RDKC [1] for all volume pairs. All the volumes of the DKC 10B become PVOLs, and all the volumes of the DKC 10C become SVOL [1].

The DKC 10B updates the volume pair management table 220 so as to indicate the state change of the DKCs and the volumes. For example, the DKC 10B registers its own device ID in the MDKCID column 223 of each volume pair group, and registers its own device volume ID in the PVOLID column 222. The DKC 10B further registers the volume ID of the DKC 10C in the SVOL [1] ID column 224, and registers the ID of the DKC 10C in the RDKC [1] ID column 225. The DKC 10B registers “PAIR” in the PAIR [1] STATUS column 226.

The DKC 10B registers the ID of the volume of the DKC 10A in the SVOL [2] ID column 227, and registers the ID of the DKC 10A in the RDKC [2] ID column 228. The DKC 10B registers “PSUS” in the PAIR [1] STATUS column 226. The DKC 10B registers its own device ID in the MDKC ID column 282 of the last update time management table 280.

The DKC 10B notifies the other DKC 10C of the update of the volume pair group management table. Further, the DKC 10B notifies the other DKC 10C of the start of the resynchronization process between RDKCs (S503). The DKC 10B and the DKC 10C temporarily suspend I/O processing for the resynchronization between RDKCs.

The DKC 10B executes steps S504 and S505 for each volume held by the own device. In step S504, the DKC 10B determines whether there is unupdated data for the selected volume. The DKC 10B refers to the last update time management table 280, and compares the last update time of its own device with the last update time of the other device in the same volume pair group. When the last update time of the own device is earlier, the DKC 10B determines that unupdated data exists.

When it is determined that there is no unupdated data (S504: NO), the DKC 10B selects the next volume. If it is determined that unupdated data exists (S504: YES), the DKC 10B reads the latest data of the volume pair group from the DKC 10C and writes it to the selected volume (S505). Either all the data of the volume of the DKC 10C or only the unupdated data identified from the log information held by the DKC 10C is transferred to the DKC 10B. By resynchronization, the data of the two volumes match. After resynchronization, the two volumes of the DKC 10B and the DKC 10C constitute a synchronous copy pair.

After executing steps S504 and S505 for all the volumes it holds, the DKC 10B requests the other DKC 10C to resynchronize with the latest data. The DKC 10C executes steps S504 and S505 for all the volumes it holds.

As described above, the data resynchronization between the RDKCs makes it possible to match the volume data of two active DKCs remaining due to the failure of the MDKC, and to ensure redundancy.
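The per-volume determination of step S504 can be sketched as a comparison of last update times. The function name and the dictionary layout (volume pair group ID mapped to a last update time) are assumptions made for illustration; they stand in for the entries of the last update time management table 280.

```python
def groups_with_unupdated_data(own_times: dict, peer_times: dict) -> list:
    """Return the volume pair groups whose local copy is older than the
    peer's copy, i.e. the groups that must be read from the peer (S505)."""
    stale = []
    for group_id, own_time in own_times.items():
        peer_time = peer_times.get(group_id)
        if peer_time is not None and own_time < peer_time:  # S504: YES
            stale.append(group_id)
    return stale
```

Running the same comparison on the other DKC with the arguments swapped covers the groups for which that DKC holds the older copy.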

FIG. 16 shows a flowchart of a health check by an RDKC (RDKC 10B or RDKC 10C). Each DKC periodically performs health checks on the other DKCs. First, the RDKC acquires all the DKC management tables 260A to 260C from the quorum disk 201D (S601). The RDKC determines whether each of the other DKCs is alive or dead (S602). The method for determining whether or not a DKC is alive is as described for step S404 in the flowchart of FIG.

When it is determined that a failure has occurred in the MDKC 10A (S603: YES), the RDKC changes the state of the volume pair to a suspended state due to the failure of the MDKC 10A (S604). The process in step S604 is the same as step S407 in FIG. The RDKC executes inter-RDKC resynchronization processing (S605). The inter-RDKC resynchronization process is as described with reference to FIG.

When it is determined that no failure has occurred in the MDKC 10A (S603: NO), the RDKC executes steps S606 and S607 for each volume held by the own device. In step S606, the RDKC determines whether an unprocessed asynchronous write request remains for the selected volume.

Specifically, the RDKC compares the latest write request time of the volume pair group indicated by the asynchronous write request management table 230 with the time indicated by the quorum update time management table 240. If the write request time is the same as or later than the quorum update time, the RDKC determines that an unprocessed asynchronous write request remains. Otherwise, the RDKC determines that no unprocessed asynchronous write request remains.

If an unprocessed asynchronous write request remains (S606: YES), the RDKC executes resynchronization of the SVOL and PVOL (S607). The RDKC designates the PVOL, sends a read request for resynchronization (resynchronization request) to the MDKC, and writes the received read data to the SVOL. As described above, the data to be transferred is either all the PVOL data or only the unupdated data.

Finally, the RDKC updates its own DKC management table and the quorum update time management table 240 (S608). As will be described later, the MDKC updates the last update time column 283 of the volume pair group for which the synchronization processing has been executed in the health check performed by its own device. In this manner, the RDKC implements an efficient resynchronization process by executing the resynchronization process of the asynchronous volume pair together with the health check.
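Steps S606 to S608 can be sketched as follows. The function names, the dictionary of latest write request times per volume pair group, and the `resync` callback are hypothetical; the comparison itself follows the rule stated above (same as or later than the quorum update time means a request is still unprocessed).

```python
def has_unprocessed_async_write(latest_write_request_time, quorum_update_time):
    """S606: a request received at or after the last quorum update is pending."""
    if latest_write_request_time is None:
        return False  # no asynchronous write request was ever recorded
    return latest_write_request_time >= quorum_update_time


def rdkc_health_check(latest_request_times, quorum_update_time, now, resync):
    """Resynchronize every group with a pending request, then return the
    resynchronized groups and the new quorum update time (S608)."""
    resynced = []
    for group_id, request_time in latest_request_times.items():
        if has_unprocessed_async_write(request_time, quorum_update_time):
            resync(group_id)  # S607: read from the PVOL, write to the SVOL
            resynced.append(group_id)
    return resynced, now
```

Because the quorum update time advances only at the end of each health check, any request that arrives between two checks is picked up by the next one.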

FIG. 17 shows a flowchart of health check by MDKC 10A. First, the MDKC 10A acquires all the DKC management tables 260A to 260C from the quorum disk 201D (S701). The MDKC 10A determines whether other DKCs are alive or dead (S702). The method for determining whether or not DKC is alive is as described for step S404 in the flowchart of FIG.

The MDKC 10A updates the last update time management table 280 (S703). When the MDKC 10A has performed synchronous copy or data resynchronization to any RDKC since the previous health check, it updates the corresponding last update time in the last update time management table 280.

The MDKC 10A executes forced resynchronization processing (S704). Details of the forced resynchronization process will be described later. To explain the outline, the MDKC 10A refers to the last update time management table 280, and identifies the SVOL in which unupdated data exists and the elapsed time from the previous update exceeds the threshold. In the last update time management table 280, the SVOL is identified by the volume pair group ID and RDKC ID.

The MDKC 10A instructs the identified RDKC to perform resynchronization, specifying the SVOL (volume pair group). This prevents the amount of data not yet reflected in the SVOL from becoming excessive due to, for example, a delayed health check. Finally, the MDKC 10A updates its own DKC management table and the quorum update time management table 240 (S705).

FIG. 18 shows a flowchart of forced resynchronization processing. The MDKC 10A refers to the last update time management table 280 and executes steps S801, S802, and S803 for each volume pair.

In step S801, the MDKC 10A determines whether an unprocessed asynchronous write process remains in the selected volume pair. In the last update time management table 280, when the last update time of any RDKC is earlier than the last update time of MDKC, an unprocessed asynchronous write process remains.

If unprocessed asynchronous write processing remains (S801: YES), in step S802, the MDKC 10A compares the elapsed time from the previous update with a threshold (for example, 4 s). When the elapsed time exceeds the threshold (S802: YES), the MDKC 10A transmits a resynchronization request specifying the SVOL (volume pair group) to the RDKC (S803).
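The selection of forced resynchronization targets can be sketched as below. The row layout (group ID, RDKC ID, MDKC last update time, RDKC last update time) is an assumed simplification of the last update time management table 280, the function name is hypothetical, and the default threshold of 4 seconds is the example value given in the text.

```python
def forced_resync_targets(rows, now, threshold=4.0):
    """Return (group ID, RDKC ID) pairs that lag behind the MDKC and whose
    last update is older than the threshold."""
    targets = []
    for group_id, rdkc_id, mdkc_time, rdkc_time in rows:
        lagging = rdkc_time < mdkc_time            # unprocessed write remains
        too_old = (now - rdkc_time) > threshold    # elapsed time exceeds limit
        if lagging and too_old:
            targets.append((group_id, rdkc_id))
    return targets
```

An SVOL that lags but was updated recently is left to the regular health check of its RDKC, so only long-standing differences trigger a forced request.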

FIG. 19 shows a flowchart of read processing by RDKC (RDKC 10B or RDKC 10C). In response to a read request from the host computer 180, the DKCs 10A, 10B, and 10C each return the volume data of its own device to the host computer 180. The read processing by MDKC is the same as the normal read processing, and description thereof is omitted.

Referring to FIG. 19, upon receiving a read request from host computer 180, RDKC reads the latest write request time of the specified SVOL volume pair group from asynchronous write request management table 230. Further, the RDKC reads the quorum last update time from the quorum update time management table 240 (S901).

The RDKC determines whether the target SVOL data is the latest from the acquired information (S902). Specifically, if the latest write request time is earlier than the quorum last update time, no unprocessed asynchronous write request remains, and the RDKC determines that the SVOL data is the latest.

If the target SVOL data is not the latest (S902: NO), the RDKC waits for a predetermined time (S903). Thereby, the RDKC waits for the volume pair resynchronization by the health check. If the target SVOL data is the latest (S902: YES), the RDKC reads the data from the target address of the target SVOL and returns it to the host computer 180.

The latest read data can be returned to the host computer 180 by the above method. In addition, by waiting for the next health check, the RDKC avoids the load of reading from another DKC or of performing a resynchronization independent of the health check.
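The read flow of FIG. 19 can be sketched as a wait loop. The sketch assumes, consistently with the health check of FIG. 16, that the SVOL data is current when the latest write request time is earlier than the quorum last update time. The callbacks `get_times`, `read_data`, and `wait`, as well as the `max_waits` limit, are hypothetical and not part of the source.

```python
def read_svol(get_times, read_data, wait, max_waits=10):
    """Return SVOL data once no unprocessed asynchronous write remains."""
    for _ in range(max_waits):
        latest_write_time, quorum_update_time = get_times()  # S901
        if latest_write_time < quorum_update_time:            # S902: YES
            return read_data()
        wait()                                                # S903
    raise TimeoutError("SVOL was not resynchronized in time")
```

The loop does not trigger a resynchronization itself; it only waits until a health check has advanced the quorum last update time past the latest write request.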

The present invention is not limited to the above-described embodiments and includes various modifications. For example, the above-described embodiments have been described in detail for easy understanding of the present invention, and the invention is not necessarily limited to those having all the configurations described. Further, a part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Further, it is possible to add, delete, and replace other configurations for a part of the configuration of each embodiment.

In addition, each of the above-described configurations, functions, processing units, and the like may be realized by hardware by designing a part or all of them with, for example, an integrated circuit. Each of the above-described configurations, functions, and the like may be realized by software by interpreting and executing a program that realizes each function by the processor. Information such as programs, tables, and files for realizing each function can be stored in a recording device such as a memory, a hard disk, or an SSD, or a recording medium such as an IC card or an SD card.

Claims

A first storage device including a first volume;
A second storage device including a second volume constituting a copy pair with the first volume;
A third storage device including a third volume constituting a copy pair with the first volume;
A storage system including a quorum disk,
The first volume, the second volume, and the third volume are associated with one virtual volume accessed from the outside,
The first storage device, the second storage device, and the third storage device each accept external access to the virtual volume,
In response to a write request from the host to the virtual volume, the first storage device writes write data to the first volume,
The second storage device writes the write data to the second volume after writing the write data to the first volume,
The storage system returns a write completion response of the write request to the host after writing the write data to the second volume,
In response to the write request, the first storage device transmits an asynchronous write request for the third volume to the third storage device,
The third storage device accesses the quorum disk at a preset timing, and executes a health check of the first storage device and the second storage device,
When the result of the health check performed after receiving the asynchronous write request indicates that normal communication with the first storage device is possible, the third storage device requests the first storage device to resynchronize the third volume and the first volume.
The storage system according to claim 1,
When the result of the check by the quorum disk indicates that normal communication with the first storage device is impossible and normal communication with the second storage device is impossible, the third storage The apparatus requests the second storage apparatus to resynchronize the third volume and the second volume.
The storage system according to claim 1,
The third storage device
Managing the time of each asynchronous write request received from the first storage device,
In the health check, based on the time of the previous asynchronous write request and the time of the previous health check, it is determined whether there is an unprocessed asynchronous write request,
A storage system that requests the resynchronization request to the first storage device when an unprocessed asynchronous write request remains.
The storage system according to claim 3,
The third storage device
Receiving a read request for the third volume from the outside;
Based on the time of the previous asynchronous write request and the time of the previous health check, determine whether there is an outstanding asynchronous write request,
If an unprocessed asynchronous write request remains, wait for a response to the read request until the next health check,
A storage system that, after receiving update data of the third volume from the first storage device through the next health check, returns read data for the read request to the request source of the read request.
The storage system according to claim 1,
The second storage device and the third storage device are:
Resynchronizing the data of the second volume and the third volume in response to the failure of the first storage device,
A storage system that forms a synchronous copy pair by the second volume and the third volume.
The storage system according to claim 5,
The quorum disk holds final update time management information for managing the last update time of each of the first volume, the second volume, and the third volume;
The first storage device manages the last update time management information, updates the last update time management information in a health check using the quorum disk,
The storage system, wherein the second storage device and the third storage device execute the resynchronization based on the last update time management information.
The storage system according to claim 1,
The quorum disk holds final update time management information for managing the last update time of each of the first volume, the second volume, and the third volume;
The first storage device manages the last update time management information, updates the last update time management information in a health check using the quorum disk,
In the first storage device, when the data of the first volume and the third volume are different and the elapsed time from the last update time of the third volume indicated by the last update time management information exceeds a prescribed value, A storage system that instructs the third storage device to resynchronize the third volume and the first volume.
A storage system control method comprising:
The storage system
A first storage device including a first volume;
A second storage device including a second volume constituting a copy pair with the first volume;
A third storage device including a third volume constituting a copy pair with the first volume;
Including a quorum disk,
The first volume, the second volume, and the third volume are associated with one virtual volume accessed from the outside,
The first storage device, the second storage device, and the third storage device each accept external access to the virtual volume,
The method
In response to a write request from the host to the virtual volume, the first storage device writes write data to the first volume,
The second storage device writes the write data to the second volume after writing the write data to the first volume;
After writing the write data to the second volume, a write completion response of the write request is returned to the host,
In response to the write request, the first storage device transmits an asynchronous write request for the third volume to the third storage device,
The third storage device accesses the quorum disk at a preset timing, and executes a health check of the first storage device and the second storage device;
When the result of the health check performed after receiving the asynchronous write request indicates that normal communication with the first storage device is possible, the third storage device requests the first storage device to resynchronize the third volume and the first volume.
The method according to claim 8, comprising:
When the result of the check by the quorum disk indicates that normal communication with the first storage device is impossible and normal communication with the second storage device is impossible, the third storage A method in which an apparatus requests the second storage apparatus to resynchronize the third volume and the second volume.
The method according to claim 8, comprising:
The third storage device is
Managing the time of each asynchronous write request received from the first storage device,
In the health check, based on the time of the previous asynchronous write request and the time of the previous health check, it is determined whether there is an unprocessed asynchronous write request,
A method of requesting the resynchronization request to the first storage device when an unprocessed asynchronous write request remains.
The method of claim 10, comprising:
The third storage device is
Receiving a read request for the third volume from the outside;
Based on the time of the previous asynchronous write request and the time of the previous health check, determine whether there is an outstanding asynchronous write request,
If an unprocessed asynchronous write request remains, wait for a response to the read request until the next health check,
A method of returning read data corresponding to the read request to a request source of the read request after receiving update data of the third volume from the first storage device by the next health check.
The method according to claim 8, comprising:
The second storage device and the third storage device are:
Resynchronizing the data of the second volume and the third volume in response to the failure of the first storage device,
A method of configuring a synchronous copy pair by the second volume and the third volume.
The method of claim 12, comprising:
The quorum disk holds final update time management information for managing the last update time of each of the first volume, the second volume, and the third volume;
The first storage device manages the last update time management information, updates the last update time management information in a health check using the quorum disk,
The method in which the second storage device and the third storage device execute the resynchronization based on the last update time management information.
The method according to claim 8, comprising:
The quorum disk holds final update time management information for managing the last update time of each of the first volume, the second volume, and the third volume;
The first storage device manages the last update time management information, updates the last update time management information in a health check using the quorum disk,
When the first storage device has different data in the first volume and the third volume, and the elapsed time from the last update time of the third volume indicated by the last update time management information exceeds a specified value, A method of instructing the third storage device to resynchronize the third volume and the first volume.