WO2017081785A1

WO2017081785A1 - Computer system

Info

Publication number: WO2017081785A1
Application number: PCT/JP2015/081822
Authority: WO
Inventors: 健志奥村; 一志宮武; 聡角入
Original assignee: 株式会社日立製作所 (Hitachi, Ltd.)
Priority date: 2015-11-12
Filing date: 2015-11-12
Publication date: 2017-05-18

Abstract

A computer system according to one aspect of the present invention comprises a storage system and a plurality of hosts that access the storage system. The storage system comprises a first storage device having a first volume and a second storage device having a second volume, and operates the first and second volumes as a volume pair. If an access fault occurs when a host accesses the volume pair, the computer system identifies the fault location and, on the basis of information about the identified location, either changes the settings of the access paths from each host to the volume pair or changes the settings of the volume pair operated by the storage system.

Description

Computer system

The present invention relates to a computer system.

Many storage devices used in current computer systems adopt high-reliability technology such as RAID (Redundant Arrays of Independent (or Inexpensive) Disks) to provide reliability beyond that of a single HDD. However, with the recent evolution of the information society, there are situations in which the reliability that RAID technology can provide is insufficient.

As a high-availability technology for such situations, Patent Document 1, for example, discloses an information system constructed from a plurality of (for example, two) storage apparatuses (hereinafter referred to as apparatus A and apparatus B), in which data is double-written (mirrored) to a volume of apparatus A and a volume of apparatus B. In that information system, when the host issues a request to write data (a write command) to the volume of apparatus A, apparatus A stores a copy of the data in the volume of apparatus B. Only after the copy has been stored in the volume of apparatus B does apparatus A report to the host that the write command has completed. As a result, the volumes of apparatus A and apparatus B always hold the same data.

One requirement for such a redundant system is that the host must be prevented from accessing wrong data. Patent Document 1 discloses an example in which volume duplication (copying) between apparatus A and apparatus B fails because the link between them is disconnected. Suppose the host keeps using the volume of apparatus A for a while after that, and then switches its access to apparatus B because a failure occurs in apparatus A. At that point the volume of apparatus B holds only data older than that in the volume of apparatus A, so it is desirable to control apparatus B so that it does not accept access from the host.

The information system disclosed in Patent Document 1 solves this problem by providing a failure detection volume that both apparatus A and apparatus B can access. When apparatus A fails in the volume duplication processing, it reads the contents of the failure detection volume and checks whether apparatus B has written a failure information flag there. If no flag has been written, apparatus A writes its own failure information flag and then resumes processing access requests from the host. If apparatus B's flag has been written, apparatus A returns an I/O failure to the host. This prevents the host from reading old data.
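The flag check described for Patent Document 1 amounts to a first-writer-wins arbitration on a shared volume. The sketch below models it in memory. It is illustrative only: the class and function names are hypothetical, and the lock is an assumption standing in for whatever makes "read the flag, then write the flag" a single atomic step, which the summary above does not describe.

```python
import threading


class FailureDetectionVolume:
    """In-memory stand-in for the shared failure detection volume (hypothetical)."""

    def __init__(self) -> None:
        # Identifiers of apparatuses that have written a failure information flag.
        self._flags: set[str] = set()
        # Assumption: the check and the write must be atomic; a lock models that here.
        self._lock = threading.Lock()

    def check_and_write_flag(self, self_id: str, peer_id: str) -> bool:
        """Return True if self_id may resume host I/O, False if it must fail it."""
        with self._lock:
            if peer_id in self._flags:
                # The peer recorded the failure first, so this side's data may be stale.
                return False
            self._flags.add(self_id)
            return True


def on_duplication_failure(volume: FailureDetectionVolume, self_id: str, peer_id: str) -> str:
    """Hypothetical handler run by an apparatus whose volume duplication failed."""
    if volume.check_and_write_flag(self_id, peer_id):
        return "RESUME"      # continue processing the host's access requests
    return "IO_FAILURE"      # return an I/O failure to the host
```

With a fresh volume, apparatus A calling first gets "RESUME", and apparatus B calling afterwards gets "IO_FAILURE", so only one side keeps serving the host.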

The information system disclosed in Patent Document 1 requires a failure detection volume in addition to the storage apparatuses (apparatuses A and B) that store data, which increases the cost of the information system. Alternatively, the failure detection volume can be eliminated by having the host take over part of the decision processing performed when a failure occurs. For example, in Patent Document 2, when the host issues a write request to the primary storage and receives a remote-copy failure response from the storage (that is, when the path between the storage apparatuses has failed), the host transitions to a mode in which it writes the data to both the primary and the secondary storage, thereby maintaining the duplexed state of the data.

Patent Document 1: JP 2009-266120 A
Patent Document 2: Japanese Patent No. 5057366

An information system that requires high availability needs redundancy measures not only in the storage but also on the host side. For example, a highly available information system cannot be obtained unless measures such as running a cluster server on a plurality of hosts are taken. The technique disclosed in Patent Document 2 does not consider the configuration of an information system in which a plurality of hosts access the storage apparatus.

The computer system according to one aspect of the present invention includes a storage system and a plurality of hosts that access the storage system. The storage system has a first storage device having a first volume and a second storage device having a second volume, and operates the first and second volumes as a volume pair. If an access failure occurs when a host accesses the volume pair, the computer system identifies the location where the failure occurred and, based on information about the identified failure location, either changes the settings of each host's access paths to the volume pair or changes the settings of the volume pair operated by the storage system.

According to the present invention, a highly available computer system can be constructed at low cost.

A hardware block diagram of the computer system according to an embodiment.
A software block diagram of the computer system according to the embodiment.
An example of a pair management table.
An example of LDEV status information.
An example of an alternate path management table.
An example of an integrated path management table.
Flowchart (1) of processing performed when a host issues a write request to a P-VOL.
Flowchart (2) of processing performed when a host issues a write request to a P-VOL.
Flowchart (3) of processing performed when a host issues a write request to a P-VOL.
Flowchart (4) of processing performed when a host issues a write request to a P-VOL.
Flowchart (1) of processing performed when a host issues a write request to an S-VOL.
Flowchart (2) of processing performed when a host issues a write request to an S-VOL.
Flowchart (3) of processing performed when a host issues a write request to an S-VOL.
Flowchart (4) of processing performed when a host issues a write request to an S-VOL.
Flowchart (1) of processing performed by an integrated management program.
Flowchart (2) of processing performed by an integrated management program.
Flowchart (3) of processing performed by an integrated management program.
An example of the format of notification information.

Embodiments of the present invention will now be described with reference to the drawings. The embodiments described below do not limit the invention according to the claims, and not all of the elements and combinations described in the embodiments are necessarily essential to the solution provided by the invention.

In the following description, processing executed by a computer such as a host may be described with a “program” as the subject. Strictly, the processing described for a program is performed by a processor (CPU: Central Processing Unit) executing the program, so expressions with the program as the subject are not technically accurate. However, to keep the explanation concise, processing may be described with the program as the subject. Part or all of a program may also be implemented by dedicated hardware. The various programs described below may be provided from a program distribution server or on a computer-readable storage medium and installed on each device that executes them. The computer-readable storage medium is a non-transitory medium, for example a non-volatile storage medium such as an IC card, an SD card, or a DVD.

Before starting the description of the embodiments, various terms used in the embodiments described below will be described.

“Volume” means a storage area (storage space) provided by a target device, such as a storage apparatus or a storage device, to an initiator such as a host or a storage controller. The storage apparatus according to the embodiment described below can create one logical volume from storage areas provided by a plurality of storage devices and provide this logical volume to the host. In this specification, such a volume is referred to as a “logical volume” or “logical device”.

“Remote copy” means a process of creating a copy of a volume of one storage apparatus in a volume of another storage apparatus. In the embodiment described below, the storage apparatus has a remote copy function: when it receives write data for a volume from the host, it writes the write data to the volumes of both storage apparatuses.

When data is stored in two volumes by remote copy, the volume in which the data is stored first is called the “primary volume” or “P-VOL”, and the volume in which it is stored second is called the “secondary volume” or “S-VOL”. A pair of a primary volume and a secondary volume is called a “volume pair” or simply a “pair”.

“Access path” or “path” means the route used when a host accesses a logical device. When the storage apparatus has a remote copy function, the route from the copy-source storage apparatus to the copy-destination logical device is also called a “path”. To improve fault tolerance, a computer system may provide a plurality of access paths to one logical device. In that case, a host accessing the logical device may use any of the access paths provided for it.

“Device file” is an interface through which a program executed on the host accesses an input/output device such as a disk. In the embodiment described below, a program such as a device driver associates a unique device file name with each logical device provided by the storage apparatus. A program executed on the host 1 can access the logical device associated with a device file name by issuing an access request that specifies that device file name.

When there are multiple access paths from the host to a logical device, a device file is defined for each access path. For example, when there are two access paths to a logical device (a first access path and a second access path), a first device file associated with the first access path and a second device file associated with the second access path are defined. When the host accesses the logical device using the first device file, the access request and data are transmitted to the logical device via the first access path; when it uses the second device file, they are transmitted via the second access path.

Hereinafter, the configuration of the computer system according to the embodiment of the present invention will be described. FIG. 1 is a diagram illustrating a hardware configuration of a computer system.

In the embodiments described below, reference numbers are assigned to the components of the computer system. When the computer system contains a plurality of components of the same type, reference numbers with suffixes such as “a” and “b” may be used; components with the same reference number and different suffixes are components of the same type. When a matter common to components of the same type is described, the component may be identified by the reference number with the suffix omitted.

The computer system according to the embodiment of the present invention includes a plurality of hosts 1 (hosts 1a and 1b in FIG. 1), a plurality of storage devices (storage devices 2a and 2b in FIG. 1), and a management server 3. The host 1 and the storage device 2 are connected by a network 4. The host 1, the storage device 2, and the management server 3 are connected by a management network 6. The storage device 2a and the storage device 2b are connected by inter-storage paths (5a, 5b).

The storage device 2a and the storage device 2b are devices that provide storage areas to the host 1. Although the hardware configuration of the storage device 2b need not be the same as that of the storage device 2a, this embodiment describes an example in which both are the same.

Taking the storage device 2a as an example, the hardware configuration of the storage device 2 will be described. The storage apparatus 2a includes a storage controller (sometimes abbreviated as CTL) 21a and a plurality of DISKs 22a. The storage controller 21a is a device that includes communication interfaces (called ports in this embodiment) for communicating with the host 1 and the storage device 2b, a processor that processes I/O requests from the host 1, a cache memory, and the like. The DISK 22a is a storage device that stores write data from the host 1. For example, an HDD (Hard Disk Drive) can be used as the DISK 22a, but a storage device other than an HDD, such as an SSD (Solid State Drive), may also be used.

The storage device 2a and the storage device 2b are connected via inter-storage paths (5a, 5b). The storage device 2a can transmit a copy of data written from the host 1 to the storage device 2a to the storage device 2b via the inter-storage path (5a, 5b).

FIG. 1 shows a configuration in which each storage device 2 has one CTL 21 and two DISKs 22, but the numbers are not limited to these. A storage apparatus 2 may have two or more CTLs 21, or three or more DISKs 22.

The host 1 is a computer that makes a data read / write request to the storage device 2 and includes at least a CPU 11, a memory 12, an HBA (Host Bus Adapter) 13, and a NIC (Network Interface Controller) 14. The NIC 14 is a communication interface device for communicating with the management server 3 and the like. The HBA 13 is a communication interface device for communicating with the storage device 2.

FIG. 1 shows an example in which the host 1a has two HBAs 13a and the host 1b has two HBAs 13b. However, the number of HBAs 13 is not limited to two: each host 1 may have three or more HBAs 13, or only one. The hosts 1 also need not have the same number of HBAs 13; for example, the host 1a may have two HBAs 13a while the host 1b has three HBAs 13b.

The memory 12 is a storage device that can be accessed at high speed, such as RAM (Random Access Memory) or ROM (Read Only Memory). The host 1 loads a program into the memory 12, and the CPU 11 executes the program. The host 1 may have another type of storage resource (such as an HDD) in addition to or instead of the memory 12. The memory 12 is an example of a storage device.

The management server 3 is a computer for performing management operations on the storage apparatus 2 and includes at least a CPU 31, a memory 32, a NIC 33, and a human interface device (HID) 34. The management server 3 loads a program into the memory 32, and the CPU 31 executes the program. The management server 3 may have another type of storage resource (such as an HDD) in addition to or instead of the memory 32. The memory 32 is an example of a storage device. The NIC 33 is a communication interface device for communicating with the host 1, the storage device 2, and the like. The HID 34 is a group of devices through which a user inputs information to and receives output from the management server 3, for example a keyboard and a display.

The network 4 is composed of, for example, one or more fiber channel switches 41 and fiber channel cables 42, and is used for data transfer between the host 1 and the storage device 2. The topology between the host 1 and the storage apparatus 2 shown in FIG. 1 is an example, and a different topology may be adopted. At a minimum, the network 4 needs to be configured so that both the host 1a and the host 1b can access the storage areas of the storage apparatuses 2a and 2b. FIG. 1 shows only one cable 42 between a host 1 and a switch 41 and only one cable 42 between a storage device 2 and a switch 41, but a plurality of cables 42 may be provided between the host 1 and the switch 41 (or between the storage apparatus 2 and the switch 41).

The inter-storage paths (5a, 5b) are transmission paths used for data transfer between the storage apparatuses 2a and 2b; fiber channel cables, for example, are used for them. Each inter-storage path has a fixed data transmission direction. The inter-storage path 5a is used when the storage apparatus 2a sends a data transmission instruction and data to the storage apparatus 2b, whereas the inter-storage path 5b is used when the storage apparatus 2b sends a data transmission instruction and data to the storage apparatus 2a. Although FIG. 1 shows only one inter-storage path 5a and one inter-storage path 5b, a plurality of each may be provided.

The management network 6 is a network configured using Ethernet as an example, and is used for exchanging management information of the host 1 and the storage device 2. Details of the management information will be described later.

FIG. 2 is a software configuration diagram of the computer system according to the present embodiment. In the storage device 2, the processor in the CTL 21 executes the control program 25. By executing the control program 25, the CTL 21 forms one or more logical storage areas using the storage areas of one or more DISKs 22. In this embodiment, such a logical storage area is called a logical device (LDEV) 24. The storage apparatus 2 provides the LDEV 24 to the host 1, and the host 1 issues I/O requests (read commands, write commands, etc.) to the LDEV 24 provided by the storage apparatus 2. The CTL 21 accesses the LDEV 24 in response to the I/O requests received from the host 1.

FIG. 2 shows a configuration in which the storage apparatus 2a has one LDEV 24 (LDEV 24a-1) and the storage apparatus 2b has one LDEV 24 (LDEV 24b-1). However, the storage apparatus 2 can have more than one LDEV. Hereinafter, the LDEV 24a-1 may be referred to as “LDEVa” and the LDEV 24b-1 may be referred to as “LDEVb”.

The storage device 2 further has a function of performing remote copy processing. The storage apparatus 2 according to the present embodiment manages LDEVa and LDEVb as volume pairs. LDEVa is set as the primary volume, and LDEVb is set as the secondary volume.

With the remote copy function, LDEVa and LDEVb, which form a volume pair, are kept storing the same data. Therefore, even if a failure occurs in the storage device 2a and the host 1 cannot access LDEVa, the host 1 can access the data it stored in LDEVa (more precisely, its replica) by accessing the storage device 2b (LDEVb).

The following outlines the data write process to a volume pair by remote copy. The storage apparatus 2 performs so-called synchronous remote copy. When the storage apparatus 2a accepts a data write request (write request) from the host 1 for an area of LDEVa (assume the address of this area is A), the storage apparatus 2a writes the data to address A of LDEVa and, via the inter-storage path 5a, instructs the storage apparatus 2b to write the data passed from the host 1 to address A of LDEVb. When the storage apparatus 2b receives this instruction (and the data), it writes the data to address A of LDEVb.

When the data writing to LDEVb is completed, the storage apparatus 2b returns a response to the effect that the data writing is completed to the storage apparatus 2a. The response is returned via the inter-storage path 5a. Upon receipt of the response from the storage device 2b, the storage device 2a responds to the host 1 that processing for the data write request received from the host 1 has been completed. By performing such processing, the state in which the same data is stored in LDEVa and LDEVb is maintained. Further, when a response to the write request is returned from the storage apparatus 2 to the host 1, it is guaranteed that data is stored in the two volumes constituting the volume pair.

Further, when the host 1 issues a write request to the LDEVb area of the storage apparatus 2b (assuming that the address of this area is B), the storage apparatus 2 performs the following processing. When the storage apparatus 2b receives a write request from the host 1, the storage apparatus 2b first instructs the storage apparatus 2a to write data to the address B of the LDEVa via the inter-storage path 5b. Further, the data passed from the host 1 is transmitted together with the instruction. When the storage device 2a receives this instruction (and data), it writes the data to the address B of LDEVa.

When the data writing to LDEVa is completed, the storage apparatus 2a returns a response (notifying that the data writing has been completed) to the storage apparatus 2b. The response is returned via the inter-storage path 5b. After receiving the response from the storage device 2a, the storage device 2b writes the data received from the host 1 to the address B of the LDEVb. When the data writing to the LDEVb is completed, the storage apparatus 2b responds to the host 1 that the processing for the data write request received from the host 1 has been completed.

The reason why data is written to LDEVa before LDEVb is that LDEVa is the primary volume. That is, when the storage apparatus 2 receives a write request for a volume pair, it first writes data to the primary volume.
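The two write sequences above (a write received at LDEVa and a write received at LDEVb) can be sketched as an in-memory Python model. This is a minimal illustration under stated assumptions, not the control program 25: the class and method names are hypothetical, transfers over the inter-storage paths 5a and 5b are only recorded in a trace rather than sent, and the pair is assumed to be in the normal state with no failures. What it shows is the ordering: the primary volume LDEVa is always written first, and the host receives completion only after both volumes hold the data.

```python
from dataclasses import dataclass, field


@dataclass
class Apparatus:
    name: str
    # Contents of this apparatus's LDEV, keyed by address.
    ldev: dict = field(default_factory=dict)


@dataclass
class SyncRemoteCopyPair:
    """Hypothetical model: LDEVa (P-VOL) on apparatus 2a, LDEVb (S-VOL) on apparatus 2b."""
    primary: Apparatus = field(default_factory=lambda: Apparatus("2a"))
    secondary: Apparatus = field(default_factory=lambda: Apparatus("2b"))
    # Ordered (event, detail) records showing which step happened when.
    trace: list = field(default_factory=list)

    def _write(self, app: Apparatus, address: int, data: bytes) -> None:
        app.ldev[address] = data
        self.trace.append(("write", app.name))

    def write_via_pvol(self, address: int, data: bytes) -> str:
        """Host 1 writes to LDEVa; apparatus 2a receives the request."""
        self._write(self.primary, address, data)    # 2a writes LDEVa
        self.trace.append(("send", "5a"))            # 2a -> 2b: instruction and data
        self._write(self.secondary, address, data)  # 2b writes LDEVb
        self.trace.append(("reply", "5a"))           # 2b -> 2a: completion response
        return "complete"                            # only now does 2a answer host 1

    def write_via_svol(self, address: int, data: bytes) -> str:
        """Host 1 writes to LDEVb; apparatus 2b receives the request."""
        self.trace.append(("send", "5b"))            # 2b -> 2a: instruction and data
        self._write(self.primary, address, data)    # 2a writes LDEVa (primary first)
        self.trace.append(("reply", "5b"))           # 2a -> 2b: completion response
        self._write(self.secondary, address, data)  # 2b then writes LDEVb
        return "complete"                            # 2b answers host 1
```

After one write through each volume, both LDEVs hold identical contents, and the write events in the trace always alternate "2a" then "2b", reflecting the rule that the primary volume is written first.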

Note that the example above describes operation when the computer system is operating normally. When a failure occurs in part of the computer system, operations different from those described above may be performed. For example, when a failure such as disconnection of the inter-storage path 5b occurs, data cannot be transmitted from the storage device 2b to the storage device 2a. In that case, the computer system according to the present embodiment may change the settings of the storage apparatus 2 so that write requests from the host 1 to LDEVb are not accepted. The host 1 then issues write requests for the volume pair composed of LDEVa and LDEVb only to LDEVa. Details will be described later.

Next, the pair management table T300, part of the management information held by the storage apparatus 2, will be described. As described above, in principle the storage system stores write data from the host 1 in two logical devices. For example, when the storage apparatus 2a receives a write request and write data for the LDEV 24a from the host 1, the write data is stored in the LDEV 24a of the storage apparatus 2a and in the LDEV 24b of the storage apparatus 2b.

FIG. 3 shows the configuration of the pair management table T300. The pair management table T300 is management information that each storage apparatus 2 has. Information on one volume pair is stored in each row of the pair management table T300. In this embodiment, the S-VOL in which the copy of the P-VOL is stored is called “a volume that is paired with the P-VOL” or “a pair volume of the P-VOL”. Conversely, a P-VOL that is a logical device in which copy data of an S-VOL is stored is also referred to as “a volume that is paired with an S-VOL” or “a pair volume of an S-VOL”.

The storage apparatus 2 manages each pair with an identifier called a pair number (Pair #). In the storage apparatus 2, each logical device is managed with an identifier called a logical device number (LDEV #). In this embodiment, an integer value of 0 or more is used for each identifier. Hereinafter, a volume pair whose pair number is n (n is an integer value of 0 or more) is denoted as “pair #n”. A logical device whose LDEV # is n is expressed as “LDEV # n”.

The pair number is stored in Pair # (C301). PDKC # (C303) and P-VOL # (C304) store information on the P-VOL belonging to the volume pair: the serial number (PDKC #) of the storage apparatus 2 to which the P-VOL belongs and the LDEV # of the P-VOL, respectively. Likewise, SDKC # (C305) and S-VOL # (C306) store information on the S-VOL belonging to the volume pair: the serial number (SDKC #) of the storage apparatus 2 to which the S-VOL belongs and the LDEV # of the S-VOL, respectively.
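One row of the pair management table T300 can be sketched as a Python dataclass whose fields follow columns C301 to C306, together with a lookup that returns the “pair volume” of a given logical device. This is an illustrative model only; the field names, the helper `find_pair_volume`, and the serial numbers used in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PairEntry:
    """One row of the pair management table T300 (illustrative field names)."""
    pair_no: int       # Pair # (C301)
    pair_status: int   # Pair Status (C302), stored as an integer code
    pdkc: int          # PDKC # (C303): serial number of the apparatus holding the P-VOL
    pvol: int          # P-VOL # (C304): LDEV # of the P-VOL
    sdkc: int          # SDKC # (C305): serial number of the apparatus holding the S-VOL
    svol: int          # S-VOL # (C306): LDEV # of the S-VOL


def find_pair_volume(table: list[PairEntry], dkc: int,
                     ldev: int) -> Optional[Tuple[int, Tuple[int, int]]]:
    """Return (Pair #, (serial number, LDEV #) of the pair volume), or None if unpaired."""
    for entry in table:
        if (entry.pdkc, entry.pvol) == (dkc, ldev):
            # The device is a P-VOL, so its pair volume is the S-VOL.
            return entry.pair_no, (entry.sdkc, entry.svol)
        if (entry.sdkc, entry.svol) == (dkc, ldev):
            # The device is an S-VOL, so its pair volume is the P-VOL.
            return entry.pair_no, (entry.pdkc, entry.pvol)
    return None
```

For a table holding pair #0 between LDEV #0 of a hypothetical apparatus 10001 and LDEV #0 of a hypothetical apparatus 20002, looking up either side returns the other, which matches the symmetric “pair volume” terminology above.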

The pair status (pair status) is stored in Pair Status (C302). Each volume pair has one of the states described below. These states are referred to as “pair status” in this embodiment.

(A) Initial-Copy state:
When forming a volume pair, the storage apparatus 2 first copies the entire contents of the P-VOL to the S-VOL (this is referred to as the initial copy process). The state while this process is in progress is referred to as the “Initial-Copy state”.

(B) Duplex state:
The state of the volume pair in which the contents of the P-VOL and the contents of the S-VOL are the same by the initial copy process or the resynchronization process described later is referred to as a “Duplex state”. When the volume pair is in Duplex state, the host 1 can issue a write request to either the P-VOL or S-VOL of the volume pair. Regardless of whether the host 1 writes data to the P-VOL or S-VOL of the volume pair, the storage apparatus 2 stores the data in both the P-VOL and S-VOL.

(C) Duplex (S) state:
Similar to the Duplex state, the Duplex (S) state means a state of a volume pair in which the contents of the P-VOL and the S-VOL are the same. Data written from the host 1 to the P-VOL of the volume pair in the Duplex (S) state is also copied to the S-VOL. However, in the volume pair in the Duplex (S) state, the P-VOL is in a writable state, but the S-VOL is in a state in which writing from the host 1 is impossible. Therefore, when the volume pair is in the Duplex (S) state, the host 1 needs to issue a write request only to the P-VOL of that volume pair (do not write to the S-VOL). At this time, the S-VOL is in a readable state.

(D) Suspend state:
When the pair status of a volume pair is one of the states (A) to (C) described above, data written to the volume pair is stored in both the P-VOL and the S-VOL. In contrast, a state in which the contents of the P-VOL are not reflected in the S-VOL is referred to as the “Suspend state”. For example, when the transmission line connecting the storage apparatus 2a and the storage apparatus 2b is cut and copying becomes impossible, the volume pair enters the “Suspend state”. A volume pair may also enter the “Suspend state” in response to an instruction from the user. In this state, the storage apparatus 2 still manages the P-VOL and S-VOL as being in a pair relationship (the P-VOL and S-VOL information remains stored in the pair management table T300). For example, when the storage apparatus 2a receives a write request for a volume of the pair (for example, the P-VOL), it writes the data to the P-VOL but does not replicate the data to the storage apparatus 2b (the pair volume). The same applies when the storage apparatus 2b receives a write request.

(E) Duplex-Pending state:
When a volume pair is in transition from the Suspend state to the Duplex state, its state is referred to as the “Duplex-Pending state”. In this state, to match (synchronize) the contents of the P-VOL and S-VOL of a volume pair that was in the Suspend state, the data of the P-VOL (or S-VOL) is copied to the S-VOL (or P-VOL). When the copying is completed, the volume pair enters the “Duplex state”. The process of transitioning a volume pair from the “Suspend state” to the Duplex state is referred to as resynchronization processing (resync processing).

The Pair Status (C302) of the pair management table T300 stores one of the five states described above: 0 indicates the “Initial-Copy state”, 1 the “Duplex state”, 2 the “Duplex (S) state”, 3 the “Suspend state”, and 4 the “Duplex-Pending state”.
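The integer codes of Pair Status (C302) map naturally onto a Python `IntEnum`. The sketch below also encodes the two write rules the state descriptions state explicitly: whether host writes are mirrored to both volumes (yes for (A) to (C), no for (D)) and whether the host may write to the S-VOL (yes in Duplex, no in Duplex (S)). The enum and function names are hypothetical, and the functions return `None` for states whose behavior the descriptions above leave unspecified, rather than guessing.

```python
from enum import IntEnum
from typing import Optional


class PairStatus(IntEnum):
    """Codes stored in Pair Status (C302)."""
    INITIAL_COPY = 0
    DUPLEX = 1
    DUPLEX_S = 2
    SUSPEND = 3
    DUPLEX_PENDING = 4


def writes_mirrored(status: PairStatus) -> Optional[bool]:
    """True if written data is stored in both the P-VOL and the S-VOL."""
    if status in (PairStatus.INITIAL_COPY, PairStatus.DUPLEX, PairStatus.DUPLEX_S):
        return True
    if status is PairStatus.SUSPEND:
        return False
    return None  # Duplex-Pending: not stated in the state descriptions


def host_may_write_svol(status: PairStatus) -> Optional[bool]:
    """True if the host may issue write requests to the S-VOL."""
    if status is PairStatus.DUPLEX:
        return True
    if status is PairStatus.DUPLEX_S:
        return False  # only the P-VOL is writable; the S-VOL stays readable
    return None       # not stated for the other states
```

Decoding a stored value is then `PairStatus(code)`; for example, `PairStatus(3)` is `PairStatus.SUSPEND`.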

The above description treats the “Initial-Copy state” and the “Duplex-Pending state” as different states. However, in both states the contents of the P-VOL and the S-VOL are in the process of being synchronized, so the two states need not be managed separately and may be managed as a single state.
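The resynchronization processing described for the Duplex-Pending state can be sketched as a state transition wrapped around a copy: Suspend (3) to Duplex-Pending (4), copy, then Duplex (1). This is a hypothetical helper, not the apparatus's procedure. Volumes are plain dictionaries of address to data, the copy direction is a parameter because the text allows either “P-VOL (or S-VOL)” as the source, and the sketch copies the whole volume because the text does not say whether only changed areas are copied.

```python
INITIAL_COPY, DUPLEX, DUPLEX_S, SUSPEND, DUPLEX_PENDING = 0, 1, 2, 3, 4  # Pair Status (C302)


def resync(pair: dict, pvol: dict, svol: dict, copy_from: str = "P-VOL") -> None:
    """Bring a Suspend-state pair back to the Duplex state (hypothetical helper)."""
    if pair["status"] != SUSPEND:
        raise ValueError("resynchronization starts from the Suspend state")
    pair["status"] = DUPLEX_PENDING
    src, dst = (pvol, svol) if copy_from == "P-VOL" else (svol, pvol)
    # Whole-volume copy: afterwards the destination matches the source exactly.
    dst.clear()
    dst.update(src)
    pair["status"] = DUPLEX
```

Calling `resync` on a pair whose status is 3 leaves both dictionaries equal and the status at 1; calling it on a pair in any other state raises `ValueError`.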

In addition to information about volume pairs, the storage apparatus 2 manages whether each logical device can be accessed. For this purpose, the storage apparatus 2 has LDEV status information T400.

FIG. 4 shows an example of the LDEV status information T400. Mode (C402) stores the state of the logical device specified by LDEV # (C401). In this specification, a logical device that can be accessed from the host 1 is said to be in the “Valid state”. Conversely, a logical device that cannot be accessed from the host 1 (that is, cannot be read or written, for example because a failure has occurred in it) is said to be in the “Invalid state” or “Blocked state”.

Mode (C402) can be either 0 or 1. 0 indicates that the state of the logical device is “Valid state”, and 1 indicates that the state of the logical device is “Invalid state”. When the pair status of the volume pair is “Duplex state”, the states of the P-VOL and S-VOL belonging to the volume pair are “Valid state”.

The LDEV status information T400 is held by each storage device 2 and stores information only on the logical devices of that storage apparatus 2. For example, the LDEV status information T400 of the storage apparatus 2a stores only the states of the logical devices of the storage apparatus 2a, and that of the storage device 2b stores only the states of the logical devices of the storage device 2b.
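Because each storage apparatus keeps its own LDEV status information T400, a natural in-memory model is one dictionary per apparatus mapping LDEV # to the Mode (C402) code. The sketch below is illustrative: the LDEV numbers are hypothetical, and treating an LDEV that does not appear in an apparatus's table as not accessible is an assumption, since such an LDEV belongs to another apparatus or does not exist.

```python
VALID, INVALID = 0, 1  # Mode (C402) codes


def is_accessible(status_info: dict[int, int], ldev: int) -> bool:
    """True if the LDEV is in the Valid state in this apparatus's T400.

    Assumption: an LDEV absent from this table is reported as not accessible.
    """
    return status_info.get(ldev) == VALID


# One LDEV status table per storage apparatus (LDEV numbers are hypothetical).
t400 = {
    "2a": {0: VALID},  # LDEVa
    "2b": {0: VALID},  # LDEVb
}
```

Blocking LDEVb means setting `t400["2b"][0] = INVALID`; the table of the storage apparatus 2a is unaffected, mirroring the per-apparatus separation described above.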

Taking the configuration of FIG. 2 as an example, transitions of the pair status and of the logical device state will be described. When the host 1 issues a write request to LDEVa and the CTL 21a fails to write to LDEVa because of a failure in LDEVa (in the DISK 22a constituting LDEVa), the CTL 21a returns an error to the host 1 and blocks LDEVa (sets its Mode (C402) to “1”). If LDEVa belongs to a volume pair, the Pair Status (C302) of that volume pair is changed to “3” (Suspend state).

Pair Status (C302) may also be changed to “3” (Suspend state) when no failure has occurred in LDEVa. For example, when the host 1 issues a write request to LDEVa, copying (writing the data) to LDEVb by the CTL 21a fails if a failure has occurred in the inter-storage path 5a or in LDEVb, which is paired with LDEVa. In this case, the CTL 21a leaves the state of LDEVa (Mode (C402)) unchanged but changes the Pair Status (C302) of the volume pair to “3” (Suspend state).
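The two failure cases above differ only in whether LDEVa itself is blocked; in both, the pair stops mirroring. The sketch below models the table updates the CTL 21a performs after a failed write to LDEVa, using the code 3 that the Pair Status list assigns to the Suspend state. It is a hypothetical helper: the enum and function names are illustrative, and it models only the table changes described here, not the response sent to the host in the copy-failure case, which the text does not specify.

```python
from enum import Enum, auto
from typing import Optional

INVALID = 1   # Mode (C402): Invalid (blocked)
SUSPEND = 3   # Pair Status (C302): Suspend state


class WriteFailure(Enum):
    LDEVA = auto()    # LDEVa itself (its DISK 22a) failed
    PATH_5A = auto()  # the inter-storage path 5a failed
    LDEVB = auto()    # the pair volume LDEVb failed


def on_write_failure(cause: WriteFailure, mode: dict[int, int], ldeva: int,
                     pair: Optional[dict]) -> None:
    """Update T400 (mode) and the T300 row (pair) after a failed write to LDEVa."""
    if cause is WriteFailure.LDEVA:
        # The CTL 21a returns an error to host 1 and blocks LDEVa.
        mode[ldeva] = INVALID
    # For PATH_5A and LDEVB, the Mode (C402) of LDEVa is left unchanged.
    if pair is not None:
        # In every case the volume pair stops mirroring.
        pair["status"] = SUSPEND
```

A path failure therefore leaves LDEVa in the Valid state (Mode 0) while the pair moves to Suspend, whereas a failure of LDEVa itself changes both tables.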

Next, various programs executed by the host 1 will be described with reference to FIG. The memory 12 of the host 1 stores at least an application program (AP) 102, an alternate path program 103, and a device driver 104.

The device driver 104 is a program for providing an access interface to the LDEV 24 for a higher-level program (AP 102, alternate path program 103). The device driver 104 maps the LDEV 24 (more precisely, the access path to the LDEV 24) to the device file. When the upper program designates a device file and issues a read or write request, the device driver 104 issues a read command or a write command to the LDEV 24 mapped to the designated device file.

In the computer system according to this embodiment, the device driver 104 maps LDEVa to the device files sda (141) and sdb (142), and maps LDEVb to the device files sdc (143) and sde (144). Hereinafter, the device files to which the device driver 104 maps the LDEV 24 (device files sda (141) to sde (144)) are referred to as “scsi device files”.

LDEVa is mapped to two scsi device files, sda (141) and sdb (142), because there are two access paths from the host 1 to LDEVa of the storage apparatus 2a. The host 1 has two HBAs 13: the path that accesses LDEVa via one of them is associated with the scsi device file sda (141), and the path that accesses LDEVa via the other is associated with the scsi device file sdb (142).

The number of access paths between the host 1 and a logical device depends on the hardware configuration of the host 1, the settings of the storage apparatus 2, and the settings of the network 4 (such as the fiber channel switch 41). Only the case of two access paths from the host 1 to LDEVa has been described here as an example, but three or more access paths may exist. For example, when there are four access paths from the host 1 to LDEVa, four scsi device files are mapped to LDEVa.

The alternate path program 103 is a program that dynamically switches the access path used for a device file. It provides a device file sddlma (131) to higher-level programs such as the AP 102. In this embodiment, the device file provided by the alternate path program 103 is called a “device file”, and the device files that the device driver 104 uses to map access paths to the LDEV 24 (device files sda (141) to sde (144)) are called “scsi device files”, to distinguish the two.

The scsi device files sda (141) and sdb (142), which are mapped to LDEVa, and sdc (143) and sde (144), which are mapped to LDEVb (the pair volume of LDEVa), are all mapped to the device file sddlma (131). That is, the alternate path program 103 creates one device file (for example, sddlma (131)) per volume pair and provides it to the AP 102 and other programs. When the host 1 (its AP 102) accesses the logical device, it issues an access request using the device file (for example, sddlma (131)). At this point, it has not yet been determined whether the logical device the host 1 will actually access is the P-VOL or the S-VOL. In effect, the host 1 (its AP 102) issues the access request to the volume pair. Therefore, in this specification, the host 1 accessing the P-VOL or S-VOL of a volume pair may be referred to as “accessing the volume pair”.
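The two mapping layers can be illustrated with a small grouping function: one device file is created per volume pair, and it collects every scsi device file mapped to either LDEV of that pair. This is a sketch only; the function name `build_device_files` and the rule of appending the pair suffix to `sddlm` are hypothetical conventions chosen to reproduce the sddlma example, not a naming rule defined in the text.

```python
from collections import defaultdict
from typing import Dict, List


def build_device_files(scsi_to_ldev: Dict[str, str],
                       ldev_to_pair: Dict[str, str]) -> Dict[str, List[str]]:
    """Group scsi device files so that each volume pair gets one device file.

    The "sddlm" + pair-suffix naming is a hypothetical convention.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for scsi_file, ldev in scsi_to_ldev.items():
        groups[ldev_to_pair[ldev]].append(scsi_file)
    return {f"sddlm{pair}": sorted(files) for pair, files in groups.items()}


# Configuration from the text: two paths to LDEVa, two to LDEVb, one pair.
SCSI_TO_LDEV = {"sda": "LDEVa", "sdb": "LDEVa", "sdc": "LDEVb", "sde": "LDEVb"}
LDEV_TO_PAIR = {"LDEVa": "a", "LDEVb": "a"}
```

With this configuration, `build_device_files(SCSI_TO_LDEV, LDEV_TO_PAIR)` yields a single device file `sddlma` backed by all four scsi device files, matching the description.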

For example, when the AP 102 accesses (reads or writes) the sddlma (131), the alternate path program 103 accesses any one of the sda (141) to sde (144). For example, when the alternate path program 103 performs data write to sda (141), the device driver 104 writes data to the LDEVa of the storage apparatus 2a. The data written to LDEVa is also copied to LDEVb by the remote copy function of the storage apparatus 2 as described above.

If LDEVa cannot be accessed through sda (141), an error is returned from the storage apparatus 2a to the alternate path program 103. In this case, the alternate path program 103 retries the access using another scsi device file (sdb (142) to sde (144)). Because the access destination is changed transparently to the AP 102, the AP 102 does not need to stop its processing, and the computer system can continue operating without interruption.
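The transparent retry can be sketched as a loop over the scsi device files behind one device file. This is a simplified model under assumptions: `issue` is a hypothetical callable standing in for the device driver 104, and `PathError` is a hypothetical exception for a failed path. The sketch deliberately omits the more complex handling, described later, for failures that a simple retry cannot resolve.

```python
from typing import Callable, Optional, Sequence


class PathError(Exception):
    """Raised by the per-path I/O callable when one access path fails."""


def write_via_device_file(scsi_files: Sequence[str],
                          issue: Callable[[str, bytes], str],
                          data: bytes) -> str:
    """Try each scsi device file in turn; the caller (the AP) sees one call.

    `issue` is a hypothetical stand-in for the device driver 104 that
    raises PathError when the path cannot be used.
    """
    last_error: Optional[PathError] = None
    for scsi_file in scsi_files:
        try:
            return issue(scsi_file, data)
        except PathError as exc:
            last_error = exc  # hidden from the AP; try the next path
    raise PathError(f"all paths failed (last error: {last_error})")
```

The AP only ever sees the final outcome: either one successful write or a single error after every path has been tried.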

However, depending on the cause of the failure, the alternate path program 103 may still fail to access the LDEV even after retrying with another scsi device file. In that case, more complex processing is performed, which will be described in detail later.

The AP 102 is a program that accesses data stored in the LDEV 24. It can be any type of program, for example a database management system (DBMS) or a file system. It may also be cluster software (a program that, when the host 1a goes down due to a failure, continues the business on the host 1b, which runs the same cluster software). Multiple types of APs 102 may run on the host 1.

In the computer system according to the present embodiment, the host 1a is configured to be able to access both LDEVa and LDEVb. Similarly, the host 1b is configured to be able to access both LDEVa and LDEVb. Although a configuration with only one volume pair has been described here, a plurality of volume pairs may exist in the storage apparatuses 2a and 2b.

On the other hand, the management server 3 executes the integrated management program 301. The integrated management program 301 acquires computer system information from the alternate path program 103 executed on each host 1. Based on that information, it changes the settings of the volume pair in the storage apparatus 2 and the settings of the alternate path program 103 on each host 1. When the integrated management program 301 changes the settings of the storage apparatus 2 or of a host 1, it issues the instruction via the management network 6.

Next, the management information used by the program of the host 1 (the alternate path program 103) will be described. FIG. 5 shows an example of the contents of the alternate path management table T100, which stores the management information used by the alternate path program 103. The table has the following columns: pair # (C102), device file (C103), scsi device file (C104), path ID (C105), LDEV # (C106), attribute (C107), path status (C108), number of available paths (C109), number of paths available to other hosts (C109-2), product number (C110), port name (C111), volume pair status (C112), host-storage all-path failure flag (C113), P → S all-path failure flag (C114), and S → P all-path failure flag (C115).

Each row (record) stores information on one access path (an access path to an LDEV) that the host 1 has. The path ID (C105) is the identification number of an access path managed by the host 1; each access path is given a number that is unique within the host 1. LDEV # (C106) is the logical device number of the logical device that can be accessed through the access path. The scsi device file (C104) stores the name of the scsi device file defined for the access path, while the device file (C103) stores the name of the device file that the alternate path program 103 provides to the upper program. The scsi device file named in C104 is mapped to the device file named in C103. The pair # (C102) stores the pair number of the volume pair to which the LDEV mapped to the access path (LDEV # (C106)) belongs.

Attribute (C107) is a column for storing information on the attribute of the logical device. When “P-VOL” is stored, it means that the logical device is a primary volume, and when “S-VOL” is stored, it means that the logical device is a secondary volume.

The path status (C108) represents the status of the access path. When “Success” is stored, it means that the access path to the logical device is normal. If “Failure” is stored, it means that an error has occurred in the access path to the logical device, and the logical device cannot be accessed using the access path. There can be various reasons why an access path abnormality occurs. For example, a failure may occur in the network 4 between the host 1 and the storage apparatus 2 (for example, a failure in the fiber channel switch 41 or a disconnection in the fiber channel cable 42). Alternatively, there may be a failure in the storage device 2 or the logical device.

Before describing the number of available paths (C109) and the number of paths available to other hosts (C109-2), the term “available path” is defined. In this embodiment, among the paths from the host 1 to a logical device (P-VOL or S-VOL), a path that can be used to access the logical device (that is, a path whose path status (C108) is normal (“Success”)) is called an “available path”.

The number of available paths (C109) is the number of available paths (paths whose path status (C108) is “Success”) among the paths from the host 1 to the logical device (P-VOL or S-VOL). It is counted separately for the P-VOL and for the S-VOL. Specifically, the number of usable paths to the P-VOL of a given volume pair is obtained by selecting, among the records that share the same device file (C103), those whose attribute (C107) is “P-VOL”, and counting how many of them have the path status (C108) “Success”. The number of usable paths to the S-VOL of the volume pair is obtained in the same way.
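The counting rule for C109 can be written as a single filter-and-count over T100 rows. The sketch below is illustrative: the `PathRecord` dataclass keeps only the columns the rule needs, and its field names are hypothetical renderings of C103, C105, C107 and C108.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathRecord:
    device_file: str    # C103
    path_id: int        # C105
    attribute: str      # C107: "P-VOL" or "S-VOL"
    path_status: str    # C108: "Success" or "Failure"


def available_paths(table: List[PathRecord], device_file: str, attribute: str) -> int:
    """Number of available paths (C109) for one side of a volume pair."""
    return sum(
        1
        for r in table
        if r.device_file == device_file
        and r.attribute == attribute
        and r.path_status == "Success"
    )


T100 = [
    PathRecord("sddlma", 0, "P-VOL", "Success"),
    PathRecord("sddlma", 1, "P-VOL", "Failure"),
    PathRecord("sddlma", 2, "S-VOL", "Success"),
    PathRecord("sddlma", 3, "S-VOL", "Success"),
]
```

For the sample table, the P-VOL side has one available path and the S-VOL side has two.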

The number of paths available to other hosts (C109-2) is similar to the number of available paths (C109), except that it counts the usable paths from the other hosts 1 to the logical device (P-VOL or S-VOL). Because a host 1 in the computer system cannot, in principle, know the status of another host's access paths, the value stored in C109-2 is derived from information obtained from the management server 3. Details will be described later.

The product number (C110) is the product number of the storage apparatus 2 to which the logical device belongs, and the port name (C111) is the identifier of the port of the storage apparatus 2 (the interface for communication with the host 1) through which the access path passes. The volume pair status (C112) stores the pair status of the volume pair. This embodiment describes an example in which a character string, such as “Duplex” as shown in FIG. 5, is stored in the volume pair status (C112). Specifically, as described for the pair status, one of “Initial-Copy”, “Duplex”, “Duplex (S)”, “Suspend”, or “Duplex-pending” is stored. However, as in the pair management table T300, numerical values from 0 to 4 may be stored instead.

Like the number of available paths (C109), the host-storage all-path failure flag (C113) is defined for each logical device. When the path status (C108) of every path from the host to a logical device is “Failure”, “ON” is stored in the host-storage all-path failure flag (C113) of that logical device; otherwise, “OFF” is stored.
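The rule for C113 reduces to an "every path failed" check over the C108 values of one logical device. The sketch assumes, beyond the text, that a logical device with no recorded paths is reported as “OFF”; the function name is hypothetical.

```python
from typing import Sequence


def host_storage_all_path_failure(path_statuses: Sequence[str]) -> str:
    """Value of C113 for one logical device, from its C108 values.

    Assumption: an empty path list yields "OFF", since the text only
    defines the flag in terms of existing paths.
    """
    if path_statuses and all(s == "Failure" for s in path_statuses):
        return "ON"
    return "OFF"
```

The explicit emptiness check is needed because Python's `all()` returns `True` for an empty sequence.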

The P → S all-path failure flag (C114) is defined only for the P-VOL, so no information is stored in C114 for rows whose attribute (C107) is not “P-VOL”. When every inter-storage path (5a, 5b) running from the storage apparatus 2 that has the primary volume to the storage apparatus 2 that has the secondary volume is blocked, “ON” is stored in the P → S all-path failure flag (C114). For example, in FIG. 1, when the P-VOL exists in the storage apparatus 2a and the S-VOL exists in the storage apparatus 2b, “ON” is stored in the P → S all-path failure flag (C114) when the inter-storage path 5a is blocked. When there are multiple inter-storage paths 5a, “ON” is stored in C114 only when all of them are blocked.

On the other hand, the S → P all-path failure flag (C115) is defined only for the S-VOL, so no information is stored in C115 for rows whose attribute (C107) is not “S-VOL”. When every inter-storage path (5a, 5b) running from the storage apparatus 2 that has the secondary volume to the storage apparatus 2 that has the primary volume is blocked, “ON” is stored in the S → P all-path failure flag (C115). For example, in FIG. 1, when the P-VOL exists in the storage apparatus 2a and the S-VOL exists in the storage apparatus 2b, “ON” is stored in the S → P all-path failure flag (C115) when the inter-storage path 5b is blocked. When there are multiple inter-storage paths 5b, “ON” is stored in C115 only when all of them are blocked.
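The two directional flags can be filled in together, as sketched below. The sketch reflects one assumption: each inter-storage path is represented only by a state string (`"blocked"` or anything else). Rows are plain dicts keyed by `"attribute"`, `"C114"`, and `"C115"`. To mirror the text, C114 is left empty (`None`) on non-P-VOL rows and C115 on non-S-VOL rows.

```python
from typing import Dict, List, Optional, Sequence


def _all_blocked(states: Sequence[str]) -> str:
    """Return "ON" only if there is at least one path and every path is blocked."""
    return "ON" if states and all(s == "blocked" for s in states) else "OFF"


def fill_inter_storage_flags(rows: List[Dict[str, Optional[str]]],
                             p_to_s_states: Sequence[str],
                             s_to_p_states: Sequence[str]) -> None:
    """Set C114 on P-VOL rows and C115 on S-VOL rows; leave the other empty."""
    p_to_s = _all_blocked(p_to_s_states)  # e.g. inter-storage paths 5a
    s_to_p = _all_blocked(s_to_p_states)  # e.g. inter-storage paths 5b
    for row in rows:
        row["C114"] = p_to_s if row["attribute"] == "P-VOL" else None
        row["C115"] = s_to_p if row["attribute"] == "S-VOL" else None
```

If only one of two S → P paths is blocked, the S-VOL row still gets “OFF”. This matches the rule that the flag turns on only when all paths in that direction are blocked.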

The integrated management program 301 of the management server 3 manages the contents of the alternate path management table T100 notified from each host 1. In the present embodiment, information including the contents of the alternate path management table T100 notified from each host 1 is referred to as an “event”. The contents of the management information (integrated path management table T200) maintained by the integrated management program 301 will be described with reference to FIG. The integrated path management table T200 is also called a path management DB.

The integrated path management table T200 has at least the following columns: host name (C201), pair # (C202), path ID (C205), LDEV # (C206), attribute (C207), path status (C208), number of available paths (C209), serial number (C210), port name (C211), volume pair status (C212), host-storage all-path failure flag (C213), P → S all-path failure flag (C214), and S → P all-path failure flag (C215). All of these except the host name (C201) hold the same information as the corresponding columns of the alternate path management table T100: pair # (C102), path ID (C105), LDEV # (C106), attribute (C107), path status (C108), number of available paths (C109), serial number (C110), port name (C111), volume pair status (C112), host-storage all-path failure flag (C113), P → S all-path failure flag (C114), and S → P all-path failure flag (C115).

When the management server 3 receives the contents of the alternate path management table T100 from a host 1, it reflects them in the integrated path management table T200. For example, when it receives the contents of the alternate path management table T100 from the host 1 whose host name is “host A”, the management server 3 stores the received information in the rows of the integrated path management table T200 whose host name (C201) is “host A”.

Specifically, the values of pair # (C102), path ID (C105), LDEV # (C106), attribute (C107), path status (C108), number of available paths (C109), product number (C110), port name (C111), volume pair status (C112), host-storage all-path failure flag (C113), P → S all-path failure flag (C114), and S → P all-path failure flag (C115) in the received alternate path management table T100 are reflected in pair # (C202), path ID (C205), LDEV # (C206), attribute (C207), path status (C208), number of available paths (C209), product number (C210), port name (C211), volume pair status (C212), host-storage all-path failure flag (C213), P → S all-path failure flag (C214), and S → P all-path failure flag (C215), respectively. In FIG. 6, some entries in the S → P all-path failure flag (C215) column read “OFF → ON”; these represent a transient state during an update. Their meaning will be described later.
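Reflecting an event into T200 can be sketched as replacing one host's rows with the newly received ones. The sketch rests on several assumptions, none of which is stated in the text. A new event is treated as fully superseding that host's earlier rows. The snake_case column keys are hypothetical names. Host-local columns with no T200 counterpart, such as the device file (C103), are dropped.

```python
from typing import Dict, List

# T100 columns with a T200 counterpart (C102 -> C202, C105..C115 -> C205..C215).
# The snake_case keys are hypothetical names for those columns.
SHARED_COLUMNS = (
    "pair_no", "path_id", "ldev_no", "attribute", "path_status",
    "available_paths", "serial_no", "port_name", "pair_status",
    "host_storage_all_fail", "p_to_s_all_fail", "s_to_p_all_fail",
)


def merge_event(t200: List[Dict], host_name: str, t100_rows: List[Dict]) -> List[Dict]:
    """Return T200 with host_name's rows replaced by the received T100 rows.

    Assumption: a new event fully supersedes the host's previous rows.
    Host-local columns such as the device file (C103) are not copied.
    """
    kept = [row for row in t200 if row["host_name"] != host_name]
    added = [
        {"host_name": host_name, **{col: row[col] for col in SHARED_COLUMNS}}
        for row in t100_rows
    ]
    return kept + added
```

Keying the replacement on the host name follows the “host A” example above. Other hosts' rows are left untouched.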

An event is also transmitted from the host 1 to the management server 3 when a volume pair is set up or when an access path from the host 1 to the volume pair is created. Therefore, by the time the AP 102 of the host 1 starts accessing the volume pair, the integrated path management table T200 already reflects the contents of the alternate path management table T100 received from each host 1.

Subsequently, the flow of processing performed by the host 1 when the host 1 issues a write request to the storage apparatus 2 will be described. FIG. 7 to FIG. 10 are explanatory diagrams of the processing flow when the host 1 issues a write request to the P-VOL of the storage apparatus 2.

The following description takes as an example the case where the computer system has the configuration shown in FIGS. 1 and 2, the storage apparatus 2a has the P-VOL, and the storage apparatus 2b has the S-VOL paired with it. Accordingly, wherever the following description refers to processing of the P-VOL by the host 1, such as “issue a write request to the P-VOL”, it means that the host 1 issues an instruction to the storage apparatus 2a. Likewise, wherever it refers to processing of the S-VOL by the host 1, such as “issue a write request to the S-VOL”, it means that the host 1 issues an instruction to the storage apparatus 2b.

When the storage apparatus 2a receives a write request and write data for the P-VOL from the host 1, it transmits a copy of the write data to the S-VOL (storage apparatus 2b) via the inter-storage path 5a. When the storage apparatus 2b receives a write request and write data for the S-VOL from the host 1, it transmits a copy of the write data to the P-VOL (storage apparatus 2a) via the inter-storage path 5b. Therefore, the inter-storage path 5a is referred to as the “P-VOL → S-VOL storage path” or “P → S storage path”, and the inter-storage path 5b is referred to as the “S-VOL → P-VOL storage path” or “S → P storage path”. The access path from the host 1 to the P-VOL is called the “host-P-VOL path”, and the access path from the host 1 to the S-VOL is called the “host-S-VOL path”.

In step 1001, the AP 102 issues a write request to the volume pair (this is equivalent to issuing a write request specifying a device file). Here, it is assumed that the AP 102 designates the device file sddlma. In step 1002, the alternate path program 103 identifies one write-destination access path (scsi device file) by referring to the write request issued by the AP 102 and the alternate path management table T100, and uses the identified access path. Issue a write request to the logical device.

Specifically, among the records whose device file (C103) matches the device file specified in step 1001, the alternate path program 103 identifies one record whose path status (C108) is “Success”. It then issues the write request to the logical device using the scsi device file (C104) of that record.

FIGS. 7 to 10 illustrate the case where a write request is issued to the P-VOL, that is, where step 1002 selects a record whose attribute (C107) is “P-VOL” and the write request is therefore issued to the P-VOL (storage apparatus 2a). The case where a record whose attribute (C107) is “S-VOL” is selected will be described later (FIGS. 11 to 14).

The examples in FIGS. 7 to 14 cover the case where, when step 1002 is executed, the volume pair status (C112) of the access target volume pair stored in the alternate path management table T100 is “Duplex”. When the volume pair status (C112) is “Duplex”, the alternate path program 103 may select, in step 1002 (or step 2002), either a record whose attribute (C107) is “P-VOL” or a record whose attribute is “S-VOL”. On the other hand, when the volume pair status (C112) is “Duplex (S)”, the alternate path program 103 selects a record whose attribute (C107) is “P-VOL”, at least when issuing a write request, because the S-VOL cannot be written in that state.
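The path selection of step 1002, together with the “Duplex (S)” restriction, can be sketched as a filtered scan of T100. This is an illustration under assumptions: returning the first matching record is a hypothetical policy, since the text only says that one record with path status “Success” is chosen. Rows are plain dicts with hypothetical keys.

```python
from typing import Dict, List, Optional


def select_write_path(t100: List[Dict[str, str]], device_file: str,
                      pair_status: str) -> Optional[str]:
    """Pick one usable scsi device file for a write (steps 1002/2002).

    First-match order is an assumption; the text only says that one record
    with path status "Success" is chosen.
    """
    for row in t100:
        if row["device_file"] != device_file or row["path_status"] != "Success":
            continue
        # In "Duplex (S)" the S-VOL cannot be written, so only P-VOL paths qualify.
        if pair_status == "Duplex (S)" and row["attribute"] != "P-VOL":
            continue
        return row["scsi_device_file"]
    return None  # no usable path for this device file
```

In “Duplex” either side may be chosen. In “Duplex (S)” an S-VOL path is skipped even when it is listed first.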

In step 1003, the alternate path program 103 receives the response to the write request issued in step 1002 from the storage apparatus 2a (strictly speaking, no response may be received). If the status of the response is “normal end” (step 1003: Yes), the alternate path program 103 notifies the AP 102 that the write request completed normally (step 1010), and the processing ends. If the write processing did not complete normally, the processing from step 1004 onward is performed.

If the write process is not performed normally, there may be several causes. One is a case where a write request issued by the host 1 does not reach the storage device 2 because a failure has occurred in the network 4. In this embodiment, this state is referred to as “the access path is in a link down state”.

Another reason the write processing may not complete normally is that the P-VOL is in the blocked state. For example, when failures occur in several of the DISKs 22 that make up the P-VOL, the storage apparatus 2a puts the P-VOL into the blocked state and returns an error response to the host 1. In that error response, the storage apparatus 2a indicates that the cause of the error is the blockage of the logical device.

The write processing also fails when the S-VOL is in the blocked state or when a failure occurs in the inter-storage path 5a. In this case, when the storage apparatus 2a transmits a copy of the write data to the storage apparatus 2b, either an error is returned from the storage apparatus 2b or no response is returned at all, so the storage apparatus 2a returns an error response to the host 1. In that error response, the storage apparatus 2a indicates that the cause of the error lies on the storage apparatus 2b side or in the inter-storage path 5a.

First, the processing flow when the access path to the logical device is in the link down state will be described. If no response is returned from the storage apparatus 2a within a predetermined time, the alternate path program 103 determines that the link is down (step 1004: Yes) and sets the path status (C108) of the access path to which the write request was issued to “Failure” (step 1005).

Subsequently, the alternate path program 103 checks whether the path status (C108) is “Failure” for all access paths to the same logical device (P-VOL) as the access path whose path status (C108) was changed in step 1005. If some access path still has a path status (C108) other than “Failure” (step 1006: No), the alternate path program 103 issues the write request to the P-VOL using that access path (that is, step 1002 is executed again). Conversely, when all the path statuses (C108) are “Failure” (step 1006: Yes), step 1007 is performed. In step 1007, the alternate path program 103 sets the host-storage all-path failure flag (C113) of the access target P-VOL to “ON”.
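Steps 1005 to 1007 can be sketched as one function that marks the timed-out path and then either returns another usable path or sets the flag. The sketch makes two hypothetical simplifications. T100 rows are plain dicts. The per-logical-device C113 column is modelled as a separate `flags` dict keyed by (device file, attribute).

```python
from typing import Dict, List, Optional, Tuple


def on_link_down(t100: List[Dict], flags: Dict[Tuple[str, str], str],
                 failed_path_id: int) -> Optional[int]:
    """Steps 1005-1007 for a path that timed out.

    Returns the path ID to retry with (back to step 1002), or None after
    setting the host-storage all-path failure flag (C113) to "ON".
    `flags` keyed by (device file, attribute) is a hypothetical stand-in
    for the per-logical-device C113 column.
    """
    failed = next(r for r in t100 if r["path_id"] == failed_path_id)
    failed["path_status"] = "Failure"                           # step 1005
    for r in t100:                                              # step 1006
        if (r["device_file"] == failed["device_file"]
                and r["attribute"] == failed["attribute"]
                and r["path_status"] != "Failure"):
            return r["path_id"]
    flags[(failed["device_file"], failed["attribute"])] = "ON"  # step 1007
    return None
```

Only paths to the same logical device are considered in step 1006, so healthy S-VOL paths do not prevent the P-VOL flag from being set.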

When the storage apparatus 2 returns an error response indicating that the logical device is blocked (step 1004: No, and step 1011: Yes), the alternate path program 103 changes the path status (C108) of every access path to the access target P-VOL to “Failure” (step 1012). This is because a blocked logical device (P-VOL) is effectively equivalent to a state in which all access paths from the host 1 to the P-VOL are blocked.

Further, the alternate path program 103 sets the host-storage all-path failure flag (C113) and the P → S all-path failure flag (C114) of the access target P-VOL to “ON”, and sets the S → P all-path failure flag (C115) of the access target volume pair to “ON” (step 1013). Both the P → S (C114) and S → P (C115) flags are set to “ON” because mirroring with the remote copy function is impossible while a logical device (P-VOL or S-VOL) is blocked. The S → P all-path failure flag (C115) turned “ON” here is the flag defined for the pair volume (S-VOL) of the access target P-VOL. After step 1013, step 1031 is performed.
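Steps 1012 and 1013 touch both sides of the pair, which the following sketch makes explicit. The flag layout is a hypothetical simplification of T100: a dict mapping (device file, attribute) to that logical device's flag columns. Only C113, C114, and C115 are modelled.

```python
from typing import Dict, List, Tuple

Flags = Dict[Tuple[str, str], Dict[str, str]]


def on_pvol_blocked(t100: List[Dict[str, str]], flags: Flags, device_file: str) -> None:
    """Steps 1012-1013 when the storage reports the P-VOL as blocked.

    `flags` maps (device file, attribute) to that logical device's flag
    columns; this layout is a hypothetical simplification of T100.
    """
    for row in t100:                                       # step 1012
        if row["device_file"] == device_file and row["attribute"] == "P-VOL":
            row["path_status"] = "Failure"
    pvol = flags.setdefault((device_file, "P-VOL"), {})    # step 1013
    pvol["C113"] = "ON"  # host-storage all-path failure
    pvol["C114"] = "ON"  # P -> S all-path failure
    svol = flags.setdefault((device_file, "S-VOL"), {})
    svol["C115"] = "ON"  # S -> P all-path failure of the pair volume; C113 untouched
```

The S-VOL's host-storage flag stays unset, because the host may still reach the S-VOL in step 1031.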

The processing flow when an error other than the logical device blockage has occurred (step 1011: No) will be described later.

After step 1007 or step 1013, the alternate path program 103 issues a write request to the S-VOL (storage apparatus 2b) paired with the P-VOL (step 1031). Step 1031 is similar to step 1002: the alternate path program 103 selects from the alternate path management table T100 a record whose pair # (C102) is the same as that of the record selected in step 1002, whose attribute (C107) is “S-VOL”, and whose path status (C108) is “Success”. (Because the path status (C108) of every access path to the P-VOL has already been changed to “Failure” before step 1031, the alternate path program 103 does not actually need to check whether the attribute (C107) is “S-VOL”.) It then issues the write request to the logical device using the scsi device file (C104) of the selected record.

In step 1032, the alternate path program 103 receives a response to the write request issued in step 1031 from the storage apparatus 2b. If the status of this response is “normal end” (step 1032: Yes), the alternate path program 103 notifies the AP 102 that the write request has been made normally (step 1065). Thereafter, the alternate path program 103 notifies the event (failure information) to the integrated management program 301 of the management server 3 (step 1038), and ends the process. As described above, the event (failure information) is information including the contents of the alternate path management table T100.

If no response is returned from the storage apparatus 2b, that is, if the access path is in the link down state (step 1033: Yes), the alternate path program 103 sets the path status (C108) of the access path used for the write request to “Failure” (step 1034). This is the same processing as step 1005. Then, as long as some access path to the S-VOL still has the path status (C108) “Success”, the alternate path program 103 retries the write request using that access path (step 1035, step 1031).

If the path status (C108) of every access path to the S-VOL is “Failure” (step 1035: Yes), the alternate path program 103 sets the host-storage all-path failure flag (C113) of the access target S-VOL to “ON” (step 1036). It then returns an I/O error to the application program 102, notifying it that the write processing failed (step 1037). Further, the alternate path program 103 notifies the integrated management program 301 of the management server 3 of the event (failure information) (step 1038), and ends the processing.

When the storage apparatus 2b returns an error response indicating that the logical device is blocked (step 1033: No, and step 1041: Yes), the alternate path program 103 changes the path status (C108) of every access path to the access target S-VOL to “Failure” (step 1042). Further, it sets the host-storage all-path failure flag (C113) and the S → P all-path failure flag (C115) of the access target S-VOL to “ON”, and sets the P → S all-path failure flag (C114) of the pair volume (P-VOL) of the access target S-VOL to “ON” (step 1043). Step 1043 is performed for the same reason as step 1013.

Thereafter, the alternate path program 103 responds with an I / O error to the application program 102 (step 1037), notifies the integrated management program 301 of the management server 3 of an event (failure information) (step 1038), and ends the processing.

If an error is returned even though the logical device (S-VOL) is normal (not blocked) and the access path is not in the link down state (step 1041: No), a failure has occurred in the inter-storage path 5b. In this case, the alternate path program 103 sets the S → P all-path failure flag (C115) of the access target S-VOL to “ON” and sets the P → S all-path failure flag (C114) of its pair volume (P-VOL) to “ON” (step 1051). The host-storage all-path failure flag (C113) is not set to “ON”, because when step 1051 is executed, at least the S-VOL is normal and the access path from the host 1 to the S-VOL is usable.

Subsequently, the alternate path program 103 checks the status of the access paths to the P-VOL targeted by the current write processing (step 1052). Specifically, it refers to the number of paths available to other hosts (C109-2) of that P-VOL in the alternate path management table T100.

If the number of paths available to other hosts (C109-2) is 0 (step 1052: Yes), none of the hosts 1 in the computer system other than the own host can access the P-VOL targeted by the current write processing. Since the own host also cannot access that P-VOL when step 1052 is executed, no host can access it. In this case, the alternate path program 103 instructs the storage apparatus 2a to block the P-VOL (step 1053). The instruction is transmitted from the host 1 to the storage apparatus 2a via the management network 6. Upon receiving it, the storage apparatus 2a blocks the P-VOL and sets the pair status of the volume pair to which the P-VOL belongs to “Suspend state”. The alternate path program 103 then issues a write request to the S-VOL and stores the data in the S-VOL (step 1054). After step 1054, the alternate path program 103 executes step 1065 and step 1038, and ends the processing.

If the number of paths available to other hosts (C109-2) is not 0 (step 1052: No), that is, if some host 1 in the computer system has a normal access path to the P-VOL targeted by the current write processing, the alternate path program 103 returns an I/O error to the application program 102, notifying it that the write processing failed (step 1037). Further, the alternate path program 103 notifies the integrated management program 301 of the management server 3 of the event (failure information) (step 1038), and ends the processing. The number of paths available to other hosts (C109-2) is updated, independently of this processing (the processing of FIGS. 7 to 10), whenever a notification is received from the management server 3. Details of this update processing will be described later.
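The decision in step 1052 is a two-way branch on C109-2. The sketch below returns the step numbers from the text in the order in which they run. The function name is hypothetical, and the actual I/O is left out.

```python
from typing import Tuple


def steps_after_inter_storage_failure(other_host_usable_paths: int) -> Tuple[int, ...]:
    """Steps taken after step 1051, based on C109-2 of the access target P-VOL.

    Reaching step 1052 already implies that the own host cannot reach the
    P-VOL, so C109-2 == 0 means that no host can reach it.
    """
    if other_host_usable_paths == 0:  # step 1052: Yes
        # Block the P-VOL, write to the S-VOL, report success, send the event.
        return (1053, 1054, 1065, 1038)
    # step 1052: No -> another host still uses the P-VOL; fail this write.
    return (1037, 1038)
```

The asymmetry is deliberate. The P-VOL is blocked only when it is certain that no host still depends on it.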

In step 1052 described above, whether some host 1 has a normal access path to the access target P-VOL was determined by referring to the number of paths available to other hosts (C109-2). As another embodiment, however, the alternate path program 103 may make this determination by requesting the integrated path management table T200 from the integrated management program 301 of the management server 3. After acquiring the integrated path management table T200, the alternate path program 103 identifies the records whose LDEV # (C206), attribute (C207), and storage apparatus serial number (C210) match those of the P-VOL targeted by the current write processing. It then determines whether the host-storage all-path failure flags (C213) of these records are all “ON”. In doing so, the alternate path program 103 ignores the records that describe its own host's access paths, that is, the records whose host name (C201) is the name of the host 1 executing the alternate path program 103.

If the host-storage all-path failure flags (C213) of the identified records are all “ON” (step 1052: Yes), no host 1 in the computer system can access the P-VOL that is the access target of the current write processing. In this case, the alternate path program 103 executes steps 1053, 1054, 1065, and 1038, and ends the processing. If any of the identified records has its host-storage all-path failure flag (C213) set to “OFF” (step 1052: No), the alternate path program 103 executes steps 1037 and 1038, and ends the processing.
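The step 1052 determination in this table-based embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class `T200Row`, its field names, and the functions `no_other_host_can_reach` and `steps_after_step_1052` are hypothetical, and only the T200 columns named in the text are modeled.

```python
from dataclasses import dataclass


@dataclass
class T200Row:
    """Hypothetical model of one T200 record (only the columns used in step 1052)."""
    host_name: str           # host name (C201)
    ldev_no: int             # LDEV # (C206)
    attribute: str           # attribute (C207): "P-VOL" or "S-VOL"
    serial_no: str           # storage apparatus serial number (C210)
    all_paths_failed: bool   # host-storage all-path failure flag (C213)


def no_other_host_can_reach(rows, own_host, serial_no, ldev_no, attribute="P-VOL"):
    """Return True when every other host's record for the volume has C213 == ON.

    Records of the host running the alternate path program are skipped, as in
    the text. If no other host uses the volume, all() over an empty sequence is
    True, so the sketch treats the volume as unreachable by any other host.
    """
    others = (
        r for r in rows
        if r.serial_no == serial_no
        and r.ldev_no == ldev_no
        and r.attribute == attribute
        and r.host_name != own_host
    )
    return all(r.all_paths_failed for r in others)


def steps_after_step_1052(rows, own_host, serial_no, ldev_no):
    """Return the step numbers that the text says follow step 1052."""
    if no_other_host_can_reach(rows, own_host, serial_no, ldev_no):
        return (1053, 1054, 1065, 1038)   # step 1052: Yes
    return (1037, 1038)                   # step 1052: No (I/O error to the application)
```

In the main embodiment the same decision reduces to checking whether the number of paths available to other hosts (C109-2) is 0; the table lookup above is only needed when T200 is fetched on demand.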

Next, the case where executing step 1002 returns an error other than logical device blockage (step 1011: No) is described. In this case, a failure may have occurred in the inter-storage path 5a, or the S-VOL may be blocked. In either state, data cannot be transmitted over the inter-storage path 5a at least; that is, the data of the P-VOL cannot be copied to the S-VOL. The alternate path program 103 therefore sets the P → S all-path failure flag (C114) of the P-VOL to be accessed to “ON” (step 1015). It then issues a write request to the S-VOL paired with the P-VOL (step 1016). Step 1016 is the same processing as step 1031.

If executing step 1016 shows that the access path is in the link-down state (step 1017: Yes), the alternate path program 103 changes the path status (C108) of the access path over which the write request was issued to “Failure” (step 1018). This processing is the same as step 1034. If any access path to the S-VOL still has the path status (C108) “Success”, the alternate path program 103 reissues the write request over that access path, repeating this as necessary (step 1019: No, then step 1016).

If the path status of all access paths to the S-VOL is “Failure” (step 1019: Yes), the alternate path program 103 sets the host-storage all-path failure flag (C113) and the S → P all-path failure flag (C115) of the S-VOL to be accessed to “ON” (step 1020). The alternate path program 103 then instructs the storage apparatus 2b to change the state of the S-VOL to be accessed to the blocked state (step 1070). This instruction is transmitted from the host 1 to the storage apparatus 2b via the management network 6. As a result, the S-VOL becomes inaccessible from all hosts 1.

Thereafter, the alternate path program 103 issues a write request to the P-VOL paired with the S-VOL (step 1075). When step 1075 is executed, the access path to the P-VOL is normal (not in the link-down state) and the P-VOL is not blocked, so the write request from the host 1 to the P-VOL succeeds. The alternate path program 103 then executes steps 1065 and 1038, and ends the processing.

If executing step 1016 returns an error response from the storage apparatus 2 indicating logical device blockage (step 1017: No, and step 1021: Yes), the alternate path program 103 changes the path status (C108) of all access paths to the S-VOL to be accessed to “Failure” (step 1022). It also sets the host-storage all-path failure flag (C113) and the S → P all-path failure flag (C115) of the S-VOL to be accessed to “ON” (step 1023).

Thereafter, the alternate path program 103 issues a write request to the P-VOL that is paired with the S-VOL (step 1075). After step 1075, the alternate path program 103 executes step 1065 and step 1038, and then ends the process.

If executing step 1016 returns an error response indicating an inter-storage path failure (here, a failure of the inter-storage path 5b) (step 1017: No, step 1021: No, and step 1024: Yes), the alternate path program 103 sets the S → P all-path failure flag (C115) of the S-VOL to be accessed to “ON” (step 1026).

If the determination in step 1024 is affirmative, the alternate path program 103 can conclude that the access paths from the host 1 to the P-VOL and to the S-VOL are normal and that both the P-VOL and the S-VOL are in a normal state. However, because the inter-storage paths (5a, 5b) have failed, the remote copy function cannot be executed. Consequently, by the time the determination in step 1024 is made, the storage apparatuses 2a and 2b have changed the pair status of the access target volume pair to the Suspend state. The storage apparatus 2a no longer copies data to the S-VOL when it accepts a write request to the P-VOL, and the storage apparatus 2b likewise no longer copies data to the P-VOL when it accepts a write request to the S-VOL.

Therefore, the alternate path program 103 performs data mirroring on behalf of the storage apparatus 2 by issuing the write request to both the P-VOL and the S-VOL (step 1060). When the determination in step 1024 is made, the P-VOL and the S-VOL are both normal (not blocked), so it might seem sufficient for the alternate path program 103 to write the data to only one of them. At this point, however, the other hosts 1 may not yet have detected the failure of the inter-storage paths (5a, 5b). Such a host 1 may still regard the volume pair as normal, that is, assume that the P-VOL and the S-VOL store the same data and that data can be read from either logical device. If the alternate path program 103 wrote the data only to the P-VOL (or only to the S-VOL), another host 1 could read from the other logical device and access stale data. To prevent this, the alternate path program 103 writes the data to both the P-VOL and the S-VOL in step 1060.
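The stale-read hazard that motivates step 1060 can be shown with a toy model. This is a hypothetical sketch: each volume is modeled as a Python dict from block address to contents, and `write_block` and `read_block` are illustrative names, not functions from the patent.

```python
def write_block(pvol, svol, lba, data, host_mirrors):
    """Write one block while the pair is suspended (no storage-side remote copy).

    pvol, svol: dicts mapping a block address to its contents (toy model).
    host_mirrors=True models step 1060, where the host writes both volumes itself.
    """
    pvol[lba] = data
    if host_mirrors:
        svol[lba] = data


def read_block(vol, lba):
    """A host that still believes the pair is in sync may read from either volume."""
    return vol.get(lba)


pvol, svol = {0: "old"}, {0: "old"}
write_block(pvol, svol, 0, "new", host_mirrors=False)
stale = read_block(svol, 0)      # another host reading the S-VOL still sees "old"
write_block(pvol, svol, 0, "new", host_mirrors=True)
fresh = read_block(svol, 0)      # after host-side mirroring both volumes hold "new"
```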

Thereafter, the alternate path program 103 executes steps 1065 and 1038, and ends the processing. In response to the notification of step 1038, the management server 3 (integrated management program 301) changes the settings of each host 1 so that each host 1 accesses only one of the P-VOL and the S-VOL. The processing performed by the management server 3 is described later.

If the determination at step 1024 is negative, the alternate path program 103 does not execute step 1026 but executes step 1060, step 1065, and step 1038, and ends the process.
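The branching on the result of step 1016 (steps 1017 to 1075) can be summarized as a small decision function. This is a hedged sketch: the `SvolWriteResult` enum and `steps_after_svol_write` are hypothetical names, each list element stands for the result of one write attempt over a different access path to the S-VOL, the list is assumed to cover every path that was tried, and a successful write (not covered in this part of the text) is not modeled.

```python
from enum import Enum, auto


class SvolWriteResult(Enum):
    """Hypothetical classification of one S-VOL write attempt (step 1016)."""
    LINK_DOWN = auto()           # step 1017: Yes
    LDEV_BLOCKED = auto()        # step 1021: Yes
    INTER_STORAGE_PATH = auto()  # step 1024: Yes (inter-storage path 5b failure)
    OTHER_ERROR = auto()         # step 1024: No


def steps_after_svol_write(results_per_path):
    """Return the steps executed after step 1016, one result per attempted path."""
    if not results_per_path:
        raise ValueError("at least one write attempt is required")
    steps = []
    for result in results_per_path:
        if result is SvolWriteResult.LINK_DOWN:
            steps.append(1018)   # mark this path "Failure"; step 1019: No -> retry (1016)
            continue
        if result is SvolWriteResult.LDEV_BLOCKED:
            return steps + [1022, 1023, 1075, 1065, 1038]
        if result is SvolWriteResult.INTER_STORAGE_PATH:
            return steps + [1026, 1060, 1065, 1038]
        return steps + [1060, 1065, 1038]   # OTHER_ERROR: step 1026 is skipped
    # Every attempted path reported link down (step 1019: Yes).
    return steps + [1020, 1070, 1075, 1065, 1038]
```

For example, two link-down attempts followed by no usable path yield steps 1018, 1018, 1020, 1070, 1075, 1065, 1038: the S-VOL is blocked and the data is written to the P-VOL instead.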

FIGS. 11 to 14 illustrate the flow of processing when the host 1 issues a write request to the S-VOL of the storage apparatus 2. Because this flow is similar in many respects to the flow illustrated in FIGS. 7 to 10, only the differences are described below. As with FIGS. 7 to 10, the description takes as an example a case where the computer system has the configuration shown in FIGS. 1 and 2, the P-VOL exists in the storage apparatus 2a, and the S-VOL paired with the P-VOL exists in the storage apparatus 2b.

Steps 2001 to 2023 in FIGS. 11 and 12 are almost the same as steps 1001 to 1023 in FIGS. 7 and 8. The difference is that the processing for the P-VOL in steps 1001 to 1023 becomes processing for the S-VOL in steps 2001 to 2023, and the processing for the S-VOL in steps 1001 to 1023 becomes processing for the P-VOL in steps 2001 to 2023. For example, in step 1002 the alternate path program 103 issues a write request to the P-VOL, whereas in step 2002 it issues a write request to the S-VOL.

In step 1006, the alternate path program 103 determines whether the path status (C108) of all access paths to the P-VOL to be accessed is “Failure”, whereas in step 2006 it determines whether the path status (C108) of all access paths to the S-VOL to be accessed is “Failure”. In step 1012, the alternate path program 103 changes the path status (C108) of all access paths to the P-VOL to be accessed to “Failure”, whereas in step 2012 it does so for all access paths to the S-VOL to be accessed.

In step 1013, the alternate path program 103 sets the host-storage all-path failure flag (C113) of the P-VOL to be accessed to “ON”, whereas in step 2013 it sets the host-storage all-path failure flag (C113) of the S-VOL to be accessed to “ON”. Likewise, in the other steps, processing performed on the P-VOL in FIGS. 7 and 8 becomes processing performed on the S-VOL in FIGS. 11 and 12. Apart from this point, the processing in FIGS. 7 and 8 is the same as the processing in FIGS. 11 and 12.

The steps in FIGS. 13 and 14 are likewise almost the same as the steps in FIGS. 9 and 10. The difference is that processing described for the P-VOL in FIGS. 9 and 10 becomes processing for the S-VOL in FIGS. 13 and 14, and vice versa. For example, in step 1031 the alternate path program 103 issues a write request to the S-VOL paired with the P-VOL, whereas in step 2031 it issues a write request to the P-VOL paired with the S-VOL.

Steps 2051 to 2054 in FIG. 14 are the same processing as steps 1051 to 1054 in FIG. 10. However, unlike FIG. 10, FIG. 14 has no processing between steps 2051 and 2054 corresponding to step 1052, that is, no determination of whether all hosts 1 in the computer system have lost access to the S-VOL that is the access target of the current write processing. In the processing of FIG. 10, the alternate path program 103 blocks the P-VOL when no host 1 in the computer system can access the P-VOL that is the access target of the current write processing, and otherwise returns an I/O error to the application program 102. In the processing of FIG. 14, by contrast, the alternate path program 103 instructs the storage apparatus 2b to block the S-VOL unconditionally, regardless of whether the hosts 1 in the computer system can access the S-VOL that is the access target of the current write processing (step 2053).

Step 2075 is the same processing as step 1075 in FIG. 10, except that the alternate path program 103 issues the write request to the S-VOL instead of the P-VOL. Steps 2024, 2026, and 2060 are the same processing as steps 1024, 1026, and 1060 in FIG., with the following differences: in step 2024, the alternate path program 103 determines whether an error response indicating a failure of the inter-storage path 5a has been returned, and in step 2026, it sets the P → S all-path failure flag (C114) of the P-VOL to be accessed to “ON”. If no error response indicating a failure of the inter-storage path 5a is returned in step 2024 (step 2024: No), step 2060 is not executed.

Step 2071 is similar to step 1070 in FIG. 10, except that the P-VOL, rather than the S-VOL, is blocked. In FIG. 14, processing equivalent to step 1052 of FIG. 10 is performed before step 2071: the alternate path program 103 determines whether any host 1 in the computer system has a normal access path to the P-VOL that is the access target of the current write processing (step 2070). If such a host 1 exists (step 2070: No), the alternate path program 103 executes steps 2037 and 2038, and ends the processing. If no host 1 in the computer system has a normal access path to the P-VOL that is the access target of the current write processing (step 2070: Yes), the alternate path program 103 instructs the storage apparatus 2a to block the P-VOL (step 2071). Thereafter, the alternate path program 103 executes steps 2075, 2065, and 2038, and ends the processing.

Next, the flow of processing performed by the management server 3 (integrated management program 301) is described with reference to FIGS. 15 to 17. In FIGS. 15 to 17, a trapezoid box represents the start of a loop and an inverted trapezoid box represents its end. For example, step 5004 in FIG. 15 is a trapezoid box, and step 5025 in FIG. is the corresponding inverted trapezoid box. Step 5004 reads “loop for the number of volume pairs”, which means that the processing between step 5004 and step 5025 (steps 5005 to 5024) is repeated once for each volume pair (each volume pair of the storage apparatuses 2) managed by the management server 3.

The integrated management program 301 waits for event notifications from the hosts 1 and starts the processing from step 5001 when a host 1 notifies an event. In steps 5001 and 5002, the integrated management program 301 receives the event from the host 1 and reflects its contents in the path management DB (integrated path management table T200).

The following outlines how the contents of a received event are reflected in the integrated path management table T200. As described above, an event notified from the host 1 to the management server 3 includes the contents of the alternate path management table T100 managed by that host 1. The event may also include other information, for example, information indicating that the path status of the host 1 has changed. When the host 1 whose host name is “host A” (hereinafter “host A”) notifies an event, the management server 3 receives the entire contents of the alternate path management table T100 held by host A. In step 5002, the integrated management program 301 reflects the received contents of the alternate path management table T100 in the rows (records) of the integrated path management table T200 whose host name (C201) is “host A”.

In step 5002, the P → S all-path failure flag (C214) and the S → P all-path failure flag (C215) may also change in rows whose host name (C201) is not “host A”. FIG. 6 illustrates this. FIG. 6 shows the integrated path management table T200 while the information received from host A is being reflected, specifically the state immediately after the information has been reflected in the rows whose host name (C201) is “host A”.

In FIG. 6, the S → P all-path failure flag (C215) in a row whose host name (C201) is “host A” has been changed to “ON”. This row stores information on the volume pair whose pair # (C202) is 1. In this case, the S → P all-path failure flag (C215) is also changed to “ON” in the other rows whose pair # (C202) is 1. In FIG. 6, the entry “OFF → ON” indicates that the S → P all-path failure flag (C215) was changed to “ON” by the event received from the host 1.

As described above, when the P → S all-path failure flag (C114) or the S → P all-path failure flag (C115) received from host A is “ON”, the integrated management program 301 also changes the corresponding P → S all-path failure flag (C214) or S → P all-path failure flag (C215) in the rows whose host name (C201) is not “host A”. When the received P → S all-path failure flag (C114) or S → P all-path failure flag (C115) is “OFF”, however, the integrated management program 301 does not change any row whose host name (C201) is not “host A”. The P → S all-path failure flag (C214) and the S → P all-path failure flag (C215) of those rows are instead changed when, after a failure of the inter-storage path 5 has been recovered, the pair status of a volume pair that uses the inter-storage path 5 returns to normal (for example, when it transitions from the Suspend (or Duplex (S)) state to the Duplex state). In this case, the storage apparatus 2 notifies the management server 3 that the pair status has changed, and the management server 3 changes the P → S all-path failure flag (C214) and the S → P all-path failure flag (C215) accordingly.
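The flag propagation rule of step 5002 can be sketched as below. The sketch rests on simplifying assumptions: T200 is reduced to one hypothetical `PairRow` per host and pair (the real table has a row per host and logical device), and an event is modeled as a dict mapping pair # to the (C114, C115) values reported by the sending host. `reflect_event` is an illustrative name, not an API from the patent.

```python
from dataclasses import dataclass


@dataclass
class PairRow:
    """Hypothetical T200 row reduced to the fields used in step 5002."""
    host_name: str                 # host name (C201)
    pair_no: int                   # pair # (C202)
    p_to_s_failed: bool = False    # P->S all-path failure flag (C214)
    s_to_p_failed: bool = False    # S->P all-path failure flag (C215)


def reflect_event(table, host_name, event):
    """Apply an event from one host to the integrated table.

    event maps pair # -> (C114, C115) as reported in the host's T100.
    The sending host's rows take the reported values; other hosts' rows only
    ever have flags turned ON here (they are turned OFF on pair recovery).
    """
    for row in table:
        if row.pair_no not in event:
            continue
        p_to_s, s_to_p = event[row.pair_no]
        if row.host_name == host_name:
            row.p_to_s_failed, row.s_to_p_failed = p_to_s, s_to_p
        else:
            row.p_to_s_failed = row.p_to_s_failed or p_to_s
            row.s_to_p_failed = row.s_to_p_failed or s_to_p
```

Propagating only the ON value mirrors the text: one host reporting OFF does not prove that the inter-storage path has recovered, so only the storage apparatus's pair-status notification clears the other rows.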

Subsequently, the integrated management program 301 reads the contents of the path management DB (integrated path management table T200) (step 5003). Here, the contents for all the hosts 1 are read.

In step 5004, the integrated management program 301 selects one volume pair under management. Information on all volume pairs under management is recorded in the integrated path management table T200. For example, in FIG. 6 the pair # (C202) of each row is 1, 2, 3, or 4, so the volume pairs under management are pair #1 to pair #4.

After step 5004, the integrated management program 301 performs steps 5005 to 5024. After step 5024, if a volume pair remains that has not yet been processed by steps 5005 to 5024, the integrated management program 301 selects it (step 5004) and performs steps 5005 to 5024 for that pair. When all volume pairs have been processed, the integrated management program 301 ends the processing. The following describes the execution of steps 5005 to 5024 for pair #n (where n is any value from 1 to 4).

In step 5005, the integrated management program 301 checks the all-path failure flags of the inter-storage path 5 for pair #n. Specifically, it checks whether the P → S all-path failure flag (C214) and the S → P all-path failure flag (C215) in the rows whose pair # (C202) is n are ON or OFF. When a plurality of hosts 1 access pair #n, there are multiple rows whose pair # (C202) is n; in that case, the integrated management program 301 may refer to the flags in any one of these rows, because step 5002 keeps these flags identical across all rows of the same pair.

If both the P → S all-path failure flag (C214) and the S → P all-path failure flag (C215) are ON (step 5006: Yes), the integrated management program 301 checks, for every host 1 that uses pair #n, the path status (C208) of the logical devices (P-VOL and S-VOL) belonging to pair #n, and counts the available access paths from these path statuses (step 5008). The result is stored in the number of available paths (C209). Alternatively, in step 5008, the integrated management program 301 may acquire from every host 1 that uses pair #n the path status (C108) of the logical devices (P-VOL and S-VOL) managed in that host's alternate path management table T100, and update the path status (C208) and the number of available paths (C209) with that information.

After step 5008 has been performed for all hosts 1 that use pair #n, the integrated management program 301 calculates the number of normal access paths between the P-VOL of pair #n and all hosts 1 that use pair #n (hereinafter “the number of normal host-P-VOL paths”) and the number of normal access paths between the S-VOL of pair #n and all hosts 1 that use pair #n (hereinafter “the number of normal host-S-VOL paths”). If the number of normal host-P-VOL paths is smaller than the number of normal host-S-VOL paths (step 5010: Yes), steps 5011 to 5016 are performed. Otherwise (step 5010: No), the integrated management program 301 skips steps 5011 to 5016 and performs step 5017.

If the determination in step 5010 is Yes, the integrated management program 301 instructs the storage apparatus 2 via the management network 6 to invert the P-VOL and S-VOL of pair #n (step 5011). In this embodiment, “inversion” means swapping the roles of the P-VOL and the S-VOL: the logical device that was the P-VOL becomes the S-VOL, and the logical device that was the S-VOL becomes the P-VOL.

Upon receiving this instruction, the storage apparatus 2 changes pair #n so that the logical device that was the P-VOL becomes the S-VOL and the logical device that was the S-VOL becomes the P-VOL. Specifically, the control program 20 of the storage apparatus 2 swaps the contents of PDKC # (C303) and P-VOL # (C304) with the contents of SDKC # (C305) and S-VOL # (C306) in the pair management table T300. The integrated management program 301 issues this instruction to both the storage apparatus 2a and the storage apparatus 2b, and each of them updates the contents of its own pair management table T300.
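The inversion of steps 5011 and 5012 amounts to swapping two column pairs in T300 and flipping the attribute in T200. The sketch below is illustrative only: `PairEntry`, `invert`, and `flip_attribute` are hypothetical names, and the column ID of S-VOL # is assumed to be C306 because the text gives it the same ID as P-VOL # (C304).

```python
from dataclasses import dataclass


@dataclass
class PairEntry:
    """Hypothetical T300 entry; field names are illustrative."""
    pair_no: int
    pair_status: int   # Pair Status (C302); not touched by inversion
    pdkc_no: str       # PDKC # (C303)
    pvol_no: int       # P-VOL # (C304)
    sdkc_no: str       # SDKC # (C305)
    svol_no: int       # S-VOL # (assumed to be C306)


def invert(entry: PairEntry) -> None:
    """Swap the P-VOL and S-VOL roles of a pair in place (step 5011)."""
    entry.pdkc_no, entry.sdkc_no = entry.sdkc_no, entry.pdkc_no
    entry.pvol_no, entry.svol_no = entry.svol_no, entry.pvol_no


def flip_attribute(attribute: str) -> str:
    """New attribute (C207) of an inverted logical device (step 5012)."""
    return {"P-VOL": "S-VOL", "S-VOL": "P-VOL"}[attribute]
```

Because each storage apparatus keeps its own copy of T300, the same `invert` would have to be applied on both 2a and 2b, matching the text's requirement that the instruction go to both.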

In step 5012, the integrated management program 301 also updates the information recorded in the integrated path management table T200: it changes the attribute (C207) of each inverted logical device from “P-VOL” to “S-VOL”, or from “S-VOL” to “P-VOL”.

Subsequently, the integrated management program 301 notifies all hosts 1 that use pair #n that the P-VOL and S-VOL of pair #n have been inverted (step 5014), and recounts the path status (C208) (step 5015). Step 5015 is the same processing as step 5008. Each host 1 notified of the inversion updates the contents of its alternate path management table T100, namely the attribute (C107) of the P-VOL and S-VOL of pair #n.

In step 5017, the integrated management program 301 instructs the storage apparatus 2 that has the S-VOL of pair #n to put that S-VOL into the blocked state, and at the same time updates the contents of the integrated path management table T200.

After step 5017, the integrated management program 301 notifies all hosts 1 that use pair #n that the S-VOL of pair #n has been blocked (step 5019). Upon receiving this notification, each host 1 changes the path status (C108) of all access paths to the S-VOL belonging to pair #n to “Failure”.

In short, when both the P → S all-path failure flag (C214) and the S → P all-path failure flag (C215) are ON, the storage apparatus 2 cannot replicate data with the remote copy function. The integrated management program 301 therefore executes steps 5010 to 5017 to put one of the P-VOL and the S-VOL into the blocked state and leave only the other accessible from the hosts 1. Specifically, the logical device with more access paths available from the hosts 1 is kept accessible, and the other logical device (the one with fewer available access paths) is blocked. The logical device with more available access paths is kept because more access paths are expected to give higher fault tolerance: even if an access path between a host 1 and the storage apparatus 2 fails, business can continue over the remaining paths.
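The choice made in steps 5010 to 5017 can be sketched as a comparison of path totals. This is a hedged illustration: `choose_survivor` is a hypothetical name, and the input is assumed to be a dict mapping each host that uses pair #n to its (normal paths to P-VOL, normal paths to S-VOL) counts, as tallied in step 5008.

```python
def choose_survivor(paths_by_host):
    """Decide steps 5010-5017 for a pair whose inter-storage paths are all down.

    paths_by_host maps host name -> (normal paths to P-VOL, normal paths to S-VOL).
    Returns (invert, blocked): whether the pair is inverted first (steps 5011-5016)
    and which original volume ends up as the blocked S-VOL (step 5017).
    """
    to_pvol = sum(p for p, _ in paths_by_host.values())
    to_svol = sum(s for _, s in paths_by_host.values())
    invert = to_pvol < to_svol    # step 5010: strict comparison, so ties keep the P-VOL
    blocked = "original P-VOL" if invert else "original S-VOL"
    return invert, blocked
```

For example, with host A having 0 paths to the P-VOL and 2 to the S-VOL, and host B having 1 and 2, the totals are 1 and 4, so the pair is inverted and the original P-VOL (now the S-VOL) is blocked.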

Thereafter, the integrated management program 301 notifies all hosts 1 that use pair #n of the number of paths available between the P-VOL of pair #n and each host 1 (step 5023), and ends the processing. Step 5023 is described in detail later.

If step 5005 finds that only the P → S all-path failure flag (C214) is ON (step 5006: No, and step 5030: Yes), the integrated management program 301 first executes steps 5031 to 5036, which are the same as steps 5011 to 5016. That is, the integrated management program 301 inverts the P-VOL and S-VOL belonging to pair #n (step 5031), reflects the result in the integrated path management table T200 (step 5032), notifies each host 1 that uses pair #n (step 5034), and recounts the path status (C208) (step 5035).

Thereafter, the integrated management program 301 instructs the storage apparatus 2 to change the pair status of pair #n to the Duplex (S) state (step 5037). Upon receiving this instruction, the storage apparatus 2 changes the pair status of pair #n (Pair Status (C302) of the pair management table T300) to “2”. The integrated management program 301 then notifies all hosts 1 that use pair #n that the pair status of pair #n has been changed to the Duplex (S) state (step 5041). Upon receiving this notification, each host 1 changes the volume pair status (C112) of pair #n in its alternate path management table T100 to “Duplex (S)”. While the volume pair status (C112) of pair #n is Duplex (S), the host 1 issues write requests for pair #n only to the P-VOL. Finally, the integrated management program 301 notifies all hosts 1 that use pair #n of the number of paths available between the P-VOL of pair #n and each host 1 (step 5023), and ends the processing.

If step 5005 finds that only the S → P all-path failure flag (C215) is ON (step 5006: No, step 5030: No, and step 5038: Yes), the integrated management program 301 executes steps 5037 to 5043 and steps 5022 to 5024, and ends the processing. This is almost the same as the processing performed when only the P → S all-path failure flag (C214) is ON; the only difference is that steps 5031 to 5036 (the inversion) are not performed.
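The four branches that follow step 5005 can be summarized as a lookup on the two flags. This is a sketch for orientation only: `PairPlan` and `plan_for_pair` are hypothetical names, the strings merely restate the step numbers and outcomes given in the text, and the comment explaining why a P → S failure always triggers inversion is our reading rather than a statement from the source.

```python
from typing import NamedTuple


class PairPlan(NamedTuple):
    branch: str      # determination in FIGS. 15-17 that selects the branch
    inversion: str   # "conditional", "always", or "never"
    outcome: str     # resulting state of pair #n


def plan_for_pair(p_to_s_failed: bool, s_to_p_failed: bool) -> PairPlan:
    """Map the C214/C215 flags of pair #n to the branch taken after step 5005."""
    if p_to_s_failed and s_to_p_failed:
        # No remote copy in either direction: keep one volume, block the S-VOL.
        return PairPlan("5006: Yes", "conditional", "S-VOL blocked")
    if p_to_s_failed:
        # Our reading: only S->P copying works, so swapping roles lets the
        # new P-VOL keep copying over the surviving path.
        return PairPlan("5030: Yes", "always", "Duplex (S)")
    if s_to_p_failed:
        # P->S copying still works, so the current P-VOL is kept.
        return PairPlan("5038: Yes", "never", "Duplex (S)")
    # Both directions work: invert only if the S-VOL has more paths on average.
    return PairPlan("5038: No", "conditional", "pair status unchanged")
```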

In other words, when only the P → S all-path failure flag (C214) or only the S → P all-path failure flag (C215) is ON (that is, when only one of the inter-storage paths 5a and 5b is blocked), the integrated management program 301 causes the storage apparatus 2 to change pair #n to the Duplex (S) state, inverting the P-VOL and S-VOL if necessary. The integrated management program 301 also prevents the hosts 1 from writing to the S-VOL. As a result, mirroring by the storage apparatus 2 can be maintained for pair #n.

If both the P → S all-path failure flag (C214) and the S → P all-path failure flag (C215) are OFF (step 5006: No, step 5030: No, and step 5038: No), the integrated management program 301 totals, for all hosts 1 that use pair #n, the number of available paths between each host 1 and the P-VOL and S-VOL belonging to pair #n, and stores the totals in the number of available paths (C209) (step 5052). Subsequently, the integrated management program 301 calculates the average number of available paths between each host 1 and the P-VOL belonging to pair #n, and the average number of available paths between each host 1 and the S-VOL belonging to pair #n (step 5054).

If the average number of available paths between each host 1 and the P-VOL belonging to pair #n is smaller than the average number of available paths between each host 1 and the S-VOL belonging to pair #n (step 5055: Yes), the integrated management program 301 executes steps 5056 to 5061, which are the same as steps 5031 to 5036; that is, it inverts the P-VOL and S-VOL belonging to pair #n. This is because fault tolerance is higher when the logical device with more normal access paths is the P-VOL. If the determination in step 5055 is negative, steps 5056 to 5061 are not performed. Through steps 5054 to 5061, the settings of the storage apparatus 2 and the hosts 1 are changed so that the logical device with the larger average number of available access paths from the hosts 1 becomes the P-VOL.
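The step 5055 comparison can be sketched with the standard `statistics.mean`. The function name `should_invert_by_average` and the input shape (host name mapped to available paths to the P-VOL and to the S-VOL) are assumptions for illustration. Because this sketch averages both volumes over the same set of hosts, it orders them exactly as comparing totals would; the averages matter only if the host sets for the two volumes differ.

```python
from statistics import mean


def should_invert_by_average(paths_by_host):
    """Step 5055: invert when the per-host average of available paths to the
    P-VOL is smaller than the per-host average to the S-VOL.

    paths_by_host maps host name -> (available paths to P-VOL, available paths to S-VOL).
    """
    avg_pvol = mean(p for p, _ in paths_by_host.values())
    avg_svol = mean(s for _, s in paths_by_host.values())
    return avg_pvol < avg_svol
```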

Thereafter, the integrated management program 301 notifies all the hosts 1 using the pair #n of the number of paths available between the P-VOL of the pair #n and each host 1 (steps 5022 to 5024). The process ends.

The specific contents of step 5023, and the operation of the host 1 upon receiving the notification of step 5023, are as follows. In step 5023, the integrated management program 301 refers to the integrated path management table T200 and calculates, for each logical device, the total of the number of available paths (C209). An example is described with reference to FIG. 6.

In FIG. 6, the logical device whose product number (C210) is “0001” and LDEV # (C206) is “1” (hereinafter the “target LDEV”) has access paths to host A, host B, and host D. The number of available paths (C209) is “1” between host A and the target LDEV, “1” between host B and the target LDEV, and “1” between host D and the target LDEV. The total number of available paths between all hosts 1 and the target LDEV is therefore 3.

In this way, the integrated management program 301 calculates the total of the number of available paths (C209) for each logical device and creates notification information T250 to be sent to each host 1. An example of the format of the notification information T250 is shown in FIG. The number of paths (C253) stores the total number of available paths of the logical device identified by the product number (C251) and LDEV # (C252). The integrated management program 301 transmits this notification information T250 to each host 1.

Each host 1 that receives the notification information T250 uses it to update the number of paths available to other hosts (C109-2). An example of the update method is described with reference to FIGS. In the notification information T250, the number of paths (C253) of the logical device whose product number (C251) is “0001” and LDEV # (C252) is “1” is “3”. Among the records of the alternate path management table T100, the alternate path program 103 identifies the record whose product number (C110) is “0001” and whose LDEV # (C106) is “1”, and reads its number of available paths (C109). In the example of FIG. 5, this value is “1”.

The alternate path program 103 stores, in the number of paths available to other hosts (C109-2) of this record, the difference (“2”) between the number of paths (C253) “3” obtained from the notification information and the number of available paths (C109) “1” obtained from the alternate path management table T100. As a result, the total number of normal access paths between the logical device and all hosts other than the own host is stored in the alternate path management table T100.
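The arithmetic of step 5023 and of the host-side update can be sketched as below, reproducing the FIG. 6 example. The function names `build_notification` and `paths_of_other_hosts`, and the representation of T200 rows as (host name, product number, LDEV #, available paths) tuples, are assumptions made for this illustration.

```python
from collections import defaultdict


def build_notification(t200_rows):
    """Step 5023: total the number of available paths (C209) per logical device.

    t200_rows: iterable of (host name, product number, LDEV #, available paths).
    Returns {(product number, LDEV #): total paths}, i.e. the content of T250.
    """
    totals = defaultdict(int)
    for _host, serial, ldev, available in t200_rows:
        totals[(serial, ldev)] += available
    return dict(totals)


def paths_of_other_hosts(notification, serial, ldev, own_available):
    """Value stored in C109-2: the total minus this host's own available paths (C109)."""
    return notification[(serial, ldev)] - own_available


# FIG. 6 example: hosts A, B, and D each have one available path to LDEV 1 of "0001".
rows = [("host A", "0001", 1, 1), ("host B", "0001", 1, 1), ("host D", "0001", 1, 1)]
t250 = build_notification(rows)                       # {("0001", 1): 3}
other_paths = paths_of_other_hosts(t250, "0001", 1, 1)  # 3 - 1 = 2
```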

This concludes the description of the processing flow of the integrated management program 301. The integrated management program 301 changes the settings of the hosts 1 and of the storage apparatus 2 according to the state of the inter-storage path 5 and of the paths between the hosts 1 and the storage apparatus 2 (P-VOL or S-VOL). When none of the inter-storage paths 5 can be used (step 5006: Yes), data cannot be replicated by the remote copy function. In this case, the integrated management program 301 puts one of the P-VOL and the S-VOL into the blocked state (step 5017) and stops the remote copy function. To choose which one, it compares the number of normal access paths between the P-VOL and all hosts 1 with the number of normal access paths between the S-VOL and all hosts 1, makes the logical device with more normal access paths the P-VOL, and blocks the S-VOL (steps 5010 to 5017). This is because fault tolerance improves when the computer system is configured so that the hosts 1 access the logical device with more normal access paths.

On the other hand, when only the P → S path or only the S → P path is unusable (step 5030: Yes or step 5038: Yes), remote copy can still be executed in one direction. In this case, therefore, the integrated management program 301 changes the pair status of the volume pair to the Duplex (S) state so that each host 1 accesses the P-VOL (steps 5037 to 5043).
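
For the one-way case, the claims state that when only the first transfer path (from the first storage device to the second) has failed, the second volume becomes the primary volume. A minimal sketch of that role choice follows; the function name is hypothetical, the symmetric handling of an S → P failure is an inference rather than something the text spells out, and the Duplex (S) pair status itself is not modeled.

```python
# Hypothetical sketch: pick the P-VOL so that replication uses the surviving direction.


def primary_after_one_way_failure(p_to_s_ok: bool, s_to_p_ok: bool) -> str:
    """Return which current volume ("P-VOL" or "S-VOL") should act as the P-VOL.

    Writes are copied from the P-VOL to the S-VOL, so the P-VOL must sit on the
    side whose outgoing transfer path still works.
    """
    if p_to_s_ok == s_to_p_ok:
        raise ValueError("expected exactly one failed transfer direction")
    return "P-VOL" if p_to_s_ok else "S-VOL"


# Only the P -> S path has failed: the S-VOL takes over the primary role.
print(primary_after_one_way_failure(p_to_s_ok=False, s_to_p_ok=True))  # S-VOL
```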

When the inter-storage paths 5 are normal, the integrated management program 301 compares the paths between the hosts 1 and the P-VOL with the paths between the hosts 1 and the S-VOL, and changes the logical device with more normal access paths to the P-VOL (steps 5051 to 5061). Here too, fault tolerance improves when the computer system is reconfigured so that the hosts 1 preferentially access the logical device that has more normal access paths.
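
The normal-path case (steps 5051 to 5061) is again a comparison of path counts. The sketch below, with a hypothetical function name and input format, returns whether the S-VOL should take over the P-VOL role; it also reflects the claims' formulation in terms of per-host averages.

```python
# Hypothetical sketch of the role swap decision while the inter-storage paths are normal.
from typing import Sequence


def should_swap_roles(paths_to_pvol: Sequence[int], paths_to_svol: Sequence[int]) -> bool:
    """Return True if the S-VOL should become the P-VOL.

    Each sequence holds one entry per host (hypothetical input format). Comparing
    totals is equivalent to comparing per-host averages, because both sums are
    divided by the same number of hosts.
    """
    if len(paths_to_pvol) != len(paths_to_svol):
        raise ValueError("need one entry per host for both volumes")
    # Swap only on a strict majority; ties keep the current P-VOL (an assumption).
    return sum(paths_to_svol) > sum(paths_to_pvol)


print(should_swap_roles([1, 1, 0], [2, 2, 2]))  # True
```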

This concludes the description of the computer system according to the embodiment of the present invention. In the computer system of this embodiment, when an access failure occurs while a host accesses the volume pair, the access path settings are changed not only for the host that issued the access request to the volume pair but also for the other hosts in the computer system. As a result, even in an environment with multiple hosts that access the volume pair, the computer system can continue business operations appropriately.

Although an embodiment of the present invention has been described above, it is merely an example for explaining the present invention and is not intended to limit the scope of the present invention to this embodiment. That is, the present invention can be implemented in various other forms.

For example, the embodiment described above uses a configuration in which the management server 3 is provided separately from the hosts 1 and executes the integrated management program 301. In another embodiment, however, the management server 3 may be omitted and the integrated management program 301 may run on any one of the plurality of hosts 1. In that case, each host 1 notifies events to the host 1 running the integrated management program 301, and that host 1 changes the access path settings of each host 1 or the settings of the storage devices 2 based on the notified events.

1: host, 2: storage device, 3: management server, 4: network, 5: path between storages, 6: management network

Claims

A computer system comprising:
a storage system having a first storage device that has a first volume and a second storage device that has a second volume; and
a plurality of hosts that access the storage system,
wherein the storage system
manages the first volume and the second volume as a volume pair, and
is configured to store data that the host requests to be written to the volume pair in the first volume and the second volume, and
wherein, when the host detects an access failure while accessing the volume pair, the computer system
identifies the failure location in the computer system, and
changes the settings of the access paths from the plurality of hosts to the volume pair, or changes the settings of the volume pair, based on information about the identified failure location.
The computer system according to claim 1, wherein:
the computer system further includes a management server;
each of the plurality of hosts has one or more first paths that are access paths to the first volume and one or more second paths that are access paths to the second volume;
the storage system has a first transfer path for transferring data from the first storage device to the second storage device and a second transfer path for transferring data from the second storage device to the first storage device; and
the management server receives, from the hosts, the states of the first path and the second path and the states of the first transfer path and the second transfer path, and,
based on the states of the first path, the second path, the first transfer path, and the second transfer path, changes the settings of the access paths from the plurality of hosts to the volume pair or causes the storage system to change the settings of the volume pair.
The computer system according to claim 2, wherein, when the management server receives information indicating that, of the first transfer path and the second transfer path, a failure has occurred only in the first transfer path, the management server changes the settings of the plurality of hosts so that the plurality of hosts issue write requests to the second volume when writing data to the volume pair.
The computer system according to claim 3, wherein:
when the first volume is set as a primary volume and the second volume is set as a secondary volume, the storage system is configured to store data that the host requests to be written to the volume pair in the first volume and then in the second volume; and
when the management server receives information indicating that, of the first transfer path and the second transfer path, a failure has occurred only in the first transfer path, the management server changes the settings of the storage system so that the second volume becomes the primary volume and the first volume becomes the secondary volume.
The computer system according to claim 2, wherein:
when the first volume is set as a primary volume and the second volume is set as a secondary volume, the storage system is configured to store data that the host requests to be written to the volume pair in the first volume and then in the second volume;
when the management server receives information indicating that a failure has occurred in both the first transfer path and the second transfer path, the management server counts the total number of first paths usable for access and the total number of second paths usable for access; and
when the total number of usable first paths is less than the total number of usable second paths, the management server
causes the storage system to put the first volume into a blocked state, and
changes the settings of the plurality of hosts so that the plurality of hosts access only the second volume of the volume pair.
The computer system according to claim 5, wherein, when the host issues a write request to the volume pair and detects that the first volume or the second volume is blocked, the host notifies the management server of information indicating that a failure has occurred in the first transfer path and the second transfer path.
The computer system according to claim 2, wherein:
when the first volume is set as a primary volume and the second volume is set as a secondary volume, the storage system is configured to store data that the host requests to be written to the volume pair in the first volume and then in the second volume;
the management server counts the usable first paths and second paths based on the states of the first path and the second path received from the hosts; and
when the average number of usable first paths per host is less than the average number of usable second paths per host, the management server changes the settings of the storage system so that the second volume becomes the primary volume and the first volume becomes the secondary volume.
The computer system according to claim 2, wherein, when an access failure occurs while the host issues a write request for writing write data to the volume pair, and the access failure is determined to be a failure in the first transfer path and the second transfer path, the host writes the write data to both the first volume and the second volume.
A method for controlling a computer system comprising:
a storage system having a first storage device that has a first volume and a second storage device that has a second volume;
a plurality of hosts that access the storage system; and
a management server,
wherein:
each of the plurality of hosts has one or more first paths that are access paths to the first volume and one or more second paths that are access paths to the second volume;
the storage system has a first transfer path for transferring data from the first storage device to the second storage device and a second transfer path for transferring data from the second storage device to the first storage device;
the storage system
manages the first volume and the second volume as a volume pair, and
is configured to store data that the host requests to be written to the volume pair in the first volume and the second volume; and
the method comprises:
1) the host detecting an access failure when accessing the volume pair;
2) the management server receiving, from the host, the states of the first path and the second path and the states of the first transfer path and the second transfer path; and
3) the management server changing the settings of the access paths from the plurality of hosts to the volume pair, or changing the settings of the volume pair, based on the states of the first path, the second path, the first transfer path, and the second transfer path.
The computer system control method according to claim 9, wherein, in 3), when the management server receives information indicating that, of the first transfer path and the second transfer path, a failure has occurred only in the first transfer path, the settings of the plurality of hosts are changed so that the plurality of hosts issue write requests to the second volume when writing data to the volume pair.
The computer system control method according to claim 10, wherein:
when the first volume is set as a primary volume and the second volume is set as a secondary volume, the storage system is configured to store data that the host requests to be written to the volume pair in the first volume and then in the second volume; and
in 3), when the management server receives information indicating that, of the first transfer path and the second transfer path, a failure has occurred only in the first transfer path, the settings of the storage system are changed so that the second volume becomes the primary volume and the first volume becomes the secondary volume.
The computer system control method according to claim 9, wherein:
when the first volume is set as a primary volume and the second volume is set as a secondary volume, the storage system is configured to store data that the host requests to be written to the volume pair in the first volume and then in the second volume; and
when, in 2), the management server receives information indicating that a failure has occurred in both the first transfer path and the second transfer path, in 3), the management server
a) counts the total number of first paths usable for access and the total number of second paths usable for access, and
b) when the total number of usable first paths is less than the total number of usable second paths, causes the storage system to put the first volume into a blocked state and changes the settings of the plurality of hosts so that the plurality of hosts access only the second volume of the volume pair.
The computer system control method according to claim 12, wherein, in 1), when the host detects that the first volume or the second volume is blocked, the host notifies the management server that a failure has occurred in the first transfer path and the second transfer path.
The computer system control method according to claim 9, wherein:
when the first volume is set as a primary volume and the second volume is set as a secondary volume, the storage system is configured to store data that the host requests to be written to the volume pair in the first volume and then in the second volume; and
in 3), the management server counts the usable first paths and second paths based on the states of the first path and the second path received from the hosts and, when the average number of usable first paths per host is less than the average number of usable second paths per host, changes the settings of the storage system so that the second volume becomes the primary volume and the first volume becomes the secondary volume.
The computer system control method according to claim 9, wherein, in 1), when the host issues a write request for writing write data to the volume pair and detects a failure in the first transfer path and the second transfer path, the host writes the write data to both the first volume and the second volume.