WO2014068764A1 - System redundancy check method and computer system - Google Patents
System redundancy check method and computer system
- Publication number
- WO2014068764A1 (PCT/JP2012/078453)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- computers
- information
- server
- computer
- storage area
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/008—Reliability or availability analysis
- G06F11/004—Error avoidance
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
Definitions
- The present invention relates to a technique for determining, without affecting the computers in operation, whether the failover function works in a computer system in which a cluster is configured.
- A conventional cluster system having a failover function is composed of an active server, on which an application that executes business processing runs, and a standby server, on which the application runs in place of the active server when a failure occurs in the active server.
- To confirm that the failover function of a conventional cluster system operates, the following two points must be confirmed.
- First, that the standby server can access the LU (Logical Unit) accessed by the active server.
- Second, not only that the standby server is physically connected to the LU accessed by the active server, but also that it is logically connected to that LU.
- The logical connection refers to the settings of the switch connecting the standby server with the LU accessed by the active server, and to various settings such as ports and paths in the storage apparatus.
- A typical example of the invention disclosed in the present application is as follows. That is, a system redundancy check method in a computer system, wherein the computer system includes at least one first computer, at least one second computer, a storage system, and a management computer that manages the at least one first computer and the at least one second computer.
- The at least one first computer has a first processor, a first memory connected to the first processor, and a first I/O interface connected to the first processor.
- The at least one second computer has a second processor, a second memory connected to the second processor, and a second I/O interface connected to the second processor.
- The storage system includes a disk controller, which includes one or more controllers having one or more ports, and a plurality of storage media. The management computer has a third processor, a third memory connected to the third processor, and a third I/O interface connected to the third processor.
- The at least one first computer executes a business task, and the at least one second computer takes over the task when a failure occurs in the at least one first computer.
- The storage system provides the at least one first computer with a storage area for storing the data necessary for executing the task.
- The management computer has first hardware information related to the hardware configuration of the at least one first computer, and second hardware information related to the hardware configuration of the at least one second computer.
- When the second computer receives the acquisition instruction, the second computer acquires the second storage area information from the storage system.
- A fourth step transmits the acquired second storage area information to the management computer, and in a fifth step the management computer compares the acquired first hardware information, the acquired first storage area information, the acquired second hardware information, and the acquired second storage area information and, based on the comparison result, determines whether failover is possible between the at least one first computer and the at least one second computer.
- Because whether failover is possible is determined by comparing information acquired from the first computer and the second computer, the task executed by the first computer is not affected.
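The fifth step compares the two computers' hardware information and storage area information and derives a yes/no answer. The following is a minimal sketch under simplifying assumptions: hardware information is reduced to a model string (the embodiment below uses the model 503 for this purpose), storage area information is reduced to a set of LU identifiers, and the names `ComputerInfo` and `can_fail_over` are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class ComputerInfo:
    """Hypothetical record of what the management computer holds for one computer."""
    model: str  # hardware information, reduced here to the server model
    lu_ids: frozenset = field(default_factory=frozenset)  # storage areas (LUs) it can see


def can_fail_over(first: ComputerInfo, second: ComputerInfo) -> Tuple[bool, str]:
    """Compare the first (active) and second (standby) computer's information.

    Returns (True, "") if failover looks possible, otherwise (False, reason).
    """
    # Hardware comparison: this sketch assumes matching models are required.
    if first.model != second.model:
        return False, "hardware configuration differs"
    # Storage area comparison: every LU the first computer uses must also
    # be visible to the second computer.
    missing = first.lu_ids - second.lu_ids
    if missing:
        return False, "LU access not possible: " + ", ".join(sorted(missing))
    return True, ""


active = ComputerInfo("model-A", frozenset({"LU-01", "LU-02"}))
standby = ComputerInfo("model-A", frozenset({"LU-01"}))
print(can_fail_over(active, standby))  # (False, 'LU access not possible: LU-02')
```

Only read-only information is compared, which is why the sketch never touches the active computer itself; a real implementation would compare far richer hardware and path information.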
- FIG. 1 is a block diagram illustrating an example of a computer system according to a first embodiment of this invention.
- FIG. 2 is a block diagram explaining the hardware configuration and software configuration of the management server of the first embodiment of this invention.
- FIG. 3 is a block diagram explaining the hardware configuration and software configuration of the server of the first embodiment of this invention.
- FIG. 4 is a block diagram explaining the disk controller provided in the storage apparatus of the first embodiment of this invention.
- FIG. 1 is a block diagram illustrating a configuration example of a computer system according to the first embodiment of this invention.
- The computer system in this embodiment includes a management server 100, a client terminal 190, an NW-SW 110, a plurality of servers 120, an FC (Fibre Channel) switch (FC-SW) 140, and one or more storage apparatuses 130.
- The management server 100 is connected to the management interface 141 of the FC-SW 140, the management interfaces 121 of the plurality of servers 120, and the management interface 131 of the storage apparatus 130 via the NW-SW 110.
- The management interfaces 121, 131, and 141 are I/O interfaces that transmit information on each IT device (hereinafter simply referred to as a “device”) in response to an inquiry from the management server 100.
- An interface can be used.
- the management server 100 is connected to a client terminal 190 from which an administrator issues an input / output instruction to the management server 100.
- The present invention does not limit the method of connecting the management server 100 and the client terminal 190; they may be connected via, for example, a network or a physical cable. Note that the administrator may also operate the management server 100 directly without using the client terminal 190.
- the NW-SW 110 is a switch that constitutes a management network for the management server 100 to manage the plurality of servers 120, the FC-SW 140, and one or more storage apparatuses 130.
- In this embodiment, the management network consists of one NW-SW 110, but it may instead consist of a plurality of switches and routers.
- the NW-SW 110 has a network management unit 111.
- the network management unit 111 acquires information on the connection path between the server 120 and the LU 135. In the first embodiment, the network management unit 111 is not used. In the seventh embodiment, specific processing of the network management unit 111 will be described with reference to FIG.
- the management server 100 performs various management of the entire computer system. For example, the management server 100 acquires various types of information from each server 120, operates the power source of each server 120, and manages the cluster system.
- the cluster system is a system composed of one or more clusters, and the cluster is composed of one or more servers 120.
- the management server 100 includes a control unit 101 and a management table group 102.
- the control unit 101 executes various processes for realizing the management function in the management server 100.
- the management table group 102 stores information necessary for the control unit 101 to execute processing.
- The hardware configuration and software configuration of the management server 100 will be described later with reference to FIG. 2.
- a cluster is composed of a plurality of servers 120.
- the servers 120 constituting the cluster include an active server 120 on which the application 311 (see FIG. 3) operates, and a standby server 120 that takes over work when the active server 120 fails. Note that the standby server 120 is normally in a power-off state.
- Each server 120 is connected to the business network 105.
- the business network 105 is a network used by an application 311 (see FIG. 3) running on each server 120.
- the business network 105 is composed of switches and routers.
- a server 120 on which an application 311 (see FIG. 3) operates is connected to a WAN or the like via the business network 105 and communicates with an external client computer.
- Each server 120 is connected to an FC-SW 140 that constitutes a SAN (Storage Area Network) via an adapter 122.
- In this embodiment, the server 120 and the storage apparatus 130 are connected via a SAN, but the present invention is not limited to this; they may be connected via an IP (Internet Protocol) network.
- The adapter 122 includes an I/O adapter or I/O device, such as a NIC (Network Interface Card), an HBA (Host Bus Adapter), or a CNA (Converged Network Adapter), corresponding to the type of the FC-SW 140.
- configuration information described above may include hardware information and software information of the server 120.
- the above-described configuration information is also referred to as confirmation configuration information.
- the FC-SW 140 constitutes a SAN that connects each server 120 and the storage apparatus 130.
- one FC-SW 140 constitutes a SAN, but a plurality of FC-SWs 140 may constitute a SAN.
- the storage device 130 provides a storage area used by the server 120 on which the application 311 (see FIG. 3) operates.
- the storage apparatus 130 includes a disk controller 132 and a plurality of storage devices (not shown).
- The management server 100 and the server 120 may each be any type of computer, such as a physical server, a blade server, a virtualization server, or a logically or physically partitioned server.
- The present invention does not depend on the types of the management server 100 and the server 120; its effects can be obtained with any of them.
- FIG. 2 is a block diagram illustrating a hardware configuration and a software configuration of the management server 100 according to the first embodiment of this invention.
- the CPU 201 includes one or more arithmetic devices and executes a program stored in the memory 202.
- the functions provided in the management server 100 can be realized by the CPU 201 executing the program.
- In the following, when a description uses a program as its subject, it means that the CPU 201 is executing that program.
- the input / output device 206 includes an input device such as a keyboard and a mouse, and a display device such as a display.
- the management server 100 may connect an external storage medium such as a USB memory via the input / output device 206.
- one disk interface 204 and one network interface 205 are shown as representatives, but the management server 100 may include a plurality of disk interfaces 204 and network interfaces 205.
- the memory 202 stores a program for realizing the control unit 101 and a management table group 102. Note that the memory 202 may store a program and information (not shown).
- the control unit 101 includes a plurality of program modules, and executes processing for confirming the operation of the failover function.
- processing for confirming the operation of the failover function is also referred to as cluster confirmation processing.
- The control unit 101 includes a trigger reception unit 215, an information acquisition unit 212, a check list generation unit 211, an information comparison unit 213, an information reception unit 214, a storage operation unit 216, a priority setting unit 217, a process notification unit 218, a network management unit 111, and a standby system generation unit 219.
- the CPU 201 operates as a functional unit that realizes a predetermined function by operating according to a program that implements each functional unit.
- the CPU 201 functions as the information acquisition unit 212 by operating according to the information acquisition program. The same applies to other programs. Further, the CPU 201 also operates as a functional unit that realizes each of a plurality of processes executed by each program.
- a computer and a computer system are an apparatus and a system including these functional units.
- the trigger reception unit 215 detects a trigger for starting the cluster confirmation process.
- the processing executed by the trigger reception unit 215 will be described later with reference to FIG.
- the information acquisition unit 212 acquires information from the processing target server 120 in the cluster confirmation process.
- the processing executed by the information acquisition unit 212 will be described later with reference to FIG.
- the check list generation unit 211 generates a cluster check table 251 used when executing the cluster confirmation process.
- the processing executed by the check list generation unit 211 will be described later with reference to FIG.
- the information comparison unit 213 determines whether or not the failover function operates normally. The processing executed by the information comparison unit 213 will be described later with reference to FIG.
- the information receiving unit 214 acquires various information necessary for the cluster confirmation process. The processing executed by the information receiving unit 214 will be described later with reference to FIGS. 13A, 13B, and 13C.
- the storage operation unit 216 performs settings for the storage apparatus 130 when the cluster confirmation process is executed. The processing executed by the storage operation unit 216 will be described later with reference to FIG.
- the standby system generation unit 219 instructs generation of a virtual computer when a virtual computer is used as the standby server 120. In the first embodiment, the standby system generation unit 219 is not used. In the ninth embodiment, a specific process of the standby system generation unit 219 will be described with reference to FIG.
- the management table group 102 includes a management target table 250, a cluster check table 251, and a port performance table 252.
- the management target table 250 is generated and updated by the information acquisition unit 212, and stores various pieces of information of devices such as the server 120 in the computer system managed by the management server 100. Details of the management target table 250 will be described later with reference to FIGS. 5A and 5B.
- the port performance table 252 stores information used when setting priorities. In the first embodiment, the process using the port performance table 252 is not executed. Details of the port performance table 252 will be described in the sixth embodiment with reference to FIG.
- The information in each table of the management table group 102 may be generated automatically by another functional unit in the management server 100, or may be input by the administrator, either manually or using the client terminal 190.
- The program that implements the control unit 101 and the tables included in the management table group 102 can be stored in the storage apparatus 130, a nonvolatile semiconductor memory, a storage device such as an HDD or SSD, or a computer-readable non-transitory storage medium such as an IC card, SD card, or DVD.
- the CPU 201 reads the program and table from the storage device 130, the storage device, or the computer-readable non-transitory storage medium, and loads the program and table onto the memory 202.
- FIG. 3 is a block diagram illustrating a hardware configuration and a software configuration of the server 120 according to the first embodiment of this invention.
- the server 120 includes a CPU 301, a memory 302, a BMC 303, a disk interface 304, and a network interface 305.
- the CPU 301 includes one or more arithmetic devices and executes a program stored in the memory 302.
- the functions of the server 120 can be realized by the CPU 301 executing the program.
- In the following, when a description uses a program as its subject, it means that the CPU 301 is executing that program.
- the memory 302 stores a program executed by the CPU 301 and information necessary for executing the program.
- the program and information stored in the memory 302 will be described later.
- the BMC 303 controls the power supply and each interface.
- the disk interface 304 is an interface for accessing the storage apparatus 130.
- the network interface 305 is an interface for communicating with other devices via the IP network.
- the server 120 may include an input device such as a keyboard and a mouse, and a display device such as a display.
- Instead of the disk interface 304 and the network interface 305, the server 120 may include a single I/O interface that can connect to external devices such as the management server 100 and the storage apparatus 130.
- the memory 302 of the active server 120 stores a program for realizing the application 311 and the OS 310.
- the OS 310 manages the entire server 120.
- the application 311 executes various tasks. In this embodiment, it is assumed that a program for realizing the OS 310 and the application 311 is stored in the LU 135.
- Because the standby server 120 is normally powered off, no program is stored in its memory 302. In this embodiment, however, when the standby server 120 receives an activation instruction from the management server 100, it loads a program for realizing the information acquisition application 123 into the memory 302.
- the information acquisition application 123 acquires configuration information necessary for cluster confirmation processing, that is, configuration information for confirmation. Details of processing executed by the information acquisition application 123 will be described later with reference to FIG.
- The server 120 must be configured, for example in its BIOS (Basic Input/Output System) or UEFI (Unified Extensible Firmware Interface) settings, so that the information acquisition application 123 is activated.
- the server 120 is turned off again after the processing of the information acquisition application 123 is completed.
- the standby server 120 may activate the OS 310 or the like after the processing of the information acquisition application 123 is completed.
- one disk interface 304 and one network interface 305 are shown as representatives, but the server 120 may include a plurality of disk interfaces 304 and network interfaces 305.
- the server 120 includes a network interface 305 for connecting to each of the management network and the business network 105.
- a network interface 305 connected to the management network corresponds to the management interface 121.
- FIG. 4 is a block diagram illustrating the disk controller 132 included in the storage apparatus 130 according to the first embodiment of this invention.
- the CPU 401 includes one or more arithmetic devices and executes a program stored in the memory 402.
- the functions provided in the storage apparatus 130 can be realized by the CPU 401 executing the program.
- In the following, when a description uses a program as its subject, it means that the CPU 401 is executing that program.
- the memory 402 stores a program executed by the CPU 401 and information necessary for executing the program.
- the program and information stored in the memory 402 will be described later.
- the disk interface 403 is an interface for accessing the LU 135 or storage device in the storage apparatus 130.
- the management interface 131 is an interface for connecting to the management server 100 via the management network.
- the controller 133 manages input / output processing with a device such as the server 120 that accesses the storage device 130.
- the controller 133 has a port 134 for connecting a device.
- one disk interface 403 and one controller 133 are shown as representatives, but the storage apparatus 130 may include a plurality of disk interfaces 403 and controllers 133.
- one port 134 is shown as a representative, but the controller 133 may include a plurality of ports 134.
- the host group control unit 411 manages host group settings.
- The host group is used to manage the mapping between the server 120 and the LU 135 and to protect the security of the data stored in the LU 135.
- With host groups, the LUs 135 that each server 120 can refer to or update can be restricted.
- the CPU 401 functions as the host group control unit 411 by loading a program for realizing the host group control unit 411 into the memory 402 and operating according to the program.
- FIGS. 5A and 5B are explanatory diagrams illustrating an example of the management target table 250 according to the first embodiment of this invention.
- the management target table 250 stores configuration information of devices to be managed, cluster identification information, and the like.
- the server 120 is a device to be managed.
- the management target table 250 includes a server ID 501, a management IP address 502, a model 503, configuration information 504, WWN 505, LU information 506, operation information 507, a cluster ID 508, and a type 509.
- The server ID 501 stores an identifier that uniquely identifies the server 120 in the computer system managed by the management server 100. The server ID 501 can be omitted if one column of this table, or a combination of columns, uniquely identifies the server 120. Alternatively, the management server 100 may automatically assign identification numbers in ascending order to the servers 120 as identifiers.
- Management IP address 502 stores a management IP address assigned to the server 120.
- the management server 100 connects to the server 120 that is a device to be managed based on the management IP address.
- the model 503 stores information related to the model of the server 120.
- The information stored in the model 503 relates to the infrastructure; from it, the manufacturer, performance, configurable system limits, and the like of the managed device can be determined. In this embodiment, as described later, whether the managed servers 120 have the same configuration is determined based on the model 503.
- The WWN 505 stores the WWN used by the server 120 to perform Fibre Channel communication with the LU 135.
- the WWN is a unique device identifier.
- When a protocol other than Fibre Channel is used, an identifier equivalent to the WWN, such as an iSCSI Qualified Name, is stored.
- The LU information 506 stores information for specifying the LU 135 that the server 120 accesses.
- the LU information 506 stores Inquiry information.
- the operation information 507 stores information indicating the operation state of the server 120.
- the information indicating the operating state is information indicating whether the server 120 is powered on / off and whether the OS or the business system (application 311) is operating normally.
- the operation information 507 may store information indicating that communication between the management server 100 and the management target server 120 is disabled.
- the cluster ID 508 stores an identifier for uniquely identifying the cluster to which the server 120 belongs.
- The cluster ID 508 can be omitted if one column of this table, or a combination of columns, uniquely identifies the cluster.
- the management server 100 may automatically assign an identification number in ascending order to each cluster as an identifier.
- If the server 120 does not belong to a cluster, the cluster ID 508 is left blank or stores information indicating so.
- the type 509 stores information indicating whether the server 120 is the active server 120 or the standby server 120 in the cluster to which the server 120 belongs.
- If the server 120 does not belong to a cluster, the type 509 is left blank or stores information indicating so.
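The columns of the management target table 250 map naturally onto a record type. The sketch below is illustrative only: the field names, the use of `None` for "does not belong to a cluster", and the helper `servers_by_cluster` are hypothetical, the configuration information 504 column is left out, and the sample rows (including the 192.0.2.0/24 documentation addresses) are made-up values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ManagedServer:
    """One row of the management target table 250 (hypothetical field names)."""
    server_id: str             # server ID 501
    mgmt_ip: str               # management IP address 502
    model: str                 # model 503
    wwn: str                   # WWN 505
    lu_info: List[str]         # LU information 506 (Inquiry information)
    powered_on: bool           # part of the operation information 507
    cluster_id: Optional[str]  # cluster ID 508; None when not in a cluster
    role: Optional[str]        # type 509: "active", "standby", or None


def servers_by_cluster(table: List[ManagedServer]) -> Dict[str, List[ManagedServer]]:
    """Group the rows by cluster ID, skipping servers that belong to no cluster."""
    clusters: Dict[str, List[ManagedServer]] = {}
    for row in table:
        if row.cluster_id is None:
            continue
        clusters.setdefault(row.cluster_id, []).append(row)
    return clusters


table = [
    ManagedServer("S1", "192.0.2.11", "model-A", "wwn-s1", ["LU-01"], True, "C1", "active"),
    ManagedServer("S2", "192.0.2.12", "model-A", "wwn-s2", [], False, "C1", "standby"),
    ManagedServer("S3", "192.0.2.13", "model-B", "wwn-s3", [], True, None, None),
]
for cluster_id, rows in servers_by_cluster(table).items():
    print(cluster_id, [(r.server_id, r.role) for r in rows])
# C1 [('S1', 'active'), ('S2', 'standby')]
```

Grouping by cluster first makes it straightforward to pair active and standby servers of the same cluster later on.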
- FIGS. 6A and 6B are explanatory diagrams illustrating an example of the cluster check table 251 according to the first embodiment of this invention.
- the cluster check table 251 stores the result of determining whether failover is possible for each check pair as a result of the cluster confirmation process.
- the check pair represents a combination of the active server 120 and the standby server 120 belonging to the same cluster.
- the determination of whether or not failover is possible represents processing for determining whether or not failover is possible between the active server 120 and the standby server 120.
- The cluster check table 251 includes a pair ID 601, cluster ID 602, active server ID 603, standby server ID 604, check flag 605, LU flag 606, confirmation result 607, reason 608, acquisition information 609, acquisition time 610, and priority 611.
- the cluster ID 602 stores an identifier for uniquely identifying a cluster.
- The cluster ID 602 is the same identifier as the cluster ID 508 in the management target table 250.
- the active server ID 603 stores an identifier for uniquely identifying the active server 120.
- the active server ID 603 stores the server ID of the server 120 registered as the active server 120 among the servers 120 belonging to the cluster corresponding to the cluster ID 602.
- the active server ID 603 stores the same identifier as that stored in the server ID 501.
- the standby server ID 604 stores an identifier for uniquely identifying the standby server 120.
- The standby server ID 604 stores the server ID of the server 120 registered as the standby server 120 among the servers 120 belonging to the cluster corresponding to the cluster ID 602.
- the standby server ID 604 stores the same identifier as that stored in the server ID 501.
- The check flag 605 stores a flag indicating whether the failover feasibility determination has been performed for the check pair. In this embodiment, “done” is stored in the check flag 605 when the determination has been executed, and the check flag 605 remains blank when it has not.
- the LU flag 606 stores information indicating whether or not the confirmation configuration information has been acquired by the information acquisition application 123 running on the standby server 120 corresponding to the standby server ID 604.
- The reason 608 stores the reason why failover is impossible. For example, when the LU 135 assigned to the active server 120 cannot be accessed from the standby server 120, “LU access not possible” or a similar message is stored. Based on the information in the reason 608, the administrator can review the cluster and failover settings.
- the acquisition information 609 stores the confirmation configuration information acquired by the information acquisition application 123.
- the acquisition information 609 stores inquiry information for uniquely specifying the LU 135.
- The configuration information for confirmation stored in the acquisition information 609 is not limited to Inquiry information; it may be any information from which it can be determined whether the standby server 120 can access the LU 135 assigned to the active server 120.
- the configuration information for confirmation may include server type, I / O information, and other information.
- the acquisition time 610 stores the time taken for the information acquisition application 123 to acquire the confirmation configuration information. For example, the time from when a command for obtaining predetermined information is issued until a response is obtained is stored.
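A check pair combines an active server and a standby server of the same cluster, and each pair becomes one row of the cluster check table 251. The sketch below assumes that every active server is paired with every standby server in its cluster; the patent only says the two are combined, so this pairing rule, the field names, and the function `generate_check_pairs` are hypothetical. The unit of the acquisition time is also assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class CheckPair:
    """One entry of the cluster check table 251 (hypothetical field names)."""
    pair_id: int                  # pair ID 601
    cluster_id: str               # cluster ID 602
    active_id: str                # active server ID 603
    standby_id: str               # standby server ID 604
    checked: bool = False         # check flag 605 ("done" vs. blank)
    lu_acquired: bool = False     # LU flag 606
    result: str = ""              # confirmation result 607
    reason: str = ""              # reason 608, e.g. "LU access not possible"
    acquired_info: List[str] = field(default_factory=list)  # acquisition information 609
    acquisition_time: float = 0.0  # acquisition time 610 (seconds, an assumed unit)
    priority: int = 0             # priority 611


def generate_check_pairs(
    servers: Iterable[Tuple[str, Optional[str], Optional[str]]]
) -> List[CheckPair]:
    """Pair every active server with every standby server of the same cluster.

    `servers` holds (server_id, cluster_id, role) tuples, where role is
    "active" or "standby"; servers with no cluster are skipped.
    """
    by_cluster: Dict[str, Dict[str, List[str]]] = {}
    for server_id, cluster_id, role in servers:
        if cluster_id is None:
            continue
        members = by_cluster.setdefault(cluster_id, {"active": [], "standby": []})
        members[role].append(server_id)
    pairs: List[CheckPair] = []
    for cluster_id in sorted(by_cluster):
        for active_id in by_cluster[cluster_id]["active"]:
            for standby_id in by_cluster[cluster_id]["standby"]:
                pairs.append(CheckPair(len(pairs) + 1, cluster_id, active_id, standby_id))
    return pairs


pairs = generate_check_pairs(
    [("S1", "C1", "active"), ("S2", "C1", "active"), ("S3", "C1", "standby"), ("S4", None, None)]
)
print([(p.pair_id, p.active_id, p.standby_id) for p in pairs])
# [(1, 'S1', 'S3'), (2, 'S2', 'S3')]
```

New pairs start with the check flag cleared, so a later pass can pick up exactly the pairs whose failover feasibility has not yet been determined.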
- FIG. 7 is an explanatory diagram illustrating an example of the host group table 412 according to the first embodiment of this invention.
- the host group table 412 stores information on the controller 133 and the port 134 through which the server 120 accesses the LU 135 in the storage apparatus 130, and information on the accessible LU 135. Specifically, the host group table 412 includes a host group ID 701, WWN 702, controller ID 703, port ID 704, LU ID 705, and authority 706.
- the host group ID 701 stores an identifier for uniquely identifying the host group.
- the host group is a group of WWNs of the servers 120 that are permitted to refer to and / or update the assigned LU 135.
- The host group ID 701 can be omitted if one column of this table, or a combination of columns, uniquely identifies the host group.
- the management server 100 may automatically assign an identification number in ascending order to each host group as an identifier.
- the WWN 702 stores the WWN of the server 120 that accesses the storage apparatus 130.
- the server 120 having the adapter 122 corresponding to the WWN stored in the WWN 702 can access the LU 135 in the host group.
- the controller ID 703 stores an identifier of the controller 133 that is used when the server 120 accesses the storage apparatus 130.
- the port ID 704 stores the identifier of the port 134 that is used when the server 120 accesses the storage device 130.
- LU ID 705 stores the identifier of LU 135 registered in the host group.
- The authority 706 stores the access authority on the LU 135 corresponding to the LU ID 705 that is granted to the server 120 having the adapter 122 corresponding to the WWN 702.
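The host group table 412 effectively answers the question "which LUs may this WWN reach, and with what authority?". The following sketch models each row as an immutable record and answers that question with a simple scan. The field names, the authority strings "read" and "read/write", and the helper `accessible_lus` are assumptions for illustration; authority is modeled per host group rather than per WWN, which is a simplification.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class HostGroupEntry:
    """One row of the host group table 412 (hypothetical field names)."""
    host_group_id: str      # host group ID 701
    wwns: FrozenSet[str]    # WWN 702: adapters allowed to use this host group
    controller_id: str      # controller ID 703
    port_id: str            # port ID 704
    lu_ids: FrozenSet[str]  # LU ID 705
    authority: str          # authority 706, e.g. "read" or "read/write" (assumed values)


def accessible_lus(table: List[HostGroupEntry], wwn: str) -> Dict[str, str]:
    """Map each LU reachable from `wwn` to the authority its host group grants.

    If an LU appears in several host groups containing the same WWN, the
    last matching row wins; a real storage apparatus defines its own rule.
    """
    result: Dict[str, str] = {}
    for entry in table:
        if wwn in entry.wwns:
            for lu_id in sorted(entry.lu_ids):
                result[lu_id] = entry.authority
    return result


table = [HostGroupEntry("HG-01", frozenset({"wwn-active"}), "CTL-0", "P-0",
                        frozenset({"LU-01"}), "read/write")]
print(accessible_lus(table, "wwn-standby"))  # {}
# Adding the standby WWN to the host group makes the LU visible to it.
table = [replace(table[0], wwns=table[0].wwns | {"wwn-standby"})]
print(accessible_lus(table, "wwn-standby"))  # {'LU-01': 'read/write'}
```

Because the rows are immutable, a setting change produces a new row, which makes it easy to keep the original row around and restore it afterwards.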
- the management server 100 acquires configuration information of the server 120 to be managed from the computer system (step S802). Specifically, the information acquisition unit 212 acquires configuration information of the server 120 from a predetermined server 120.
- the management server 100 generates and / or updates the cluster check table 251 (step S803). Specifically, the check list generation unit 211 generates a check pair by combining the active server 120 and the standby server 120, and adds an entry corresponding to the generated check pair to the cluster check table 251.
- The management server 100 instructs the storage apparatus 130 to change the host group setting (step S804). Specifically, the storage operation unit 216 transmits a host group setting change instruction to the host group control unit 411, instructing it to grant reference authority to the host group to which the active server 120 belongs and to add the WWN of the standby server 120 to that host group.
- When the storage apparatus 130 receives the instruction from the management server 100, it changes the host group setting based on the instruction (step S805). Specifically, the host group control unit 411 grants reference authority to the host group to which the active server 120 belongs and adds the WWN of the standby server 120 to it.
- the storage apparatus 130 notifies the management server 100 that the setting of the host group has been changed (step S806). Specifically, the host group control unit 411 notifies the storage operation unit 216 that the host group setting has been changed.
- the management server 100 transmits an activation instruction for the information acquisition application 123 to the standby server 120 (step S807). Specifically, the information receiving unit 214 transmits an activation instruction for the information acquisition application 123 to the standby server 120.
- When the standby server 120 receives the activation instruction for the information acquisition application 123, it activates the information acquisition application 123 and acquires the configuration information for confirmation (step S808).
- the standby server 120 transmits the confirmation configuration information acquired by the execution of the information acquisition application 123 to the management server 100 (step S809). Specifically, the information acquisition application 123 transmits the confirmation configuration information to the information reception unit 214.
- the configuration information for confirmation includes at least information about the LU 135 and information acquisition time.
- When the storage apparatus 130 receives the host group setting change instruction, it changes the host group setting (step S811). Specifically, the host group control unit 411 deletes the WWN of the standby server 120 from the host group to which the active server 120 belongs.
- the storage apparatus 130 notifies the management server 100 that the setting of the host group has been changed (step S812). Specifically, the host group control unit 411 notifies the information reception unit 214 that the host group setting change has been completed.
- the management server 100 determines whether or not failover is possible for each check pair (step S813). Specifically, the information comparison unit 213 compares the LU 135 information of the active server 120 with the LU 135 information acquired from the standby server 120 to determine whether failover is possible.
- the management server 100 sets a priority that is the order of use of the standby server 120 in the failover process (step S814). Specifically, the priority setting unit 217 executes the priority setting process based on the cluster check table 251 and the result of the determination process.
- the management server 100 notifies the administrator of the processing result of the cluster confirmation processing (step S815), and ends the processing.
- the process notification unit 218 generates information for notifying the process result, and presents the generated information to the administrator.
- step S813 and step S814 may be executed at any point between step S809 and step S812 as long as processing consistency is maintained.
- the management server 100 may determine whether or not failover is possible after acquiring the confirmation configuration information.
- If the confirmation configuration information has already been acquired, the processing from step S804 to step S812 can be omitted.
- FIG. 9 is a flowchart illustrating an example of processing executed by the trigger reception unit 215 of the management server 100 according to the first embodiment of this invention.
- the trigger reception unit 215 detects the instruction, input by the administrator, to start the cluster confirmation process (step S901).
- the trigger for starting the cluster confirmation process is not limited to when the start instruction input by the administrator is detected.
- For example, the processing may be started after a predetermined time elapses based on the schedule function of the management server 100, may be started periodically, or may be started when a configuration change in a server 120 belonging to the cluster is detected.
- the cluster confirmation processing start instruction includes information for specifying the target of the cluster confirmation processing designated by the administrator.
- the information for specifying the target of the cluster confirmation processing may be information that can specify the processing target such as the server 120 identifier or the cluster identifier.
- the target of the cluster confirmation processing is also referred to as the processing target.
- Information for specifying the processing target is also referred to as identification information of the processing target.
- the trigger reception unit 215 transmits a process start instruction to the information acquisition unit 212 (step S902), and ends the process.
- the processing start instruction includes identification information of the processing target.
- the process is executed for each cluster. That is, when the identifier of the server 120 is included as identification information to be processed, the cluster to which the server 120 belongs is a processing unit. Further, when a cluster identifier is included as identification information to be processed, the cluster is a processing unit.
- cluster confirmation processing may be executed for a plurality of clusters.
- FIG. 10 is a flowchart illustrating an example of processing executed by the information acquisition unit 212 of the management server 100 according to the first embodiment of this invention.
- the information acquisition unit 212 refers to the management target table 250 based on the processing target identification information notified from the trigger reception unit 215, and identifies the server 120 that is the target of the failover determination process (step S1001).
- the server 120 that is the target of the failover availability determination process is also referred to as a determination target server 120.
- Specifically, the following processing is executed in step S1001.
- When the identifier of an active server 120 is notified, the information acquisition unit 212 refers to the management target table 250 and searches for the active server 120 corresponding to the identifier and for the standby servers 120 included in the cluster to which that active server 120 belongs. In this case, the found active server 120 and standby servers 120 become the determination target servers 120. The same applies when the identifier of a standby server 120 is notified.
- When a cluster identifier is notified, the information acquisition unit 212 refers to the management target table 250 and searches for all the servers 120 belonging to the cluster corresponding to the identifier. In this case, all the found servers 120 are the determination target servers 120.
- a cluster that is a processing unit can be specified by the processing in step S1001.
- the information acquisition unit 212 may temporarily hold the identifier of the cluster that is a processing unit in the work area of the memory 202.
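The target resolution in step S1001 can be sketched as follows. This is a minimal illustration assuming a simplified management target table 250 that holds only the server ID 501, cluster ID 508, and type 509; the names `ManagedServer` and `determination_targets` are hypothetical. For a notified standby server, the sketch assumes "the same applies" means that standby server plus the active servers of its cluster.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class ManagedServer:
    """Hypothetical subset of one management target table 250 entry."""
    server_id: str   # server ID 501
    cluster_id: str  # cluster ID 508
    role: str        # type 509: "active" or "standby"

def determination_targets(table: List[ManagedServer],
                          server_id: Optional[str] = None,
                          cluster_id: Optional[str] = None) -> Tuple[str, List[str]]:
    """Return (cluster used as the processing unit, determination target server IDs)."""
    if server_id is not None:
        named = next((s for s in table if s.server_id == server_id), None)
        if named is None:
            raise KeyError(server_id)
        # The named server plus the servers of the opposite role in its cluster.
        peers = [s.server_id for s in table
                 if s.cluster_id == named.cluster_id and s.role != named.role]
        return named.cluster_id, [named.server_id] + peers
    if cluster_id is not None:
        # Every server belonging to the designated cluster.
        return cluster_id, [s.server_id for s in table if s.cluster_id == cluster_id]
    raise ValueError("server_id or cluster_id is required")

table = [ManagedServer("server 1", "cluster 1", "active"),
         ManagedServer("server 4", "cluster 1", "standby"),
         ManagedServer("server 5", "cluster 1", "standby"),
         ManagedServer("server 2", "cluster 2", "active")]
print(determination_targets(table, server_id="server 1"))
# ('cluster 1', ['server 1', 'server 4', 'server 5'])
```

Returning the cluster identifier alongside the server list mirrors the note that the cluster serving as the processing unit is fixed by step S1001 and may be kept in a work area.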
- the information acquisition unit 212 acquires configuration information of the server 120 from the identified determination target server 120 (step S1002).
- the information acquisition unit 212 executes configuration information acquisition processing for the identified determination target server 120. Since the configuration information acquisition process may use a known technique, a detailed description thereof will be omitted.
- For example, configuration information of the server 120 is acquired, such as information for specifying the type of the server 120 to be processed and the inquiry information of the LU 135 assigned to the server 120.
- When the determination target server 120 is a standby server 120, the standby server 120 cannot access the LU 135, and therefore the inquiry information is not acquired.
- If the management target table 250 has no entry for the server 120, the information acquisition unit 212 adds a new entry to the management target table 250 and registers the acquired configuration information of the server 120 in that entry.
- If an entry for the server 120 already exists, the information acquisition unit 212 overwrites that entry with the acquired configuration information of the server 120.
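The add-or-overwrite registration above behaves like an upsert. The sketch below assumes the management target table 250 is a dictionary keyed by server ID, with each entry a dictionary of fields; `register_configuration` is a hypothetical name, and overwriting only the acquired fields (rather than replacing the whole entry) is an assumption.

```python
from typing import Any, Dict

def register_configuration(table: Dict[str, Dict[str, Any]], server_id: str,
                           config: Dict[str, Any]) -> bool:
    """Add or update the entry for `server_id`; return True if a new entry was created."""
    is_new = server_id not in table
    # setdefault creates an empty entry for an unknown server; update() then overwrites
    # only the acquired fields, so fields absent from `config` (for example LU inquiry
    # information that a standby server cannot read) keep their previous values.
    table.setdefault(server_id, {}).update(config)
    return is_new

table: Dict[str, Dict[str, Any]] = {}
register_configuration(table, "server 4", {"model": "M1", "cluster_id": "cluster 1"})  # new entry
register_configuration(table, "server 4", {"model": "M2"})                             # overwrite
print(table)  # {'server 4': {'model': 'M2', 'cluster_id': 'cluster 1'}}
```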
- the information acquisition unit 212 transmits a process completion notification to the check list generation unit 211 (step S1004), and ends the process.
- FIG. 11 is a flowchart illustrating an example of processing executed by the check list generation unit 211 of the management server 100 according to the first embodiment of this invention.
- the check list generation unit 211 starts processing upon receiving a processing completion notification from the information acquisition unit 212.
- the check list generation unit 211 refers to the management target table 250 (step S1101) and generates a check pair (step S1102).
- the process branches as follows depending on the identification information to be processed.
- When the identifier of an active server 120 is received as the processing target identification information, the check list generation unit 211 generates combinations of that active server 120 with the standby servers 120 in the cluster to which it belongs. Here, one combination corresponds to one check pair.
- For example, when “server 1” is the processing target, the check list generation unit 211 identifies “server 4” and “server 5” as the standby servers 120 belonging to the cluster whose cluster ID 220 is “cluster 1”, and generates a check pair of “server 1” and “server 4” and a check pair of “server 1” and “server 5”.
- When the identifier of a standby server 120 is received as the processing target identification information, the same processing as described above is executed. That is, the check list generation unit 211 generates combinations of that standby server 120 with the active servers 120 in the cluster to which it belongs.
- When a cluster identifier is received as the processing target identification information, the check list generation unit 211 generates a predetermined number of combinations of the active servers 120 and the standby servers 120 belonging to the designated cluster.
- the number of generated check pairs and the conditions of the generated check pairs can be arbitrarily set.
- For example, all combinations of the active servers 120 and the standby servers 120 may be generated as check pairs.
- In this case, the number of check pairs generated is the product of the number of active servers 120 and the number of standby servers 120.
- Alternatively, for each active server 120, a combination with one standby server 120 may be generated as a check pair.
- In this case, the number of check pairs generated equals the number of active servers 120.
- the check list generation unit 211 registers the generated check pair information in the cluster check table 251 (step S1103). Specifically, the following processing is executed.
- the check list generation unit 211 generates a new entry for each check pair in the cluster check table 251.
- the check list generation unit 211 stores a predetermined identifier in the pair ID 601 of the generated entry. Here, it is assumed that identifiers in ascending order are stored.
- If an entry corresponding to the same check pair already exists in the cluster check table 251, the check list generation unit 211 does not have to register a new entry.
- the check list generation unit 211 sets “not yet” to the check flag 605 and the LU flag 606 of the newly added entry (step S1104).
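The two pair-generation policies and the registration in steps S1103 and S1104 can be sketched as below. This is illustrative only: the function names and dictionary keys are hypothetical, and the round-robin choice of standby server in the one-per-active policy is an assumption, since the text does not say which standby server is chosen.

```python
from itertools import product
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (active server ID 603, standby server ID 604)

def all_pairs(active: List[str], standby: List[str]) -> List[Pair]:
    """Every active/standby combination: len(active) * len(standby) check pairs."""
    return list(product(active, standby))

def one_standby_per_active(active: List[str], standby: List[str]) -> List[Pair]:
    """One check pair per active server; standby servers are assigned round-robin."""
    if not standby:
        return []
    return [(a, standby[i % len(standby)]) for i, a in enumerate(active)]

def register_pairs(pairs: List[Pair], first_pair_id: int = 1) -> List[Dict[str, object]]:
    """Build cluster check table 251 entries (steps S1103-S1104)."""
    return [{"pair_id": first_pair_id + n,   # pair ID 601, ascending
             "active": a,                    # active server ID 603
             "standby": s,                   # standby server ID 604
             "check_flag": "not yet",        # check flag 605
             "lu_flag": "not yet"}           # LU flag 606
            for n, (a, s) in enumerate(pairs)]

print(all_pairs(["server 1"], ["server 4", "server 5"]))
# [('server 1', 'server 4'), ('server 1', 'server 5')]
```

With three active and two standby servers, `all_pairs` yields six pairs while `one_standby_per_active` yields three, matching the two counts stated above.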
- the check list generation unit 211 transmits a processing completion notification to the information comparison unit 213 (step S1105), and ends the processing.
- Alternatively, the check list generation unit 211 may register all combinations of the active servers 120 and standby servers 120 belonging to the cluster as check pairs in advance, and update the information as necessary.
- When executing the failover determination process described later, the management server 100 may check whether a check pair exists in the cluster check table 251, and if it does not, the check list generation unit 211 may newly add the check pair.
- FIG. 12 is a flowchart illustrating an example of processing executed by the information comparison unit 213 of the management server 100 according to the first embodiment of this invention.
- the information comparison unit 213 refers to the cluster check table 251 (step S1201), and selects one entry whose check flag 605 is “not yet” (step S1202).
- the processing from step S1202 to step S1212 forms a loop that is repeated until “done” is stored in the check flag 605 of every entry.
- the information comparison unit 213 determines whether the LU flag 606 of the selected entry is “done” (step S1203), that is, whether the confirmation configuration information has already been acquired. An LU flag 606 of “done” indicates that the confirmation configuration information has been acquired.
- When the LU flag 606 of the selected entry is “done”, the information comparison unit 213 proceeds to step S1206.
- When the LU flag 606 of the selected entry is not “done”, the information comparison unit 213 transmits a process start instruction including the cluster ID 702 of the selected entry to the information reception unit 214 (step S1204), and then waits until it receives a processing completion notification from the information reception unit 214.
- After receiving the processing completion notification from the information reception unit 214 (step S1205), the information comparison unit 213 proceeds to step S1206.
- the information comparison unit 213 refers to the cluster check table 251 and the management target table 250 (step S1206), and determines whether failover is possible (step S1207). That is, a failover determination process is performed. Specifically, the following processing is executed.
- the information comparison unit 213 refers to the cluster check table 251 and acquires the active server ID 603 and the standby server ID 604 of the selected entry.
- the information comparison unit 213 searches the management target table 250 for the entry whose server ID 501 matches the acquired active server ID 603 and the entry whose server ID 501 matches the acquired standby server ID 604.
- A failover determination process including the following two comparison processes is executed.
- (Comparison 1) the information comparison unit 213 compares the model 503 and configuration information 504 of the active server 120's entry in the management target table 250 with the model 503 and configuration information 504 of the standby server 120's entry.
- (Comparison 2) the information comparison unit 213 compares the LU information 506 of the active server 120's entry in the management target table 250 with the acquisition information 609 of the selected entry in the cluster check table 251.
- In (Comparison 1), it is determined whether the types of the active server 120 and the standby server 120 match.
- In (Comparison 2), it is determined whether the LU 135 accessed by the active server 120 and the LU 135 accessible by the standby server 120 match.
- the comparison process of (Comparison 2) is a process for determining whether or not the standby server 120 can take over the job being executed by the active server 120.
- If the standby server 120 can access all of the LUs 135 used by the active server 120, the business can be taken over even after failover processing.
- If the standby server 120 cannot access the LUs 135 used by the active server 120, or can access only some of them, the business may be affected after failover processing.
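The two comparisons can be sketched as a single function. This is a simplified model, not the embodiment's exact logic: `ServerView` and `judge_failover` are hypothetical names, the reason string "server type mismatch" is invented for illustration, and (Comparison 2) is written as a subset test, following the "can access all of the LUs" wording; a strict equality test would also fit the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class ServerView:
    """Hypothetical per-server view used for the failover determination."""
    model: str              # model 503 (server type)
    lu_ids: FrozenSet[str]  # LUs 135: LU information 506 for the active server,
                            # acquisition information 609 for the standby server

def judge_failover(active: ServerView, standby: ServerView) -> Tuple[str, str]:
    """Return (confirmation result 607, reason 608)."""
    # (Comparison 1): the server types must match.
    if active.model != standby.model:
        return "impossible", "server type mismatch"
    # (Comparison 2): the standby server must reach every LU the active server uses.
    if not active.lu_ids <= standby.lu_ids:
        return "impossible", "LU access impossible"
    return "OK", ""

active = ServerView("M1", frozenset({"LU10", "LU11"}))
print(judge_failover(active, ServerView("M1", frozenset({"LU10"}))))
# ('impossible', 'LU access impossible')
```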
- the present invention is not limited to the determination criteria described above.
- For example, it may be determined that the LU 135 accessed by the active server 120 can be taken over when it matches the LU 135 accessible by the standby server 120.
- When the information acquisition application 123 acquires confirmation configuration information that includes both the configuration of the LU 135 and the type of the standby server 120, it suffices to compare the model 503 and configuration information 504 of the active server 120's entry in the management target table 250 with the acquisition information 609 of the selected entry in the cluster check table 251. This achieves the same result as executing both (Comparison 1) and (Comparison 2).
- In step S1207, in addition to the comparison processes described above, the information comparison unit 213 may determine whether the active server 120 or the standby server 120 is of a server type that does not support the failover function, and whether failover processing cannot be performed because the standby server 120 is already in operation.
- If it is determined that failover is not possible, the information comparison unit 213 stores “impossible” in the confirmation result 607 of the selected entry and the reason failover is impossible in the reason 608 (step S1208), and proceeds to step S1210.
- For example, when the LUs 135 do not match in (Comparison 2), the information comparison unit 213 determines that the standby server 120 cannot access the LU 135 assigned to the active server 120, and stores “LU access impossible” in the reason 608.
- If it is determined that failover is possible, the information comparison unit 213 stores “OK” in the confirmation result 607 of the selected entry (step S1209), and proceeds to step S1210.
- the information comparison unit 213 stores “done” in the check flag 605 of the selected entry (step S1210).
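- the information comparison unit 213 determines whether the check flags 605 of all the entries in the cluster check table 251 are “done” (step S1211). If not, the process returns to step S1202; if so, the information comparison unit 213 transmits a processing completion notification to the priority setting unit 217 and ends the processing.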
- the processing completion notification includes an identifier of a cluster that is a processing unit.
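The loop of FIG. 12 can be sketched as one pass over the cluster check table, since each entry is finalized exactly once. In this sketch (hypothetical names; entries are plain dictionaries), `fetch_lu_info` stands in for steps S1204 and S1205 and is assumed to mark every entry it updates with an LU flag of "done", so entries of the same cluster do not trigger a second collection. `judge` stands in for steps S1206 and S1207.

```python
from typing import Callable, Dict, List, Tuple

Entry = Dict[str, object]

def check_all_pairs(entries: List[Entry],
                    fetch_lu_info: Callable[[object], None],
                    judge: Callable[[Entry], Tuple[str, str]]) -> None:
    """Visit every entry whose check flag 605 is "not yet" (loop of steps S1202-S1211)."""
    for entry in entries:
        if entry["check_flag"] != "not yet":
            continue
        if entry["lu_flag"] != "done":
            # Have the information receiving unit collect the confirmation
            # configuration information for the whole cluster; it is expected to
            # mark every entry it updates with lu_flag = "done".
            fetch_lu_info(entry["cluster_id"])
        result, reason = judge(entry)                      # failover determination
        entry["result"], entry["reason"] = result, reason  # result 607 / reason 608
        entry["check_flag"] = "done"                       # check flag 605: loop terminates
```

The design point is that the LU flag, not the check flag, is what prevents duplicate collection: two check pairs sharing a cluster cause only one call to `fetch_lu_info`.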
- FIGS. 13A, 13B, and 13C are flowcharts for explaining an example of processing executed by the information receiving unit 214 of the management server 100 according to the first embodiment of this invention.
- the information receiving unit 214 refers to the management target table 250 based on the received cluster identifier (step S1301), and acquires the WWNs of the standby servers 120 belonging to the cluster corresponding to that identifier (step S1302).
- Specifically, the information receiving unit 214 searches the management target table 250 for entries whose cluster ID 508 matches the received cluster identifier.
- the information receiving unit 214 selects an entry in which “standby” is stored in the type 509 of the searched entry. Further, the information receiving unit 214 acquires the WWN of the standby server 120 from the WWN 505 of the selected entry. When a plurality of standby servers 120 belong to the cluster, a plurality of WWNs are acquired.
- the information receiving unit 214 selects one active server 120 belonging to the cluster corresponding to the received cluster identifier, and acquires the WWN of the selected active server 120 (step S1304).
- the information receiving unit 214 refers to the management target table 250 and acquires the WWN of the active server 120 from the WWN 505 of the entry of the active server 120 belonging to the cluster.
- the information receiving unit 214 transmits an addition process start instruction to the storage operation unit 216 (step S1305).
- the instruction includes the WWN of the active server 120 acquired in step S1304 and the WWNs of all the standby servers 120 acquired in step S1302.
- the information receiving unit 214 waits until receiving a processing completion notification from the storage operation unit 216.
- When the information reception unit 214 receives the processing completion notification from the storage operation unit 216 (step S1306), it determines whether the processing has been executed for all active servers 120 belonging to the cluster corresponding to the received cluster identifier (step S1307).
- If not, the information receiving unit 214 returns to step S1304 to select a new active server 120 and executes the same processing.
- If so, the information reception unit 214 transmits an activation instruction for the information acquisition application 123 to the standby servers 120 belonging to the cluster corresponding to the received cluster identifier (step S1309).
- the information receiving unit 214 receives the confirmation configuration information acquired by the information acquisition application 123 from the standby server 120 (step S1312).
- the received confirmation configuration information includes at least the information of the LU 135 accessed by the active server 120 and the information acquisition time.
- the received confirmation configuration information may include information such as the server type and the number of I/Os of the active server 120 or the standby server 120.
- the standby server 120 transmits its own identifier along with the information acquired by the information acquisition application 123. As a result, the information receiving unit 214 can ascertain from which standby server 120 the confirmation configuration information is transmitted.
- the information reception unit 214 updates the cluster check table 251 based on the received confirmation configuration information (step S1313).
- the information acquisition application 123 running on one standby server 120 can acquire information on the LUs 135 of the plurality of active servers 120 paired with it. Therefore, when the information reception unit 214 receives a plurality of pieces of confirmation configuration information, it updates as many entries as the number of pieces received.
- the information receiving unit 214 changes the LU flag 606 of each updated entry in the cluster check table 251 to “done” (step S1314).
- the LU flag 606 is changed because the information acquisition application 123 running on one standby server 120 acquires information on the LUs 135 of a plurality of active servers 120 at once; marking the updated entries avoids acquiring the same confirmation configuration information twice. In other words, it prevents the same information from being acquired again when information is acquired for each check pair.
- the information receiving unit 214 determines whether the LU flags 606 of all entries corresponding to the received cluster identifier are “done” (step S1315).
- If not all of them are “done”, the information receiving unit 214 returns to step S1312 and executes the same processing. Because a standby server 120 may fail to transmit its information, the information receiving unit 214 may instead proceed to step S1316 after a predetermined time has elapsed.
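Waiting for reports with a time limit, as in steps S1312 to S1315, might look like the following. It is a sketch under simplifying assumptions: reports arrive on a standard-library `queue.Queue` as (standby server ID, information) tuples, completion is tracked per standby server rather than per LU flag 606, and `collect_confirmation_info` is a hypothetical name.

```python
import queue
import time
from typing import Dict, Set, Tuple

def collect_confirmation_info(inbox: "queue.Queue[Tuple[str, dict]]",
                              expected: Set[str],
                              timeout_s: float) -> Dict[str, dict]:
    """Gather (standby server ID, confirmation info) reports until every expected
    standby server has answered or `timeout_s` seconds have passed."""
    received: Dict[str, dict] = {}
    deadline = time.monotonic() + timeout_s
    while not expected.issubset(received):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # give up waiting and move on (step S1316)
        try:
            standby_id, info = inbox.get(timeout=remaining)
        except queue.Empty:
            break
        received[standby_id] = info
    return received
```

Using a single deadline computed once, rather than a fresh timeout per `get`, keeps the total wait bounded even when reports trickle in slowly.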
- If the LU flags 606 of all entries are “done”, the information receiving unit 214 ends the loop processing for the information acquisition application 123 (step S1316) and selects one active server 120 belonging to the cluster corresponding to the received cluster identifier (step S1321).
- Steps S1321 to S1325 form a loop over the active servers 120 and are repeated until the processing has been executed for all of them.
- the information reception unit 214 transmits a deletion processing start instruction to the storage operation unit 216 (step S1322).
- the instruction includes the WWN of the selected active server 120 and the WWNs of all the standby servers 120 acquired in step S1302. After transmitting the instruction, the information receiving unit 214 waits until it receives a processing completion notification from the storage operation unit 216.
- When the information reception unit 214 receives the processing completion notification from the storage operation unit 216 (step S1323), it determines whether the processing has been executed for all active servers 120 belonging to the cluster corresponding to the received cluster identifier (step S1324).
- If not, the information receiving unit 214 returns to step S1321, selects another active server 120, and executes similar processing.
- the information receiving unit 214 transmits a processing completion notification to the information comparing unit 213 (step S1326), and ends the processing.
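The overall shape of FIG. 13 is "grant temporarily, collect, revoke". The sketch below is hypothetical throughout: `StorageOps` is an invented interface standing in for the storage operation unit 216, and wrapping the revocation in `try`/`finally` is an added safeguard not stated in the text, so that standby WWNs are removed even if collection fails part-way.

```python
from typing import Callable, List, Protocol

class StorageOps(Protocol):
    """Hypothetical interface standing in for the storage operation unit 216."""
    def add_standby_wwns(self, active_wwn: str, standby_wwns: List[str]) -> None: ...
    def delete_standby_wwns(self, active_wwn: str, standby_wwns: List[str]) -> None: ...

def run_confirmation(storage: StorageOps, active_wwns: List[str],
                     standby_wwns: List[str], collect: Callable[[], dict]) -> dict:
    """Temporarily expose each active server's host group to the standby servers."""
    granted: List[str] = []
    try:
        for active_wwn in active_wwns:            # steps S1304-S1307
            storage.add_standby_wwns(active_wwn, standby_wwns)
            granted.append(active_wwn)
        return collect()                          # steps S1309-S1316
    finally:
        # Steps S1321-S1325: revoke only what was actually granted, even if
        # collection failed part-way (the try/finally is an added safeguard).
        for active_wwn in granted:
            storage.delete_standby_wwns(active_wwn, standby_wwns)
```

Tracking `granted` separately means a failure while adding WWNs for the second active server still removes the WWNs already added for the first.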
- FIG. 14 is a flowchart illustrating an example of processing executed by the storage operation unit 216 of the management server 100 according to the first embodiment of this invention.
- the storage operation unit 216 starts processing upon receiving a processing start instruction from the information receiving unit 214.
- the storage operation unit 216 acquires, from the storage apparatus 130, information on the host group to which the active server 120 specified in the received process start instruction belongs (step S1401).
- the storage operation unit 216 makes an inquiry including the WWN of the active server 120 to the storage apparatus 130.
- the storage apparatus 130 searches the host group table 412 for the host group ID 701 to which the active server 120 belongs based on the WWN of the active server 120 included in the inquiry. Further, the storage apparatus 130 transmits a response including the host group ID 701 to the storage operation unit 216.
- the storage operation unit 216 determines whether the received process start instruction is an addition process start instruction (step S1402).
- If it is not, that is, if it is a deletion process start instruction, the storage operation unit 216 transmits an instruction to delete the WWN of the standby server 120 to the host group control unit 411 (step S1403).
- the storage operation unit 216 instructs to delete the WWN of the standby server 120 from the host group corresponding to the acquired host group ID 701.
- the instruction includes the acquired host group ID 701.
- the storage operation unit 216 waits until a processing completion notification is received from the host group control unit 411.
- this process may be omitted and the process may proceed to step S1405.
- If the received instruction is an addition process start instruction, the storage operation unit 216 transmits an instruction to add the WWN of the standby server 120 to the host group control unit 411 (step S1404).
- the storage operation unit 216 gives a reference authority to the host group corresponding to the acquired host group ID 701 and instructs to add the WWN of the standby server 120.
- the instruction includes the acquired host group ID 701 and the WWN of the standby server 120. After transmitting the instruction, the storage operation unit 216 waits until a processing completion notification is received from the host group control unit 411.
- When the storage operation unit 216 receives the processing completion notification from the host group control unit 411 (step S1405), it transmits a processing completion notification to the information reception unit 214 (step S1406), and then ends the process.
- the storage operation unit 216 updates the host group table 412 by transmitting an operation instruction to the host group control unit 411.
- the present invention is not limited to this.
- the storage operation unit 216 may transmit a specific operation instruction to the host group table 412 using an API (Application Program Interface). Further, the storage operation unit 216 may acquire the host group table 412 and add or delete the WWN of the standby server 120 from the host group table 412 using the API.
- FIG. 15 is a flowchart illustrating an example of processing executed by the host group control unit 411 of the storage apparatus 130 according to the first embodiment of this invention.
- the host group control unit 411 starts processing upon receiving an operation instruction from the storage operation unit 216.
- the host group control unit 411 updates the host group table 412 in accordance with the received operation instruction (step S1501). Processing branches as follows according to the received operation instruction.
- When the received operation instruction is a deletion instruction, the host group control unit 411 searches for the entry of the target host group based on the host group ID 701 included in the deletion instruction.
- the host group control unit 411 deletes the WWN of the standby server 120 included in the deletion instruction from the searched host group entry.
- When the received operation instruction is an addition instruction, the host group control unit 411 searches for the entry of the target host group based on the host group ID 701 included in the addition instruction.
- the host group control unit 411 adds the WWN of the standby server 120 included in the addition instruction to the searched host group entry.
- in the added entry, the controller ID 703, port ID 704, and LU ID 705 store the same values as the entries of the other WWNs in the host group.
- the authority 706 stores “Read” indicating the reference authority.
- When the storage operation unit 216 transmits a specific operation instruction using the API, the host group control unit 411 does not need to execute the above-described processing; the WWN of the standby server 120 is added or deleted through the API instead.
- the host group control unit 411 transmits a process completion notification to the storage operation unit 216 (step S1502), and ends the process.
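The addition and deletion branches of step S1501 can be sketched against a host group table modeled as a list of dictionaries, one row per WWN and LU (an assumption about layout). `add_wwn` and `delete_wwn` are hypothetical names; the addition copies the controller ID 703, port ID 704, and LU ID 705 from the other rows of the host group and sets the authority 706 to "Read", and it skips combinations the WWN already holds so that repeating it is harmless.

```python
from typing import Dict, List

Row = Dict[str, str]  # host_group_id, wwn, controller_id, port_id, lu_id, authority

def add_wwn(table: List[Row], host_group_id: str, wwn: str) -> None:
    """Register `wwn` in a host group with reference ("Read") authority."""
    rows = [r for r in table if r["host_group_id"] == host_group_id]
    # (controller ID 703, port ID 704, LU ID 705) combinations already held by `wwn`.
    done = {(r["controller_id"], r["port_id"], r["lu_id"]) for r in rows if r["wwn"] == wwn}
    for r in rows:
        key = (r["controller_id"], r["port_id"], r["lu_id"])
        if key in done:
            continue
        done.add(key)
        table.append({"host_group_id": host_group_id, "wwn": wwn,
                      "controller_id": key[0], "port_id": key[1], "lu_id": key[2],
                      "authority": "Read"})

def delete_wwn(table: List[Row], host_group_id: str, wwn: str) -> None:
    """Remove every row of `wwn` from the host group."""
    table[:] = [r for r in table
                if not (r["host_group_id"] == host_group_id and r["wwn"] == wwn)]
```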
- FIG. 16 is a flowchart for explaining an example of processing executed by the information acquisition application 123 according to the first embodiment of the present invention.
- the standby server 120 activates the information acquisition application 123 when receiving an activation instruction from the information receiving unit 214.
- the information acquisition application 123 acquires the confirmation configuration information (step S1601). Specifically, the following processing is executed.
- When acquiring the information of the LU 135 included in the confirmation configuration information, the information acquisition application 123 issues a predetermined command to the storage apparatus 130, for example, a command for acquiring inquiry information. At this time, the information acquisition application 123 measures the response time to the command as the information acquisition time.
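Measuring the information acquisition time amounts to timing one command round trip. In this sketch, `timed` is a hypothetical helper and `command` is a placeholder callable for whatever issues the inquiry command; no particular SCSI library or API is assumed.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

def timed(command: Callable[[], T]) -> Tuple[T, float]:
    """Run `command` and return (its response, elapsed seconds).

    `command` is a placeholder for whatever issues the inquiry command to the
    storage apparatus; no particular SCSI library is assumed here.
    """
    start = time.perf_counter()  # monotonic, high-resolution clock
    response = command()
    return response, time.perf_counter() - start
```

`time.perf_counter` is used rather than wall-clock time so that clock adjustments cannot distort the measured response time, which later drives the priority ordering.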
- the information acquisition application 123 may acquire the information of the LU 135 by mounting the LU 135 and referring to the information in the LU 135.
- a confirmation program may be stored in advance in the LU 135 accessed by the active server 120, and information may be acquired by executing the program.
- the information acquisition application 123 transmits the acquired configuration information for confirmation to the information reception unit 214 (step S1602), and ends the process.
- the standby server 120 may be turned off after the processing is completed.
- Alternatively, the setting may be changed so that the information acquisition application 123 does not start.
- the information acquisition application 123 may acquire configuration information such as the LU information of the active server 120 from the management server 100 in advance, and determine whether failover is possible by comparing the acquired configuration information with the confirmation configuration information.
- the failover availability determination process may be the same as the process of step S1207.
- the information comparison unit 213 does not need to perform the comparison process in step S1207, and may determine whether or not failover is possible based on the determination result transmitted from the information acquisition application 123.
- FIG. 17 is a flowchart illustrating an example of processing executed by the priority setting unit 217 of the management server 100 according to the first embodiment of this invention.
- When the priority setting unit 217 receives a processing completion notification from the information comparison unit 213, it starts processing.
- the priority setting unit 217 refers to the cluster check table 251 based on the cluster identifier included in the received process completion notification (step S1701).
- The priority setting unit 217 searches the cluster check table 251 for entries whose active server ID 603 matches the identifier of the selected active server 120.
- From the retrieved check pairs, the priority setting unit 217 selects those determined to be capable of failover (step S1704).
- Specifically, among the entries corresponding to the retrieved check pairs, the priority setting unit 217 selects those whose confirmation result 607 stores "OK".
- the priority setting unit 217 sets the priority for the selected check pair (step S1705).
- The priority setting unit 217 refers to the acquisition time 610 of the entries corresponding to the selected check pairs and assigns higher priorities in ascending order of acquisition time, so that the pair with the shortest acquisition time receives the highest priority.
- The priority is expressed as a number: the smaller the number, the higher the priority.
- The priority setting unit 217 may instead set the priority based on the pair ID 601 or the standby server ID 604. For example, entries with smaller pair IDs 601 may be given higher priorities.
- the priority setting unit 217 transmits a processing completion notification to the processing notification unit 218 (step S1707), and ends the processing.
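Steps S1704 and S1705 of the first embodiment can be sketched as a filter followed by a sort. This is a minimal illustration that assumes the relevant cluster check table 251 columns are available as simple records. The class and function names are hypothetical, and the seconds unit for the acquisition time is assumed. Ties are broken by pair ID, one of the alternatives the text mentions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CheckPair:
    pair_id: int                 # pair ID 601
    standby_server_id: str       # standby server ID 604
    confirmation_result: str     # confirmation result 607: "OK" or "NG"
    acquisition_time: float      # acquisition time 610 (seconds, assumed unit)


def assign_priorities(pairs: List[CheckPair]) -> Dict[int, int]:
    """Map pair ID -> priority number (1 = highest) for failover-capable pairs."""
    # Step S1704: keep only pairs judged capable of failover.
    ok = [p for p in pairs if p.confirmation_result == "OK"]
    # Step S1705: shortest acquisition time first; pair ID breaks ties.
    ok.sort(key=lambda p: (p.acquisition_time, p.pair_id))
    return {p.pair_id: rank for rank, p in enumerate(ok, start=1)}
```

Pairs marked "NG" receive no priority at all. This matches the idea that only failover-capable standby servers should be ranked.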
- FIG. 18 is a flowchart illustrating an example of processing executed by the processing notification unit 218 of the management server 100 according to the first embodiment of this invention.
- When the process notification unit 218 receives the processing completion notification from the priority setting unit 217, it starts processing.
- the process notification unit 218 notifies the administrator that the cluster confirmation process has been completed (step S1801), and ends the process.
- the processing result may be displayed on an output device such as a display provided in the management server 100, or the processing result may be displayed on a display of the client terminal 190 or the like.
- Alternatively, an email or alert may be sent to the administrator.
- According to the first embodiment, it is possible to determine whether failover is possible without actually executing failover processing between the active server 120 and the standby server 120.
- In the first embodiment, the information comparison unit 213 determined whether failover was possible by comparing the information of the LU 135 accessed by the active server 120 with the information of the LU 135 accessible by the standby server 120.
- the present invention is not limited to this, and the following method can also be used.
- In step S1601, the information acquisition application 123 on the standby server 120 sends the storage apparatus 130 a command inquiring whether the LU 135 can be connected.
- the command includes the WWN of the standby server 120.
- When the host group control unit 411 of the storage apparatus 130 receives the inquiry command, it transmits a response to the command to the information acquisition application 123.
- In step S1602, when the information acquisition application 123 receives a response from the storage apparatus 130, it transmits a notification that failover is possible to the information reception unit 214.
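The text does not spell out how the host group control unit 411 decides its response to the inquiry. The sketch below shows one plausible check. It assumes LU access is governed by host groups, each pairing a set of registered WWNs with a set of mapped LUs, which is the arrangement the host-group handling elsewhere in this description suggests. The data layout and function name are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: host group ID -> (WWNs registered in the group, LU IDs mapped to it).
HostGroups = Dict[str, Tuple[Set[str], Set[str]]]


def lu_connectable(host_groups: HostGroups, wwn: str, lu_id: str) -> bool:
    """True if some host group both registers the server's WWN and exposes the LU."""
    return any(wwn in wwns and lu_id in lus for wwns, lus in host_groups.values())
```

Under this model, a standby server can reach an LU only when its WWN sits in the same host group as that LU. This is why the inquiry command carries the WWN of the standby server 120.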
- In the first embodiment, the standby server 120 holds the information acquisition application 123 in advance, whereas in the second embodiment the management server 100 holds it.
- the information receiving unit 214 instructs the standby server 120 belonging to the cluster corresponding to the received cluster identifier to start.
- the activation instruction includes an instruction to turn on the power.
- When the information reception unit 214 receives a transmission request for the information acquisition application 123 from the activated standby server 120, it transmits the information acquisition application 123 to that standby server 120.
- Since the configuration of the computer system of the third embodiment and the hardware and software configurations of the management server 100, the server 120, and the storage apparatus 130 are the same as those of the first embodiment, their description is omitted. The following mainly describes the differences from the first embodiment.
- the dedicated storage area may be prepared in advance in the storage apparatus 130, or an LU for storing the information acquisition application 123 may be generated as necessary.
- a test LU for storing the information acquisition application 123 is generated in the storage apparatus 130.
- The storage operation unit 216 instructs the host group control unit 411 to delete the WWN of the standby server 120 from the host group to which the test LU 135 belongs and from the host group corresponding to the acquired host group ID 701.
- The WWN of the standby server 120 does not have to be deleted from the host group to which the test LU belongs. Alternatively, the WWN of the standby server 120 may be registered in advance in the host group to which the test LU belongs.
- According to the third embodiment, it is possible to save the effort of storing the information acquisition application 123 in the standby server 120 in advance and to reduce the storage resources the standby server 120 needs for holding it. It also saves the effort of configuring the BIOS or UEFI to start the information acquisition application 123.
- the fourth embodiment is different in that the information acquisition application 123 is stored in advance in an external storage device connected to the standby server 120.
- the configuration of the computer system, and the hardware configuration and software configuration of the management server 100, the server 120, and the storage apparatus 130 are the same as those in the first embodiment, and a description thereof will be omitted. Hereinafter, the difference from the first embodiment will be mainly described.
- The standby server 120 acquires the information acquisition application 123 from the external storage device and starts it.
- the external storage device may be a nonvolatile semiconductor memory, a storage device such as HDD or SSD, or a computer-readable non-transitory storage medium such as an IC card, SD card or DVD.
- In the fourth embodiment, the processing in step S1309 is changed.
- In step S1309, the information receiving unit 214 activates the standby server 120 belonging to the cluster corresponding to the received cluster identifier.
- the activation instruction includes an instruction to turn on the power.
- the information receiving unit 214 changes the activation order (boot loader) of the standby server 120 so that it is activated from the external storage device in which the information acquisition application 123 is stored.
- the information receiving unit 214 can change the activation order using the BIOS or UEFI of the standby server 120.
- According to the fourth embodiment, the information acquisition application 123 can be run on the standby server 120 simply by connecting an external storage device to the standby server 120.
- the fifth embodiment is different in that the information acquisition application 123 is stored in the LU 135 accessed by the active server 120.
- the configuration of the computer system, and the hardware configuration and software configuration of the management server 100, the server 120, and the storage apparatus 130 are the same as those in the first embodiment, and a description thereof will be omitted. Hereinafter, the difference from the first embodiment will be mainly described.
- the standby server 120 acquires and starts the information acquisition application 123 from the LU 135 accessed by the active server 120.
- the LU 135 accessed by the active server 120 includes a storage area used by the active server 120 and a storage area for storing the information acquisition application 123.
- For example, when generating the LU 135, the storage apparatus 130 may create a dedicated storage area in the LU 135 and store the information acquisition application 123 there.
- In the fifth embodiment, the processing in step S1309 is changed.
- In step S1309, the information receiving unit 214 activates the standby server 120 belonging to the cluster corresponding to the received cluster identifier.
- the activation instruction includes an instruction to turn on the power.
- the information receiving unit 214 instructs the standby server 120 to start the information acquisition application 123 from the dedicated storage area.
- The information reception unit 214 can use the BIOS or UEFI of the standby server 120 to change the storage area from which the information acquisition application 123 is started.
- the information acquisition application 123 may store the acquired confirmation configuration information in a dedicated storage area, and the management server 100 may acquire the confirmation configuration information from the dedicated storage area.
- According to the fifth embodiment, the information acquisition application 123 can be run on the standby server 120 without the standby server 120 holding the information acquisition application 123.
- In the first embodiment, the priority is set based on the acquisition time 610, whereas in the sixth embodiment the priority is set based on the performance information of the port 134 of the storage apparatus 130.
- the configuration of the cluster check table 251 held by the management server 100 is different. Note that the hardware configuration and other software configurations of the management server 100 are the same as those in the first embodiment, and a description thereof will be omitted.
- FIGS. 19A and 19B are explanatory diagrams illustrating an example of the cluster check table 251 according to the sixth embodiment of this invention.
- the cluster check table 251 in the sixth embodiment is different from the first embodiment in that it includes a storage ID 1901 and a port ID 1902.
- the storage ID 1901 stores an identifier for uniquely identifying the storage device 130 connected to the standby server 120.
- the port ID 1902 stores an identifier for uniquely identifying the port 134 included in the storage apparatus 130.
- the standby server 120 accesses the LU 135 via the port 134 corresponding to the port ID 1902.
- FIG. 20 is an explanatory diagram illustrating an example of the port performance table 252 according to the sixth embodiment of this invention.
- the port performance table 252 stores performance information of the port 134 that is used when the server 120 connects to the storage apparatus 130.
- The port performance table 252 includes a storage ID 2001, a port ID 2002, a measurement time 2003, and an IOPS 2004.
- Storage ID 2001 and port ID 2002 are the same as storage ID 1901 and port ID 1902. The storage ID 2001 may be omitted if the storage apparatus 130 can be identified by one of the other columns in this table or by a combination of several columns.
- the management server 100 may automatically assign an identification number in ascending order to each storage apparatus 130 as an identifier.
- Measurement time 2003 stores the measurement time of performance information of the port 134.
- the measurement time may be the time when the performance information is acquired in the storage apparatus 130, or the time when the management server 100 acquires the performance information via the management interface 131 of the storage apparatus 130.
- IOPS 2004 stores the performance information of the port 134.
- IOPS (Input Output Per Second) here is an index representing the usage status of the port 134 at the measurement time 2003.
- In the sixth embodiment, IOPS is used as the performance information. Alternatively, the transfer amount of read data or of write data may be used as the performance information of the port 134.
- the priority setting unit 217 acquires the storage ID 1901 and the port ID 1902 from the entry corresponding to the selected check pair after the process of step S1704.
- The priority setting unit 217 refers to the port performance table 252 and searches for entries whose storage ID 2001 and port ID 2002 match the acquired storage ID 1901 and port ID 1902. The priority setting unit 217 then acquires the IOPS value from the IOPS 2004 of each retrieved entry.
- In step S1705, the priority setting unit 217 sets the priority based on the acquired IOPS values.
- The priority setting unit 217 compares the IOPS 2004 values of the entries having the latest measurement time 2003 and assigns higher priorities in ascending order of IOPS 2004, so that the pair whose port has the lowest IOPS receives the highest priority.
- Alternatively, the priority setting unit 217 may calculate a value such as the average or the amount of change of the IOPS of the same port over a given period, and set the priority based on the calculated value.
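The port-load ranking of the sixth embodiment can be sketched as follows. This is a minimal illustration that assumes the port performance table 252 is available as a list of samples. `PortSample`, `latest_iops` and `prioritize_by_port_load` are hypothetical names. The sketch also assumes that a less loaded port (lower IOPS) earns a higher priority, in line with the stated aim of avoiding performance degradation after failover. Ports with no measurement are ranked last.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple

PortKey = Tuple[str, str]  # (storage ID, port ID)


@dataclass
class PortSample:
    storage_id: str        # storage ID 2001
    port_id: str           # port ID 2002
    measured_at: datetime  # measurement time 2003
    iops: float            # IOPS 2004


def latest_iops(samples: Iterable[PortSample]) -> Dict[PortKey, float]:
    """Keep only the most recent IOPS value for each (storage, port)."""
    latest: Dict[PortKey, PortSample] = {}
    for s in samples:
        key = (s.storage_id, s.port_id)
        if key not in latest or s.measured_at > latest[key].measured_at:
            latest[key] = s
    return {key: s.iops for key, s in latest.items()}


def prioritize_by_port_load(
    pairs: Dict[int, PortKey], samples: Iterable[PortSample]
) -> Dict[int, int]:
    """Map pair ID -> priority (1 = highest) for failover-capable check pairs.

    Assumption: lower IOPS (a less loaded port) means higher priority.
    Ports without any sample sort last; pair ID breaks ties.
    """
    load = latest_iops(samples)
    ranked = sorted(pairs, key=lambda pid: (load.get(pairs[pid], float("inf")), pid))
    return {pid: rank for rank, pid in enumerate(ranked, start=1)}
```

To use an average over a period instead of the latest value, `latest_iops` could be replaced by a function that groups samples per port and averages them.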
- According to the sixth embodiment, setting the priority based on port performance information prevents the performance of the standby server 120 from degrading after failover, and thus prevents any impact on the business executed on the standby server 120.
- In the first embodiment, the priority is set based on the acquisition time 610.
- The seventh embodiment differs in that the priority is set based on the cost of the network path connecting the standby server 120 and the storage apparatus 130, that is, the path cost.
- the configuration of the cluster check table 251 held by the management server 100 is different. Note that the hardware configuration and other software configurations of the management server 100 are the same as those in the first embodiment, and a description thereof will be omitted.
- FIGS. 21A and 21B are explanatory diagrams illustrating an example of the cluster check table 251 according to the seventh embodiment of this invention.
- the check flag 605, the LU flag 606, the confirmation result 607, and the reason 608 are omitted for simplicity of explanation.
- The processing from step S1701 to step S1704 is the same as in the first embodiment.
- the priority setting unit 217 acquires the standby server ID 604, the storage ID 1901, and the port ID 1902 from the entry corresponding to the selected check pair after the process of step S1704.
- The priority setting unit 217 sends the network management unit 111 a path cost inquiry that includes the acquired standby server ID 604, storage ID 1901, and port ID 1902.
- the priority setting unit 217 waits until it receives responses for all the standby servers 120 from the network management unit 111. Further, when receiving a response from the network management unit 111, the priority setting unit 217 updates the path cost 2101 of the corresponding entry in the cluster check table 251 based on the response.
- In step S1705, the priority setting unit 217 sets the priority based on the value of the path cost 2101, giving higher priority to smaller path costs.
- Since the processing in steps S1706 and S1707 is the same as in the first embodiment, its description is omitted.
- FIG. 22 is a flowchart illustrating an example of processing executed by the network management unit 111 of the NW-SW 110 according to the seventh embodiment of the present invention.
- When the network management unit 111 receives a path cost inquiry from the priority setting unit 217, it starts processing.
- the network management unit 111 acquires the path cost based on the standby server ID 604, the storage ID 1901, and the port ID 1902 included in the received inquiry (step S2201).
- One conceivable way to obtain the path cost is as follows. The network management unit 111 holds information that associates the identifier of the server 120, the identifier of the storage apparatus 130, the identifier of the port 134, and the configuration of the network path. From this information, it determines the number of switches included in the network path.
- the network management unit 111 identifies the network path between the standby server 120 and the storage apparatus 130 based on the received standby server ID 604, storage ID 1901, and port ID 1902.
- the network management unit 111 acquires the path cost by counting the number of switches included in the identified network path.
- the network management unit 111 transmits the acquired path cost to the priority setting unit 217 (step S2202), and ends the process.
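The path cost calculation in step S2201 can be sketched as a breadth-first search followed by a count of the switches on the path found. This is a minimal illustration under stated assumptions. The topology is held as a hypothetical adjacency list in which nodes stand for servers, switches and storage ports. The "identified network path" is taken to be one shortest path by hop count, since the text does not say how the path is chosen when several exist.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def path_cost(
    links: Dict[str, List[str]], switches: Set[str], src: str, dst: str
) -> Optional[int]:
    """Count the switches on one fewest-hop path from src to dst.

    links is an adjacency list of the network topology (both directions
    listed); switches names the nodes that are switches. Returns None
    when dst is unreachable from src.
    """
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:                       # breadth-first search from the server
        node = queue.popleft()
        if node == dst:
            break
        for nxt in links.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    cost, node = 0, dst
    while node is not None:            # walk the path back, counting switches
        if node in switches:
            cost += 1
        node = prev[node]
    return cost
```

Ranking the standby servers then reduces to sorting them by the returned cost, with the smallest cost first.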
- Although the NW-SW 110 has been described as including the network management unit 111, the present invention is not limited to this.
- the management server 100 may include the network management unit 111.
- According to the seventh embodiment, higher priority is given to standby servers 120 that are connected to the storage apparatus 130 via network paths with a low path cost. As a result, after the failover process the standby server 120 can operate as an active server 120 with high network performance. This is because a low path cost reduces delay in communication between the server 120 and the storage apparatus 130.
- Example 8: In the first embodiment, a plurality of clusters are configured in advance, and the cluster confirmation process is executed on existing clusters.
- the eighth embodiment is different in that, when configuring a cluster, the management server 100 executes a failover availability determination process and presents candidates for the server 120 that configures the cluster.
- the configuration of the computer system, and the hardware configuration and software configuration of the management server 100, the server 120, and the storage apparatus 130 are the same as those in the first embodiment, and a description thereof will be omitted. Hereinafter, the difference from the first embodiment will be mainly described.
- the processes executed by the trigger reception unit 215, the check list generation unit 211, and the process notification unit 218 are different. Hereinafter, each processing will be described.
- The trigger for starting the process of the trigger reception unit 215 is different. Specifically, the administrator selects in advance candidate servers 120 for configuring the cluster and inputs an instruction to start the cluster confirmation process that includes the identifiers of the selected servers 120.
- A server 120 selected as a candidate can be one of two types: an active server 120 or a standby server 120.
- When the administrator wants to build a redundant system, the administrator selects an active server 120. The selected server 120 is then registered as the active server 120, and standby servers 120 for that active server 120 are searched for. The administrator selects a standby server 120 when they want to make the cluster configuration even more redundant, for example to improve fault tolerance.
- Any of the servers 120 managed by the management server 100 can be designated as a server 120 to be selected.
- When the trigger reception unit 215 detects the confirmation process start instruction described above, it transmits a processing start instruction including the identifiers of the servers 120 selected as candidates to the information acquisition unit 212.
- the processing executed by the checklist generation unit 211 is partially different.
- When the checklist generation unit 211 receives a processing completion notification from the information acquisition unit 212, it starts processing.
- the notification includes the identifier of the server 120 selected by the administrator.
- In the eighth embodiment, step S1101 is not executed, and the processing starts from step S1102.
- This is because no cluster has been configured yet, so there is no need to search for the servers 120 belonging to a cluster.
- In step S1102, the check list generation unit 211 generates check pairs based on the identifiers of the servers 120 included in the processing completion notification. For example, the following processing can be considered.
- the check list generation unit 211 refers to the management target table 250 and searches for an entry that matches the received identifier of the server 120.
- The check list generation unit 211 then searches for one or more other entries whose model 503 and configuration information 504 match those of the retrieved entry.
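The candidate search in step S1102 can be sketched as pairing the selected server with every other server of identical model and configuration. This is a minimal illustration. It assumes the management target table 250 is reduced to a mapping from server ID to a (model 503, configuration information 504) tuple, and the names are hypothetical. A real comparison of configuration information may be more involved than string equality.

```python
from typing import Dict, List, Tuple

# Hypothetical view of the management target table 250:
# server ID -> (model 503, configuration information 504).
ManagedTable = Dict[str, Tuple[str, str]]


def build_check_pairs(table: ManagedTable, selected_id: str) -> List[Tuple[str, str]]:
    """Pair the selected server with every other server of identical model and configuration."""
    model, config = table[selected_id]
    return [
        (selected_id, other)
        for other, (m, c) in sorted(table.items())
        if other != selected_id and (m, c) == (model, config)
    ]
```

Sorting the table items makes the order of the generated check pairs deterministic.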
- FIG. 23 is a flowchart illustrating an example of processing executed by the processing notification unit 218 of the management server 100 according to the eighth embodiment of the present invention.
- the process notification unit 218 notifies the administrator of information on the candidate server 120 for configuring the cluster (step S2301).
- the process notification unit 218 transmits display information including information on a check pair that can be failed over as information of the candidate server 120 for configuring the cluster.
- The screen displayed from the display information will be described later with reference to FIG. 24.
- the administrator selects the servers 120 constituting the cluster according to the operation screen displayed on the client terminal 190.
- the client terminal 190 transmits cluster registration information including the identifier of the server 120 selected by the administrator to the management server 100.
- the registration information includes at least the identifier of the cluster, the identifier of the server 120 selected as the active server 120, and the identifier of the server 120 selected as the standby server 120.
- the process notifying unit 218 receives the cluster registration information from the client terminal 190 (step S2302), updates the management target table 250 based on the registration information (step S2303), and ends the process. Specifically, the following processing is executed.
- The process notification unit 218 adds a new entry to the management target table 250 and stores the necessary information in the added entry.
- Although the process notification unit 218 executes this process here, the management server 100 may instead include a cluster registration unit that executes the cluster configuration process. In this case, in step S2303, the process notification unit 218 may transmit a process start instruction including the cluster registration information to the cluster registration unit.
- FIG. 24 is an explanatory diagram showing an example of an operation screen according to the eighth embodiment of the present invention.
- the operation screen 2400 displays information for newly configuring a cluster or updating the configuration of a configured cluster.
- The operation screen 2400 is displayed on the display of the client terminal 190. Alternatively, it may be displayed via an input/output device provided in the management server 100.
- The management server 100 or the client terminal 190 displays the operation screen 2400 using a dedicated browser or a dedicated program.
- The operation screen 2400 includes an active server selection unit 2410, a standby server selection unit 2420, an add button 2431, a delete button 2432, a cluster selection unit 2440, a cluster configuration display unit 2450, a cluster add button 2461, a cluster delete button 2462, and a confirm button 2463.
- the active server selection unit 2410 is a display unit that displays information about the servers 120 that are candidates for the active server 120 constituting the cluster.
- the active server selection unit 2410 displays a server ID 2411 and a status 2412.
- the server ID 2411 is an identifier of the server 120 that is a candidate for the active server 120.
- The status 2412 indicates whether the server can be registered in the cluster. In this embodiment, the status 2412 displays "OK" when the server can be registered in the cluster, "Selected" when it has been selected as a server 120 to be registered in the cluster, and "NG" when it cannot be registered in the cluster.
- the administrator can select the active server 120 constituting the cluster from the servers 120 displayed in the active server selection unit 2410.
- In the example shown in FIG. 24, the server 120 whose server ID 2411 is "server 2" is selected.
- the entry corresponding to the selected server 120 is highlighted.
- the standby server selection unit 2420 is a display unit that displays information about the servers 120 that are candidates for the standby server 120 that constitutes the cluster.
- The standby server selection unit 2420 displays a server ID 2421 and a status 2422.
- the administrator can select the standby server 120 constituting the cluster from the servers 120 displayed in the standby server selection unit 2420.
- In the example shown in FIG. 24, the server 120 whose server ID 2421 is "server 5" is selected.
- the entry corresponding to the selected server 120 is highlighted.
- the cluster selection unit 2440 is an operation button for selecting a target cluster to which the server 120 is added. By operating the cluster selection unit 2440, the target cluster is highlighted.
- the cluster configuration display unit 2450 is a display unit that displays the configuration of the target cluster.
- the cluster configuration display unit 2450 displays an active server ID 2451 and a standby server ID 2452.
- the active server ID 2451 displays the identifier of the server 120 scheduled to be added as the active server 120.
- The standby server ID 2452 displays the identifier of the server 120 scheduled to be added as the standby server 120.
- the add button 2431 is an operation button for registering each server 120 selected by the active server selection unit 2410 and the standby server selection unit 2420 as a candidate for the server 120 to be added to the cluster.
- When the administrator operates the add button 2431, the selected servers 120 are registered in the cluster configuration display unit 2450.
- the delete button 2432 is an operation button for canceling registration of the candidate server 120 to be added to the cluster.
- the administrator selects the server 120 whose registration is to be canceled from the servers 120 displayed on the cluster configuration display unit 2450, and then operates the delete button 2432. As a result, the server 120 selected from the cluster configuration display unit 2450 is deleted.
- the cluster addition button 2461 is a button operated when a new cluster is configured.
- the confirmation button 2463 is a button for confirming the cluster configuration.
- A cluster consisting of the servers 120 displayed in the cluster configuration display unit 2450 is then configured.
- When the confirm button 2463 is operated, the client terminal 190 generates registration information including the identifier of the cluster displayed in the cluster selection unit 2440 and the identifiers of the servers 120 displayed in the cluster configuration display unit 2450.
- The generated registration information is transmitted to the process notification unit 218.
- In the first embodiment, a cluster is configured from physical servers 120, whereas in the ninth embodiment a cluster is configured using virtual servers.
- the configuration of the computer system, and the hardware configuration and the software configuration of the management server 100 and the storage apparatus 130 are the same, and thus the description thereof is omitted.
- the difference from the first embodiment will be mainly described.
- The standby server ID 604 of the management target table 250 stores either the identifier of a server 120 or the identifier of a virtual server 2500.
- FIG. 25 is a block diagram illustrating the hardware configuration and software configuration of the server 120 according to the ninth embodiment of the present invention.
- the hardware configuration of the server 120 is the same as that of the first embodiment, but the software configuration is different.
- the memory 302 stores a program for realizing the virtualization unit 2510.
- the virtualization unit 2510 generates and manages one or more virtual servers 2500. Specifically, the virtualization unit 2510 virtually divides computer resources included in the server 120 and assigns the divided computer resources to generate one or more virtual servers 2500.
- the virtualization unit 2510 may be, for example, a hypervisor or a VMM (Virtual Machine Monitor).
- The virtualization unit 2510 includes a virtual switch 2511 and a virtualization unit management interface 2512.
- The virtual switch 2511 realizes communication between the virtual servers 2500 and communication between the virtual servers 2500 and external devices. Specifically, the virtual switch 2511 controls communication between a virtual server 2500 and an external device by connecting the virtual server 2500 to an adapter attached to a physical interface such as the network interface 305.
- the virtualization unit management interface 2512 is a control interface for communicating with the management server 100.
- The virtualization unit 2510 uses the virtualization unit management interface 2512 to transmit information to the management server 100 and to receive instructions from it. The virtualization unit management interface 2512 can also be used directly from a user terminal or the like.
- the virtualization unit 2510 holds information that associates the computer resources of the server 120 with the virtual computer resources of the virtual server 2500, the configuration information of the virtual server 2500, the operation history, and the like.
- In the ninth embodiment, the processing executed by the priority setting unit 217 is different.
- FIG. 26 is a flowchart illustrating an example of processing executed by the priority setting unit 217 of the management server 100 according to the ninth embodiment of this invention.
- Since the processing from step S1701 to step S1704 is the same as in the first embodiment, its description is omitted.
- the priority setting unit 217 determines whether there is a check pair that is determined to be capable of failover (step S2601).
- If there is no check pair determined to be capable of failover, the priority setting unit 217 transmits a processing start instruction to the standby system generation unit 219 (step S2602).
- the instruction includes the identifier of the active server 120.
- the priority setting unit 217 stands by until a processing completion notification is received from the standby system generation unit 219.
- Upon receiving the processing completion notification from the standby system generation unit 219, the priority setting unit 217 proceeds to step S1706.
- The processing in step S1706 is the same as in the first embodiment, except that the standby server 120 constituting the check pair is a virtual server 2500.
- In step S2601, the priority setting unit 217 may instead transmit a processing start instruction to the standby system generation unit 219 when the number of check pairs determined to be capable of failover is equal to or less than a preset number of standby servers 120. The preset number may be set to the number of standby servers 120 the cluster requires.
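The two variants of the step S2601 decision can be captured by a single comparison. This is a minimal sketch with a hypothetical function name. With a preset of 0, it reduces to "no failover-capable check pair exists". A positive preset reproduces the variant in which generation is requested whenever the count is equal to or less than the preset number.

```python
def needs_virtual_standby(ok_pair_count: int, preset_standby_count: int = 0) -> bool:
    """Decide at step S2601 whether to ask the standby system generation unit 219
    for a standby virtual server.

    preset_standby_count = 0 : request only when no failover-capable pair exists.
    preset_standby_count > 0 : request whenever the number of failover-capable
                               pairs is equal to or less than the preset number.
    """
    return ok_pair_count <= preset_standby_count
```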
- FIG. 27 is a flowchart illustrating an example of processing executed by the standby system generation unit 219 of the management server 100 according to the ninth embodiment of the present invention.
- The standby system generation unit 219 instructs the server 120 on which the virtualization unit 2510 runs either to generate a virtual server 2500 to serve as the standby server 120, or to change the configuration of a virtual server 2500 so that it is capable of failover.
- the virtual server 2500 generated as the standby server 120 is also referred to as a standby virtual server 2500.
- the standby system generation unit 219 refers to the management target table 250 (step S2701) and searches for an entry of the standby virtual server 2500 (step S2702).
- The standby system generation unit 219 determines whether it can generate a virtual server 2500 capable of failover with the active server 120 identified by the identifier in the processing start instruction (step S2703). Specifically, the following processing is executed.
- the standby system generation unit 219 searches for an entry that matches the identifier of the active server 120 included in the processing start instruction.
- the standby system generation unit 219 acquires the model 503 and configuration information 504 of the retrieved entry.
- The standby system generation unit 219 asks the virtualization unit 2510 whether the configuration of the retrieved standby virtual server 2500 can be changed to one that allows failover.
- One way to identify the virtualization unit 2510 to query is for the management server 100 to hold information that associates the identifier of each virtualization unit 2510 with the identifiers of the virtual servers 2500 it manages.
- If the virtualization unit 2510 replies that the configuration can be changed, the standby system generation unit 219 determines that a virtual server 2500 capable of failover can be generated.
- Alternatively, the standby system generation unit 219 may send an inquiry containing the model 503 and configuration information 504 of the active server 120 to every virtualization unit 2510 in the computer system.
- If it determines that no virtual server 2500 capable of failover can be generated, the standby system generation unit 219 proceeds to step S2709.
- If such a virtual server 2500 can be generated, the standby system generation unit 219 sends the virtualization unit 2510 that replied an instruction to generate a virtual server 2500 configured to allow failover (step S2704).
- the standby system generation unit 219 stands by until a processing completion notification is received from the virtualization unit 2510.
- the standby system generation unit 219 receives a processing completion notification from the virtualization unit 2510 (step S2705).
- The notification includes the configuration information of the generated virtual server 2500.
- The standby system generation unit 219 updates the management target table 250 based on the configuration information of the virtual server 2500 included in the notification.
- the standby system generation unit 219 transmits a processing start instruction to the information comparison unit 213 (step S2706).
- the instruction includes the identifier of the active server 120 and the identifier of the standby virtual server 2500.
- the standby system generation unit 219 waits until a processing completion notification is received from the information comparison unit 213.
- the information comparison unit 213 can omit the processing from step S1201 to step S1205.
- the standby system generation unit 219 determines whether or not failover can be performed between the active server 120 and the generated virtual server 2500 (step S2707).
- If it is determined that failover is not possible, the standby system generation unit 219 proceeds to step S2710.
- If failover is determined to be possible, the standby system generation unit 219 adds the generated virtual server 2500 to the management target table 250 as the standby server 120 (step S2708). There are two ways to add the virtual server 2500.
- In the first, the standby system generation unit 219 updates the necessary columns of the existing entry for the virtual server 2500.
- In the second, the standby system generation unit 219 adds a new entry to the management target table 250 and stores the necessary information in it.
- the process of adding the virtual server 2500 to the management target table 250 may be executed by a cluster registration unit (not shown).
- the standby system generation unit 219 transmits a process completion notification to the priority setting unit 217 (step S2709) and ends the process.
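The FIG. 27 flow (steps S2701 to S2708) can be sketched in a few functions. This is a hedged sketch and not the patent's implementation:

- `VirtualizationUnit`, `can_generate`, `generate`, and the dictionary used as the management target table are hypothetical stand-ins.
- The S2703 inquiry is simplified to a per-unit capability check.
- The information comparison (S2706 to S2707) is passed in as a callback.
- The caller is assumed to send the completion notification (S2709) in every case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class VirtualizationUnit:
    """Hypothetical stand-in for a virtualization unit 2510."""
    unit_id: str
    supported_models: Set[str]
    created: List[str] = field(default_factory=list)

    def can_generate(self, model: str, config: dict) -> bool:
        # Assumed rule: the unit can build a VM presenting the active server's model.
        return model in self.supported_models

    def generate(self, model: str, config: dict) -> dict:
        vm_id = f"{self.unit_id}-vm{len(self.created) + 1}"
        self.created.append(vm_id)
        # Plays the role of the completion notification carrying the VM's configuration.
        return {"id": vm_id, "model": model, "config": dict(config)}


def generate_standby(active_id: str,
                     table: Dict[str, dict],
                     units: List[VirtualizationUnit],
                     can_fail_over: Callable[[dict, dict], bool]) -> Optional[str]:
    """Sketch of steps S2701-S2708; returns the new standby VM id, or None."""
    active = table[active_id]                     # look up model 503 / configuration 504
    model, config = active["model"], active["config"]
    unit = next((u for u in units if u.can_generate(model, config)), None)
    if unit is None:                              # S2703: no suitable VM can be generated
        return None                               # caller then notifies (S2709)
    vm = unit.generate(model, config)             # S2704-S2705
    if not can_fail_over(active, vm):             # S2706-S2707: information comparison
        return None                               # S2710 handling is not modeled here
    entry = table.setdefault(vm["id"], {})        # S2708: update an entry or add a new one
    entry.update(model=vm["model"], config=vm["config"], role="standby")
    return vm["id"]
```

`dict.setdefault` covers both ways of adding the virtual server 2500: it updates an existing entry or creates a new one. Unlike a real system, the sketch does not delete a generated VM that later fails the comparison.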
- By using the virtual server 2500 as the standby server 120, a cluster can be configured without changing the configuration of the servers 120 or the connection path between the active server 120 and the LU 135.
- The various software described in the present embodiment can be stored in various recording media (for example, non-transitory storage media), such as electromagnetic, electronic, and optical media, and can be downloaded to a computer through a communication network such as the Internet.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Hardware Redundancy (AREA)
Abstract
Description
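FIG. 1 is a block diagram illustrating a configuration example of the computer system according to the first embodiment of the present invention.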
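In the first embodiment, in the determination processing of step S1207, the information comparison unit 213 determines whether failover is possible by comparing information on the LU 135 accessed by the active server 120 with information on the LUs 135 accessible by the standby server 120. The present invention is not limited to this, and methods such as the following can also be used.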
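In the first embodiment, the standby server 120 holds the information acquisition application 123 in advance. The second embodiment differs in that the management server 100 holds the information acquisition application 123.
The third embodiment differs in that the information acquisition application 123 is stored in a dedicated storage area in the storage apparatus 130.
The fourth embodiment differs in that the information acquisition application 123 is stored in advance in an external storage device connected to the standby server 120.
The fifth embodiment differs in that the information acquisition application 123 is stored in the LU 135 accessed by the active server 120.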
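The first embodiment sets priorities based on the acquisition time 610. The sixth embodiment differs in that priorities are set based on performance information of the ports 134 of the storage apparatus 130.
The first embodiment sets priorities based on the acquisition time 610. The seventh embodiment differs in that priorities are set based on the cost of the network path connecting the standby server 120 and the storage apparatus 130, that is, the path cost.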
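All three priority schemes (acquisition time, port performance, path cost) reduce to ranking the failover-capable standby servers by a single metric. The sketch below assumes the following, none of which the text specifies:

- A shorter acquisition time ranks higher.
- Higher port performance ranks higher.
- A lower path cost ranks higher.
- Priority 1 means "use first".

The function names are hypothetical. `time.perf_counter` is used only to show how an acquisition time could be measured around a storage query.

```python
import time
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


def timed_query(query: Callable[[], T]) -> Tuple[T, float]:
    """Measure how long a storage query takes (an acquisition time)."""
    start = time.perf_counter()
    result = query()
    return result, time.perf_counter() - start


def assign_priorities(metric: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Map each failover-capable standby server to a priority; 1 is used first."""
    def key(server: str):
        value = metric[server]
        # Negate when larger is better; the server id breaks ties deterministically.
        return (-value if higher_is_better else value, server)
    order = sorted(metric, key=key)
    return {server: rank for rank, server in enumerate(order, start=1)}
```

For acquisition times, call `assign_priorities(times, higher_is_better=False)`. For port throughput, pass `higher_is_better=True`. For path costs, pass `higher_is_better=False` again.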
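In the first embodiment, a plurality of clusters are configured in advance, and the cluster check processing is executed on the existing clusters. The eighth embodiment differs in that, when a cluster is being configured, the management server 100 executes the failover feasibility determination processing and presents candidate servers 120 for the cluster.
In the first embodiment, clusters are configured from physical servers 120. The ninth embodiment differs in that clusters are configured using virtual servers.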
Claims (15)
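- A system redundancy verification method in a computer system, wherein
the computer system includes at least one first computer, at least one second computer, a storage system, and a management computer that manages the at least one first computer and the at least one second computer;
the at least one first computer has a first processor, a first memory connected to the first processor, and a first I/O interface connected to the first processor;
the at least one second computer has a second processor, a second memory connected to the second processor, and a second I/O interface connected to the second processor;
the storage system has a disk controller that includes one or more controllers each having one or more ports, and a plurality of storage media;
the management computer has a third processor, a third memory connected to the third processor, and a third I/O interface connected to the third processor;
the at least one first computer executes a task;
the at least one second computer takes over the task when a failure occurs in the at least one first computer;
the storage system provides the at least one first computer with a storage area that stores data necessary for executing the task; and
the method comprises:
a first step in which the management computer acquires first hardware information on the hardware configuration of the at least one first computer and second hardware information on the hardware configuration of the at least one second computer;
a second step in which the management computer acquires first storage area information on the storage area provided to the at least one first computer;
a third step in which the management computer transmits, to the at least one second computer, an instruction to acquire second storage area information on the storage area provided to the at least one first computer;
a fourth step in which, upon receiving the acquisition instruction, the at least one second computer acquires the second storage area information from the storage system and transmits the acquired second storage area information to the management computer; and
a fifth step in which the management computer compares the acquired first hardware information and first storage area information with the acquired second hardware information and second storage area information, and determines, based on the result of the comparison, whether failover is possible between the at least one first computer and the at least one second computer.
- The system redundancy verification method according to claim 1, wherein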
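the fifth step includes:
a step of comparing the first hardware information with the second hardware information to determine whether the at least one second computer has a hardware configuration capable of failover; and
a step of comparing the first storage area information with the second storage area information to determine whether the at least one second computer can access the storage area that stores the data necessary to take over the task.
- The system redundancy verification method according to claim 2, wherein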
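the at least one second computer has an information acquisition module that acquires the second storage area information;
the fourth step includes:
a step in which, upon receiving the acquisition instruction, the at least one second computer starts the information acquisition module;
a step in which the information acquisition module queries the storage system to acquire, as the second storage area information, information on those storage areas, among the storage areas provided to the at least one first computer, that the at least one second computer can access;
a step in which the information acquisition module transmits the acquired second storage area information to the management computer; and
a step in which, after the processing of the information acquisition module ends, the at least one second computer shuts down the at least one second computer.
- The system redundancy verification method according to claim 2, wherein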
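the computer system has an information acquisition module that acquires the second storage area information;
the fourth step includes:
a step in which, upon receiving the acquisition instruction, the at least one second computer obtains the information acquisition module;
a step in which the at least one second computer executes the obtained information acquisition module;
a step in which the information acquisition module queries the storage system to acquire, as the second storage area information, information on those storage areas, among the storage areas provided to the at least one first computer, that the at least one second computer can access;
a step in which the information acquisition module transmits the acquired second storage area information to the management computer; and
a step in which, after the processing of the information acquisition module ends, the at least one second computer shuts down the at least one second computer.
- The system redundancy verification method according to claim 3 or claim 4, wherein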
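the fourth step includes:
a step in which the information acquisition module calculates an acquisition time, measured from when it queries the storage system until the second storage area information is acquired; and
a step in which the information acquisition module transmits the acquired second storage area information and the calculated acquisition time to the management computer; and
the method comprises a step in which, when there are a plurality of second computers determined to be capable of failover, the management computer sets on those second computers, based on the acquisition times received from them, a priority indicating the order in which the plurality of second computers are to be used when failover processing is executed.
- The system redundancy verification method according to claim 3 or claim 4, comprising: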
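a step in which, when there are a plurality of second computers determined to be capable of failover, the management computer acquires the performance of the ports that those second computers use to access the storage system; and
a step in which the management computer sets on the plurality of second computers determined to be capable of failover, based on the acquired port performance, a priority indicating the order in which the plurality of second computers are to be used when failover processing is executed.
- The system redundancy verification method according to claim 3 or claim 4, comprising: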
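a step in which, when there are a plurality of second computers determined to be capable of failover, the management computer acquires the cost of the paths connecting those second computers to the storage system; and
a step in which the management computer sets on the plurality of second computers determined to be capable of failover, based on the acquired costs, a priority indicating the order in which the plurality of second computers are to be used when failover processing is executed.
- The system redundancy verification method according to claim 3 or claim 4, comprising: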
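a step in which the management computer displays the at least one first computer and the second computers determined to be capable of failover with the at least one first computer; and
a step in which, upon receiving an operation based on the display, the management computer configures a redundant system using the at least one first computer and the at least one second computer selected by the operation.
- The system redundancy verification method according to claim 3 or claim 4, wherein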
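the at least one second computer has a virtualization unit that generates virtual computers using computer resources;
a virtual computer generated by the virtualization unit takes over the task; and
the method comprises:
a step in which the management computer determines whether a second computer capable of failover exists;
a step in which, when it is determined that no second computer capable of failover exists, the management computer transmits to the at least one second computer an instruction to generate a virtual computer capable of failover;
a step in which, upon receiving the generation instruction, the at least one second computer generates the virtual computer capable of failover based on the instruction; and
a step in which the management computer configures a redundant system using the at least one first computer and the generated virtual computer.
- A computer system comprising at least one first computer, at least one second computer, a storage system, and a management computer that manages the at least one first computer and the at least one second computer, wherein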
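the at least one first computer has a first processor, a first memory connected to the first processor, and a first I/O interface connected to the first processor;
the at least one second computer has a second processor, a second memory connected to the second processor, and a second I/O interface connected to the second processor;
the storage system has a disk controller that includes one or more controllers each having one or more ports, and a plurality of storage media;
the management computer has a third processor, a third memory connected to the third processor, and a third I/O interface connected to the third processor;
the at least one first computer executes a task;
the at least one second computer takes over the task when a failure occurs in the at least one first computer;
the storage system provides the at least one first computer with a storage area that stores data necessary for executing the task;
the management computer has:
a control unit that determines whether failover is possible between the at least one first computer and the at least one second computer; and
management information that stores configuration information of the at least one first computer and configuration information of the at least one second computer;
the control unit:
refers to the management information to acquire first hardware information on the hardware of the at least one first computer and second hardware information on the hardware of the at least one second computer;
refers to the management information to acquire first storage area information on the storage area provided to the at least one first computer; and
transmits, to the at least one second computer, an instruction to acquire second storage area information on the storage area provided to the at least one first computer;
upon receiving the acquisition instruction, the at least one second computer acquires the second storage area information from the storage system and transmits the acquired second storage area information to the management computer; and
the control unit:
compares the acquired first hardware information and first storage area information with the acquired second hardware information and second storage area information; and
determines, based on the result of the comparison, whether failover is possible between the at least one first computer and the at least one second computer.
- The computer system according to claim 10, wherein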
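the control unit,
when determining whether failover is possible between the at least one first computer and the at least one second computer, compares the first hardware information with the second hardware information to determine whether the at least one second computer has a hardware configuration capable of failover, and
compares the first storage area information with the second storage area information to determine whether the at least one second computer can access the storage area that stores the data necessary to take over the task.
- The computer system according to claim 11, wherein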
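the at least one second computer has an information acquisition unit that acquires the second storage area information;
upon receiving the acquisition instruction, the at least one second computer starts the information acquisition unit;
the started information acquisition unit:
queries the storage system to acquire, as the second storage area information, information on those storage areas, among the storage areas provided to the at least one first computer, that the at least one second computer can access; and
transmits the acquired second storage area information to the management computer; and
after the processing of the information acquisition unit ends, the at least one second computer shuts down the at least one second computer.
- The computer system according to claim 12, wherein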
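the information acquisition unit:
calculates an acquisition time, measured from when it queries the storage system until the second storage area information is acquired; and
transmits the acquired second storage area information and the calculated acquisition time to the management computer; and
when there are a plurality of second computers determined to be capable of failover, the control unit sets on those second computers, based on the acquisition times received from them, a priority indicating the order in which the plurality of second computers are to be used when failover processing is executed.
- The computer system according to claim 12, wherein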
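the control unit:
when there are a plurality of second computers determined to be capable of failover, acquires the performance of the ports that those second computers use to access the storage system; and
sets on the plurality of second computers determined to be capable of failover, based on the acquired port performance, a priority indicating the order in which the plurality of second computers are to be used when failover processing is executed.
- The computer system according to claim 12, wherein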
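the control unit:
when there are a plurality of second computers determined to be capable of failover, acquires the cost of the paths connecting those second computers to the storage system; and
sets on the plurality of second computers determined to be capable of failover, based on the acquired costs, a priority indicating the order in which the plurality of second computers are to be used when failover processing is executed.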
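The fifth step of the method (as refined in the second method claim and its system counterpart) combines two independent checks. First, the second computer must have a hardware configuration capable of failover. Second, it must be able to access every storage area the first computer uses for the task. The sketch below is an illustration under assumptions. The hardware fields and the compatibility rule (same model, at least as many cores and as much memory) are hypothetical, since the claims do not define which attributes are compared. Storage areas are modeled as sets of identifiers such as LU names.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class HardwareInfo:
    """Hypothetical subset of the hardware information compared in the fifth step."""
    model: str
    cpu_cores: int
    memory_gb: int


def hardware_compatible(first: HardwareInfo, second: HardwareInfo) -> bool:
    # Assumed rule: same model, and at least as many cores and as much memory.
    return (first.model == second.model
            and second.cpu_cores >= first.cpu_cores
            and second.memory_gb >= first.memory_gb)


def storage_accessible(first_areas: FrozenSet[str], second_areas: FrozenSet[str]) -> bool:
    # The second computer must reach every storage area the first computer uses.
    return first_areas <= second_areas


def failover_possible(first_hw: HardwareInfo, first_areas: FrozenSet[str],
                      second_hw: HardwareInfo, second_areas: FrozenSet[str]) -> bool:
    """Fifth step: failover is judged possible only if both comparisons succeed."""
    return hardware_compatible(first_hw, second_hw) and storage_accessible(first_areas, second_areas)
```

The subset test mirrors the claimed definition of the second storage area information. That information covers the areas provided to the first computer that the second computer can reach, so any area missing from it blocks failover.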
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/426,279 US9542249B2 (en) | 2012-11-02 | 2012-11-02 | System redundancy verification method and computer system |
JP2014544176A JP5955977B2 (ja) | 2012-11-02 | 2012-11-02 | System redundancy verification method and computer system |
PCT/JP2012/078453 WO2014068764A1 (ja) | 2012-11-02 | 2012-11-02 | System redundancy verification method and computer system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2012/078453 WO2014068764A1 (ja) | 2012-11-02 | 2012-11-02 | System redundancy verification method and computer system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014068764A1 true WO2014068764A1 (ja) | 2014-05-08 |
Family
ID=50626728
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/078453 WO2014068764A1 (ja) | System redundancy verification method and computer system | 2012-11-02 | 2012-11-02 |
Country Status (3)
Country | Link |
---|---|
US (1) | US9542249B2 (ja) |
JP (1) | JP5955977B2 (ja) |
WO (1) | WO2014068764A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
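WO2016046951A1 (ja) * | 2014-09-26 | 2016-03-31 | Hitachi Ltd | Computer system and file management method therefor |
WO2016162916A1 (ja) * | 2015-04-06 | 2016-10-13 | Hitachi Ltd | Management computer and resource management method |
WO2024105891A1 (ja) * | 2022-11-18 | 2024-05-23 | Nippon Telegraph And Telephone Corporation | Inspection device, inspection method, and inspection program |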
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9727357B2 (en) * | 2013-10-01 | 2017-08-08 | International Business Machines Corporation | Failover detection and treatment in checkpoint systems |
US9454416B2 (en) | 2014-10-14 | 2016-09-27 | Netapp, Inc. | Detecting high availability readiness of a distributed computing system |
GB2546789A (en) * | 2016-01-29 | 2017-08-02 | Bombardier Primove Gmbh | Arrangement with battery system for providing electric energy to a vehicle |
US10552272B2 (en) * | 2016-12-14 | 2020-02-04 | Nutanix, Inc. | Maintaining high availability during N-node failover |
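JP2018133764A (ja) * | 2017-02-17 | 2018-08-23 | Ricoh Co Ltd | Redundant configuration system, switching method, information processing system, and program |
JP7032631B2 (ja) * | 2017-07-04 | 2022-03-09 | Fujitsu Ltd | Transmission/reception system, control method for transmission/reception system, and transmission device |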
US11481296B2 (en) * | 2018-09-10 | 2022-10-25 | International Business Machines Corporation | Detecting configuration errors in multiport I/O cards with simultaneous multi-processing |
JP7099272B2 (ja) * | 2018-11-19 | 2022-07-12 | Fujitsu Ltd | Information processing device, network system, and teaming program |
JP7017546B2 (ja) * | 2019-09-27 | 2022-02-08 | Hitachi Ltd | Storage system, path management method, and path management program |
US11347601B1 (en) | 2021-01-28 | 2022-05-31 | Wells Fargo Bank, N.A. | Managing data center failure events |
CN116980293A (zh) * | 2022-04-22 | 2023-10-31 | Huawei Cloud Computing Technologies Co Ltd | Virtual network management method and related apparatus |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05303509A (ja) * | 1992-04-27 | 1993-11-16 | Fujitsu Ltd | Standby machine control system |
JP2007066216A (ja) * | 2005-09-02 | 2007-03-15 | Hitachi Ltd | Boot configuration change method |
JP2009140194A (ja) * | 2007-12-06 | 2009-06-25 | Hitachi Ltd | Method for setting a failure recovery environment |
JP2011086316A (ja) * | 2011-01-31 | 2011-04-28 | Hitachi Ltd | Takeover method, computer system, and management server |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007249343A (ja) | 2006-03-14 | 2007-09-27 | Nec Corp | Failure monitoring device, cluster system, and failure monitoring method |
2012
- 2012-11-02 JP JP2014544176A patent/JP5955977B2/ja not_active Expired - Fee Related
- 2012-11-02 US US14/426,279 patent/US9542249B2/en active Active
- 2012-11-02 WO PCT/JP2012/078453 patent/WO2014068764A1/ja active Application Filing
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016046951A1 (ja) * | 2014-09-26 | 2016-03-31 | Hitachi Ltd | Computer system and file management method therefor |
WO2016162916A1 (ja) * | 2015-04-06 | 2016-10-13 | Hitachi Ltd | Management computer and resource management method |
JPWO2016162916A1 (ja) * | 2015-04-06 | 2017-12-07 | Hitachi Ltd | Management computer and resource management method |
US10592268B2 (en) | 2015-04-06 | 2020-03-17 | Hitachi, Ltd. | Management computer and resource management method configured to combine server resources and storage resources and allocate the combined resources to virtual machines |
WO2024105891A1 (ja) * | 2022-11-18 | 2024-05-23 | Nippon Telegraph And Telephone Corporation | Inspection device, inspection method, and inspection program |
Also Published As
Publication number | Publication date |
---|---|
JPWO2014068764A1 (ja) | 2016-09-08 |
US20150205650A1 (en) | 2015-07-23 |
US9542249B2 (en) | 2017-01-10 |
JP5955977B2 (ja) | 2016-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5955977B2 (ja) | | System redundancy verification method and computer system |
US11868323B2 (en) | | Orchestrated disaster recovery |
US11079966B2 (en) | | Enhanced soft fence of devices |
US8706859B2 (en) | | Method and apparatus of data center file system |
US9336103B1 (en) | | Using a network bubble across multiple hosts on a disaster recovery site for fire drill testing of a multi-tiered application |
WO2013157072A1 (ja) | | Computer system, resource management method, and management computer |
US20150234907A1 (en) | | Test environment management apparatus and test environment construction method |
JP2014044553A (ja) | | Program, information processing device, and information processing system |
WO2013171865A1 (ja) | | Management method and management system |
JP5782563B2 (ja) | | Information acquisition method, computer system, and management computer |
WO2015097738A1 (ja) | | Storage system and management computer |
US9400761B2 (en) | | Management method for computer system, computer system, and non-transitory computer-readable storage medium |
JP5439435B2 (ja) | | Computer system and disk sharing method in the computer system |
KR101436101B1 (ko) | | Server device providing a service that replaces the storage device of a user terminal, and method therefor |
JP6516875B2 (ja) | | Integrated platform, server, and failover method |
JP5131336B2 (ja) | | Boot configuration change method |
US9652340B2 (en) | | Computer switching method, computer system, and management computer |
JP5750169B2 (ja) | | Computer system, program cooperation method, and program |
WO2015107676A1 (ja) | | Management system and management method for computer system |
WO2014184944A1 (ja) | | Computer system evaluation method, computer system control method, and computer system |
KR101849708B1 (ko) | | Server device providing a service that replaces the storage device of a user terminal, and method therefor |
JP5423855B2 (ja) | | Boot configuration change method |
US20240036988A1 (en) | | Disaster recovery pipeline for block storage and dependent applications |
JP4877368B2 (ja) | | Failover method using disk takeover |
WO2016056050A1 (ja) | | Computer system and management system therefor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 12887698; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | WIPO information: entry into national phase | Ref document number: 14426279; Country of ref document: US |
| ENP | Entry into the national phase | Ref document number: 2014544176; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 12887698; Country of ref document: EP; Kind code of ref document: A1 |