WO2011074284A1

WO2011074284A1 - Migration method for virtual machine, virtual machine system, and storage medium containing program

Info

Publication number: WO2011074284A1
Application number: PCT/JP2010/063273
Authority: WO
Inventors: 齋藤　浩; 高本　良史; 正芳北村
Original assignee: Hitachi, Ltd. (株式会社日立製作所)
Priority date: 2009-12-18
Filing date: 2010-08-05
Publication date: 2011-06-23
Also published as: JP2011128967A

Abstract

Disclosed is a virtual machine system that has a plurality of physical computers, each having a virtualization unit that configures a plurality of virtual machines, and a management computer connected to the physical computers via a network. The management computer allocates computer resources of a first physical computer among the plurality of physical computers to the virtual machines and operates them. Information about the computer resources allocated to the virtual machines is retained as virtual machine definition information, and the computer resources of the first physical computer actually used by the virtual machines are acquired as an actual usage amount. When a predetermined condition arises, a physical computer that can secure at least as many computer resources as the actual usage amount is selected from among the plurality of physical computers as a second physical computer, the migration destination for the virtual machines. The definition information is then updated to the actual usage amount, and the virtual machines are migrated to the selected second physical computer.

Description

Virtual computer migration method, virtual computer system, and storage medium storing program

The present invention relates to a method and system for improving the reliability of a virtual server.

As the number of servers in corporate computer systems and data centers grows, operation management costs also increase. One way to address this problem is server virtualization technology, which makes it possible to operate a plurality of virtual servers on a single physical server. A physical server has resources such as processors and memory; server virtualization technology divides these physical resources and assigns each divided portion to a different virtual server, so that multiple virtual servers run on the single physical server at the same time. The need for server virtualization technology has grown with improved processor performance and the falling cost of resources such as memory.

On the other hand, the need for high system reliability is also increasing, because enterprise systems depend more heavily on computers and the damage caused by a system outage has grown accordingly. A common technique for improving system reliability is to prepare a standby server separately from the active server and to switch over to the standby server when a failure occurs in the active server.

Given these two needs, server virtualization and high reliability, a need naturally arises for technology that maintains high reliability in a virtualized server environment (for example, Patent Document 1). However, the two technologies have opposing characteristics. For example, when a plurality of virtual servers are built on one physical server and that physical server fails, all virtual servers operating on it stop at once. If a system is built from multiple independent servers, the failure of a single physical server affects only a small scope; with virtualization technology, which consolidates multiple servers onto a single physical server, the scope affected by a failure becomes large. Reliability therefore tends to decrease in a virtualized environment.

Patent Document 1: JP 2001-216171 A

The problem to be solved by the present invention is to achieve high reliability at low cost even in a virtualized server environment. In particular, it is to reduce the number of alternate (standby) servers for high reliability.

For a failed server to be switched over to a replacement server, it is necessary to know exactly which virtual servers were running on it. Unlike a physical server, a virtual server can be added or removed relatively easily as long as the physical server has sufficient resources such as processors and memory. However, the conventional example described above has the problem that a failure cannot be recovered when resources are insufficient. For example, when a serious failure affects a virtual server to which a 3 GHz CPU and 4 GB of memory are allocated, a physical server with equivalent resources must be available.

The present invention has been made in view of the above problems, and its purpose is to enable a virtual server to be taken over even when the takeover-destination physical server has fewer resources than the takeover-source physical server.

A typical example of the invention disclosed in this specification is as follows. It is a migration method for a virtual computer system that includes a plurality of physical computers, each having a processor, a memory, and a virtualization unit that constructs a plurality of virtual computers, and a management computer that has a processor and a memory and is connected to the physical computers via a network. The method includes: a virtual computer operating step in which the management computer allocates computer resources of a first physical computer among the plurality of physical computers to a virtual computer and operates the virtual computer; a holding step in which the management computer holds information on the computer resources allocated to the virtual computer as definition information of the virtual computer; a resource usage acquisition step in which the management computer acquires, as an actual usage amount, the computer resources of the first physical computer actually used by the virtual computer; a determination step in which the management computer determines whether a predetermined condition is satisfied; a selection step in which, when the predetermined condition is satisfied, the management computer selects from among the plurality of physical computers, as a second physical computer serving as the migration destination of the virtual computer, a physical computer capable of securing computer resources equal to or greater than the actual usage amount; and a moving step in which the management computer updates the definition information to the actual usage amount and moves the virtual computer to the selected second physical computer.
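The claimed selection, update, and moving steps can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the class and function names, the two-dimensional resource model (processor clock in GHz, memory in GB), and the first-fit choice of destination are all assumptions, and the determination step is assumed to have been performed by the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resources:
    cpu_ghz: float  # processor clock speed
    mem_gb: float   # memory capacity

    def fits_in(self, free: "Resources") -> bool:
        # True if this amount can be secured from the given free resources.
        return self.cpu_ghz <= free.cpu_ghz and self.mem_gb <= free.mem_gb


@dataclass
class VirtualComputer:
    name: str
    definition: Resources    # allocated resources (definition information)
    actual_usage: Resources  # measured actual usage amount


@dataclass
class PhysicalComputer:
    name: str
    free: Resources
    vms: List[VirtualComputer] = field(default_factory=list)


def migrate(vm: VirtualComputer, source: PhysicalComputer,
            candidates: List[PhysicalComputer]) -> Optional[PhysicalComputer]:
    """Hypothetical sketch of the selection and moving steps.

    Assumes the caller has already found that the predetermined condition
    (e.g. a failure sign on `source`) is satisfied.
    """
    # Selection step: first candidate that can secure at least the actual usage.
    target = next((pc for pc in candidates if vm.actual_usage.fits_in(pc.free)), None)
    if target is None:
        return None  # no destination; definition information is left unchanged
    # Update the definition information to the actual usage amount.
    vm.definition = Resources(vm.actual_usage.cpu_ghz, vm.actual_usage.mem_gb)
    # Moving step: reserve resources on the destination and move the virtual computer.
    target.free = Resources(target.free.cpu_ghz - vm.definition.cpu_ghz,
                            target.free.mem_gb - vm.definition.mem_gb)
    source.vms.remove(vm)
    target.vms.append(vm)
    return target
```

With the 3 GHz / 4 GB virtual server from the background example, an actual usage of, say, 1.5 GHz / 1 GB would let it move to a standby computer with only 2 GHz / 2 GB free, which a check against the original allocation would reject.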

Therefore, according to the representative embodiment of the present invention, it is possible to move a virtual machine or recover from a failure with fewer computer resources than the physical computer before the movement.

A block diagram of the virtual machine system of the embodiment of this invention.
A block diagram showing the configuration of the management server of the embodiment.
A block diagram showing the detailed configuration of a physical server on which a server virtualization mechanism managed by the management server is operating.
A block diagram showing an outline of the operation of the virtual computer system of the embodiment.
An explanatory diagram showing details of the server management table of the embodiment.
An explanatory diagram showing details of the virtual server management table 107 of the embodiment.
An explanatory diagram showing the structure of the virtual server definition information table 118 of the embodiment.
An explanatory diagram showing the contents of the virtual server definition information of the embodiment.
A flowchart showing an example of processing performed in the failure recovery unit of the embodiment.
A flowchart showing an example of processing performed in the failure sign management unit of the embodiment.
A flowchart showing an example of processing performed in the virtual server recovery method selection unit of the embodiment.
A flowchart showing an outline of processing performed in the virtual server recovery unit of the embodiment.
A flowchart showing an outline of processing performed in the virtual server definition information management unit of the embodiment.
A flowchart showing an outline of processing performed in the virtual server return unit of the embodiment.

Hereinafter, an embodiment of the present invention will be described with reference to the accompanying drawings.

FIG. 1 is a block diagram of a virtual machine system to which the present invention is applied. The center of control in this embodiment is the management server 101. The management server 101 includes a failure recovery unit 102, a virtual server definition information management unit 103, a failure sign management unit 104, a virtual server recovery method selection unit 105, a virtual server recovery unit 106, a virtual server management table 107, a server management table 108, a virtual server definition information table 118, and a virtual server return unit 119.

The management server 101 manages the network switch 120, the physical servers 111-1 to 111-n, the server virtualization mechanism 110, the storage switch 112, the virtual servers 109, and the disk array device 116. Here, the server virtualization mechanism 110 has the function of making a single physical server 111 appear as a plurality of virtual servers 109, so that a plurality of servers can be consolidated onto the single physical server 111. The server virtualization mechanism 110 can be implemented by a hypervisor, a VMM (Virtual Machine Manager), or the like. Hereinafter, the physical servers 111-1 to 111-n are collectively referred to as the physical server 111.

The disk array device 116 is connected to the physical server 111 via the storage switch 112. Note that the storage switch 112 that connects the disk array device 116 and the physical server 111 constitutes a SAN (Storage Area Network). The network switch 120 that connects the management server 101 and the physical server 111 constitutes the network 207 shown in FIG.

The disk array device 116 has a virtual server image storage disk 114 storing a program executed by the virtual server 109 and a definition information storage disk 115 storing computer resource allocation information of the virtual server 109. In this embodiment, when a failure or a failure sign is detected in any of the physical servers 111, a highly reliable system is configured by moving a virtual server affected by the failure or the failure sign to another physical server.

Here, an outline of functional elements of the management server 101 is as follows.

As described later, the failure recovery unit 102 controls recovery of the physical server 111 and recovery of the virtual server 109 when the occurrence of a failure of the physical server 111 or a failure sign is detected.

The virtual server definition information management unit 103 performs processing that improves reliability by allowing the virtual server 109 to be moved even when there are not enough free resources in the physical server 111 that is the takeover destination of the virtual server 109.

The failure sign management unit 104 checks the occurrence of a failure or a sign of failure for each physical server 111 managed by the management server 101.

The virtual server recovery method selection unit 105 performs processing to select a recovery method for the virtual server 109 based on the detection result of the failure or failure sign of the physical server 111, the priority of the virtual server 109 affected by the failure, and the availability of computer resources of the physical server 111.

The virtual server recovery unit 106 executes recovery of the virtual server 109 based on the execution result of the virtual server recovery method selection unit 105.

The virtual server management table 107 stores detailed information regarding the server virtualization mechanism 110. The server management table 108 manages resources of the physical server 111. The virtual server definition information table 118 stores, for each virtual server 109, its priority, the location of the original allocation information describing how resources of the physical server 111 were allocated to it, and information regarding the takeover of the virtual server.

The virtual server return unit 119 executes a process for returning the virtual server 109 moved to the standby physical server 111 to the original active physical server 111. In FIG. 1, for example, the physical servers 111-1 and 111-2 can configure the active system, and the physical server 111-n can configure the standby system.

FIG. 2 is a block diagram showing the configuration of the management server 101 in the present invention. The management server 101 includes a memory 202, a processor 203, an FCA (Fibre Channel Adapter) 204, a NIC (Network Interface Card) 205, and a BMC (Baseboard Management Controller) 206.

The processor 203 executes various programs stored in the memory 202. The FCA 204 is connected to an external storage (for example, the disk array device 116). The NIC 205 and the BMC 206 are connected to the network 207. The NIC 205 communicates with other servers via the network 207 mainly in response to requests from various programs on the memory 202. The BMC 206 is used to detect a failure of the management server 101 and communicate with other servers via the network 207. In this embodiment, the NIC 205 and the BMC 206 are connected to the same network 207, but may be connected to different networks. Further, one FCA 204 and one NIC 205 are provided, but a plurality of FCAs 204 and NICs 205 may exist.

The memory 202 stores the failure recovery unit 102, the virtual server definition information management unit 103, the failure sign management unit 104, the virtual server recovery method selection unit 105, the virtual server recovery unit 106, the virtual server management table 107, the server management table 108, and the virtual server definition information table 118. Each program stored in the memory 202 is executed by the processor 203. Each of these programs is stored in storage (for example, the disk array device 116) serving as a storage medium and is loaded into the memory 202 as necessary.

FIG. 3 is a block diagram showing a detailed configuration of the physical server 111 on which the server virtualization mechanism 110 to be managed by the management server 101 is operating. The physical server 111 includes a memory 301, a processor 303, an FCA (Fibre Channel Adapter) 304, an NIC (Network Interface Card) 305, and a BMC (Baseboard Management Controller) 306.

The processor 303 executes various programs stored in the memory 301. The FCA 304 is connected to the disk array device 116 via the storage switch 112 shown in FIG. The NIC 305 and the BMC 306 are connected to the network 207. The NIC 305 mainly communicates with other servers in response to requests from various programs on the memory 301. The BMC 306 is used to detect a failure of the physical server 111 and communicate with the management server 101 and other servers via the network 207. In this embodiment, the NIC 305 and the BMC 306 are connected to the same network, but may be connected to different networks. Further, one FCA 304 and one NIC 305 are provided, but a plurality of FCAs 304 and NICs 305 may exist.

A plurality of virtual servers 109-1 to 109-m can be constructed by operating the server virtualization mechanism 110 on the memory 301. The virtual servers 109-1 to 109-m are collectively referred to as the virtual server 109. Each virtual server 109 can run an OS (Operating System) 302 independently. When the server virtualization mechanism 110 is executed by the processor 303, a plurality of virtual servers 109 can be constructed on the server virtualization mechanism 110.

Each virtual server 109 is constructed as an independent virtual server by reading and executing a predetermined virtual server OS image 308 in the virtual server image storage disk 114 set up in advance in the disk array device 116. In addition, the definition information storage disk 115 of the disk array device 116 stores virtual server definition information 309 containing the definition information of each virtual server 109.

The virtual server definition information 309 in the disk array device 116 is shared by a plurality of physical servers 111 via the storage switch 112 and can be configured so that any physical server 111 can refer to it. By providing a virtual server OS image 308 and virtual server definition information 309 for each of the plurality of virtual servers 109, completely different OSs and applications can be run for each virtual server 109 on the single physical server 111.

A control I/F (Interface) 311 is an interface for controlling the server virtualization mechanism 110 from outside, for example from the management server 101, via the network 207. The management server 101 can create or delete virtual servers 109 on the physical server 111 via the control I/F 311. For this purpose, a network address is set in the control I/F 311 of the server virtualization mechanism 110 of each physical server 111.

FIG. 4 is a block diagram showing an outline of the operation of the virtual computer system of the present invention. The management server 101 is connected to a plurality of physical servers 111 to be managed via a network 207, and can transfer failure information, failure sign information, control information, and the like of each physical server 111.

In the present invention, when the management server 101 detects (1) the occurrence of a failure or a failure sign in a physical server 111, it performs control to move the virtual servers 109 affected by the failure or failure sign to another physical server 111. Depending on the availability of computer resources on each physical server 111, the destination physical server 111 may lack sufficient computer resources, so that the move cannot be made. In such a case, the present invention (2) updates the virtual server definition information 309 according to the actual operating status of the virtual server (its actual resource usage) and the free computer resources available. The updated virtual server definition information 309 makes it possible (3) to recover the virtual server 109 from the failure or failure sign by using the physical server 111-n, which has fewer computer resources than the pre-migration physical server 111-1.

FIG. 5 is an explanatory diagram showing details of the server management table 108. The server management table 108 stores detailed information regarding the physical servers 111. The physical server identifier 501 stores an identifier that specifies the physical server 111. The startup disk 502 indicates the location (for example, the path) of the startup disk of the physical server 111. The server identifier 503 indicates a unique identifier (for example, a World Wide Name: WWN) held by the FCA 304 connected to the disk array device 116. The server mode 504 indicates the operating state of the physical server 111 and stores information for determining whether the server virtualization mechanism 110 is operating. The processor / memory 505 stores processor information and the memory capacity of the physical server 111; the processor information includes the processor clock speed and the number of cores. The network identifier 506 stores information identifying the NIC 305 of the physical server 111; when one physical server 111 has a plurality of NICs 305, a plurality of identifiers are stored. The network port 507 stores the port number of the network switch 120 to which the NIC 305 is connected; it is stored so that the VLAN of the network switch 120 can be set when maintaining the network security of the physical server 111. The disk 508 stores the identifier of the disk in the disk array device 116 used by the physical server 111. The disk identifier is, for example, a LUN (Logical Unit Number); LUN 10 is listed for a plurality of physical servers 111 (servers 1 to 4 in the figure), which indicates that this disk can be shared by those servers. The virtualization mechanism identifier 509 stores an identifier of the server virtualization mechanism 110 when one is operating on the physical server 111.
This virtualization mechanism identifier 509 is associated with a server virtualization mechanism management table (virtual server management table 107) described later. The server status 510 indicates the status and role of the physical server 111, and stores, for example, information indicating whether it is an active system or a standby system. In the embodiment of the present invention, it is used when performing a process of switching to a standby system when a failure occurs in any of the active systems.
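The server management table 108 can be pictured as a list of records. The following Python sketch assumes one field per column of FIG. 5 (the field names, types, and the "active"/"standby" strings are hypothetical) and shows two lookups the text implies: finding standby servers that run a server virtualization mechanism 110, and finding which servers share a LUN.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerRow:
    """One row of the server management table 108 (FIG. 5); field names are illustrative."""
    physical_server_id: str           # 501
    boot_disk: str                    # 502, location of the startup disk
    server_identifier: str            # 503, e.g. WWN of the FCA 304
    server_mode: str                  # 504, whether mechanism 110 is operating
    cpu_ghz: float                    # 505, processor clock speed
    cpu_cores: int                    # 505, number of cores
    mem_gb: float                     # 505, memory capacity
    network_ids: List[str]            # 506, one entry per NIC 305
    network_ports: List[int]          # 507, switch 120 ports (used for VLAN setup)
    disks: List[int]                  # 508, LUNs in the disk array device 116
    virtualization_id: Optional[str]  # 509, None if no mechanism 110 runs
    status: str                       # 510, "active" or "standby"


def standby_hosts(table: List[ServerRow]) -> List[str]:
    """Standby servers that run a server virtualization mechanism 110."""
    return [r.physical_server_id for r in table
            if r.status == "standby" and r.virtualization_id is not None]


def servers_sharing(table: List[ServerRow], lun: int) -> List[str]:
    """Servers whose disk column 508 lists the given LUN (i.e. a shared disk)."""
    return [r.physical_server_id for r in table if lun in r.disks]
```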

FIG. 6 is an explanatory diagram showing details of the virtual server management table 107.

The virtual server management table 107 stores detailed information regarding the server virtualization mechanism 110 of each physical server 111. The virtualization mechanism identifier 601 stores information for identifying the server virtualization mechanism 110 for each physical server 111 managed by the management server 101. The control I/F address 602 stores a network address serving as access information to the control I/F 311 that controls the server virtualization mechanism 110 from the outside. The virtual server identifier 603 stores a unique identifier for each virtual server 109.

The virtual server OS image 604 stores the OS image used by the virtual server 109 and the location (path) of the virtual server OS image 308. The processor / memory allocation amount 605 indicates a computer resource amount allocated to the virtual server.

The computer resource amount includes, for example, the processor clock speed and the memory capacity allocated to the virtual server 109. The state 606 stores information indicating whether the virtual server 109 is currently operating. The processor / memory actual usage 607 stores the processor usage rate and the memory capacity actually used by the virtual server. The processor / memory actual usage 607 can be acquired by a means that periodically collects the actual usage (performance information) of the allocated resources from, for example, the server virtualization mechanism 110 or the OS running on the virtual server 109. One conceivable method is to store the average usage (or usage rate) per unit time as the processor / memory actual usage 607. In the illustrated example, the average usage rate of the processor 303 is expressed in "GHz" as a clock speed, and the average usage of the memory 301 is expressed in "GB". The average usage rate of the processor 303 is the actual usage rate of the processor 303 resources (virtual processor) allocated to the virtual server 109.
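Column 607 stores an average usage per unit time, with processor usage expressed in GHz. Assuming (the text does not state this explicitly) that the GHz figure is the average usage rate of the allocated virtual processor multiplied by its allocated clock speed, the calculation can be sketched as follows; the function name and sample format are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def actual_usage(allocated_cpu_ghz: float,
                 cpu_rate_samples: Sequence[float],
                 mem_gb_samples: Sequence[float]) -> Tuple[float, float]:
    """Average usage per unit time, in the units of column 607.

    cpu_rate_samples: utilisation (0.0 to 1.0) of the virtual processor allocated
    to the virtual server, sampled periodically during one unit of time.
    mem_gb_samples: memory in use (GB) over the same samples.
    """
    if not cpu_rate_samples or not mem_gb_samples:
        raise ValueError("at least one sample of each kind is required")
    # Express the average processor usage rate as a clock speed (GHz).
    cpu_ghz = allocated_cpu_ghz * mean(cpu_rate_samples)
    return cpu_ghz, mean(mem_gb_samples)
```

Under this assumption, a virtual server allocated 3 GHz whose virtual processor averages 50% utilisation would be recorded as 1.5 GHz.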

The network assignment 608 stores the NIC identifier of the virtual server and the assignment information linking it to the corresponding NIC 305 of the physical server 111. The disk 609 stores the location of the virtual server OS image 308 assigned to the virtual server 109 and of the image file used for data storage.

FIG. 7 shows the configuration of the virtual server definition information table 118. The virtual server identifier 701 stores a unique identifier for each virtual server 109. The virtual server priority 702 stores the importance of the virtual server 109 as a numerical value. Priority "1" indicates the highest importance, and importance decreases as the value increases. The value of the virtual server priority 702 is entered according to the priority of the task executed on the virtual server when the administrator creates the virtual server from the management server 101 or the like.

The original definition information 703 stores the location where the original definition information of the virtual server 109 (its state as initially allocated on the physical server 111) is stored. The present invention improves reliability by moving a virtual server to another physical server 111 when the occurrence of a failure or a failure sign is detected, and the definition information may be changed at that time depending on the computer resources available at the destination. However, the original definition information of the virtual server 109 is needed when returning the virtual server 109 to the original physical server 111 after the cause of the failure or failure sign has been removed. Because the original definition information 703 is kept, the original configuration of the virtual server 109 can always be restored by referring to it, even if the definition information was updated during migration.

The movement definition information 704 stores the definition information of the virtual server 109 as updated when the virtual server 109 is moved. The movement date and time 705 stores the date and time when the virtual server 109 was moved. The movement definition information 704 and the movement date and time 705 may be stored as a history. Keeping a history of the changes in allocated resources that occur during migration makes it easy to identify the physical server 111 on which the failure occurred, which facilitates failure analysis. When the virtual server 109 has moved a plurality of times, an entry of movement definition information 704 and movement date and time 705 may be added for each move. For example, after a failure sign is detected and the virtual server 109 is moved, a failure sign may be detected again at the destination and the virtual server 109 moved further; in that case an entry is added for each move. Later, when the virtual server 109 is returned to the original physical server 111, for example after the physical server 111 on which the failure sign was detected has been repaired, the virtual server 109 may be moved back by retracing its migrations in reverse order, or it may be moved directly based on the original definition information 703. The return destination can be chosen flexibly according to the purpose of the move, such as repairing the physical server 111 or moving to a physical server 111 with higher performance.

As described above, keeping a history of multiple moves of the virtual server 109 increases the options for where to return or move it, so the virtual server 109 can be operated more flexibly.
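A minimal Python sketch of one row of table 118, assuming the definition information is a small dictionary of resource amounts and that each move is appended to a history list (all names are hypothetical). `return_target` shows the two return options described above: stepping back one move along the history, or going directly to the original definition information 703.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple

Definition = Dict[str, float]  # e.g. {"cpu_ghz": 3.0, "mem_gb": 4.0}


@dataclass
class DefinitionEntry:
    """One row of the virtual server definition information table 118 (FIG. 7)."""
    vm_id: str                  # 701
    priority: int               # 702, 1 = most important
    original: Definition        # 703 (the stored original definition)
    original_host: str          # physical server the original applies to
    # 704/705 as a history: (definition, move date and time, destination host)
    history: List[Tuple[Definition, datetime, str]] = field(default_factory=list)

    def record_move(self, definition: Definition, host: str, when: datetime) -> None:
        # Each move is appended, so earlier moves stay available for analysis.
        self.history.append((definition, when, host))

    def return_target(self, retrace: bool) -> Tuple[Definition, str]:
        """Where to return: one step back along the history, or the original."""
        if retrace and len(self.history) >= 2:
            definition, _, host = self.history[-2]
            return definition, host
        return self.original, self.original_host
```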

FIG. 8 is an explanatory diagram showing the contents of the virtual server definition information 309. The virtual server name 801 stores the name of the virtual server 109. The allocation resource 802 stores the information needed to create the virtual server 109, such as the processor allocation amount, the memory allocation amount, network allocation information, the location where the virtual server OS image 308 is stored, and the location where the data disk image is stored.

The priority 803 stores the same content as the virtual server priority 702 stored in the virtual server definition information table 118. The movement history 804 stores the movement history of the virtual server 109 and can be used as information for determining the destination when the virtual server 109 moves. For example, if aggregating the movement histories 804 of multiple pieces of definition information shows that a specific virtual server 109 has frequently been moved because of failure signs, there is a high risk that the service level of that virtual server 109 will degrade. The movement history 804 can therefore also serve as analysis information for selecting a more reliable physical server 111 as the destination of the virtual server 109.
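The aggregation described above amounts to counting, per virtual server, the moves caused by failure signs. The following Python sketch assumes each history record carries a cause label and that a fixed count threshold marks a high-risk virtual server; both the record format and the threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequently_moved(records: Iterable[Tuple[str, str]], threshold: int) -> List[str]:
    """records: (virtual server name, move cause) pairs aggregated from the
    movement histories 804. Returns, sorted, the virtual servers moved because of
    a failure sign at least `threshold` times."""
    counts = Counter(vm for vm, cause in records if cause == "failure_sign")
    return sorted(vm for vm, n in counts.items() if n >= threshold)
```

Virtual servers returned by this check would be candidates for placement on a more reliable physical server 111 at their next move.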

The definition information 805 indicates the storage location of the original definition information of the virtual server 109. In addition to the original definition information, when the virtual server 109 has moved a plurality of times, the definition information used at each destination may be stored as a history.

FIG. 9 is a flowchart showing an outline of processing performed in the failure recovery unit 102. This flowchart is executed by the management server 101 at a predetermined cycle.

In step 901, the failure recovery unit 102 calls the failure sign management unit 104. As described later, the failure sign management unit 104 checks for the occurrence of a failure or a failure sign in each physical server 111 managed by the management server 101. That is, the failure sign management unit 104 queries the BMC 306 and the failure sign detection unit 310 of each physical server 111 for operation information and acquires the operation information of each physical server 111.

In step 902, based on the result of step 901, the failure sign management unit 104 determines from the operation information of each physical server 111 whether a failure or a failure sign has occurred. When it detects a failure or a failure sign, the failure sign management unit 104 identifies the physical server 111 and notifies the failure recovery unit 102, and the process proceeds to step 903. If there is neither a failure nor a failure sign, the process ends.

In step 903, the failure recovery unit 102 determines whether the server virtualization mechanism 110 is being executed on the physical server 111 in which the failure or failure sign was detected. If it is, the failure recovery unit 102 calls the virtual server recovery method selection unit 105 in step 904.

In step 904, the virtual server recovery method selection unit 105 selects a recovery method for the virtual server 109 based on the detection result of the failure or failure sign, the priority of the affected virtual server 109, and the availability of computer resources of the physical servers 111. This recovery method selection processing is described later.

In step 906, the failure recovery unit 102 calls the virtual server recovery unit 106. The virtual server recovery unit 106 executes recovery of the virtual server 109 based on the execution result of the virtual server recovery method selection unit 105 as described later.

In step 907, the result of the recovery process is notified to the administrator. This notification is performed by displaying the result on a display device (not shown) of the management server 101.

If, in step 903, the failure or failure sign was detected in a physical server 111 that is not executing the server virtualization mechanism 110, that physical server 111 is recovered in step 905. Recovery of a physical server 111 that is not executing the server virtualization mechanism 110 is the same as in the conventional example and is not described in detail in this embodiment.
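The control flow of steps 901 to 907 can be sketched in Python with each unit passed in as a callback. This is an illustration only: the callback names, the dictionary used to represent a physical server, and the per-server loop are assumptions, and, following the text literally, only the virtual server path ends with the step 907 notification.

```python
from typing import Callable, Iterable, Optional


def failure_recovery_cycle(
    servers: Iterable[dict],
    check: Callable[[dict], Optional[str]],        # steps 901-902
    select_method: Callable[[dict, str], object],  # step 904
    recover_vms: Callable[[object], object],       # step 906
    recover_physical: Callable[[dict], None],      # step 905
    notify: Callable[[object], None],              # step 907
) -> None:
    """One periodic run of the failure recovery unit 102 (hypothetical interface)."""
    for server in servers:
        event = check(server)          # failure, failure sign, or None
        if event is None:
            continue                   # nothing detected for this server
        if server["virtualized"]:      # step 903: is mechanism 110 running?
            plan = select_method(server, event)
            notify(recover_vms(plan))
        else:
            recover_physical(server)   # conventional physical recovery
```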

Through the above processing, when the failure recovery unit 102 detects the occurrence of a failure or a failure sign in a physical server 111 executing the server virtualization mechanism 110, the computer resources of the virtual server 109 are reallocated according to the amount or ratio of computer resources it actually uses, so that it is reset to fewer computer resources than it was initially allocated, and the virtual server 109 is taken over by the standby physical server 111.

As a result, even when the standby physical server 111 has fewer computer resources than the active physical server 111, a plurality of virtual servers 109 can be reliably transferred to the standby physical server 111. A standby system can therefore be built from physical servers 111 with fewer computer resources than the active system, which reduces the introduction and operation costs of a virtual computer system including a plurality of physical servers 111.

FIG. 10 is a flowchart illustrating an example of processing performed by the failure sign management unit 104. This processing corresponds to the processing in step 901 in FIG.

The failure sign management unit 104 performs processing to check whether a failure or a sign of failure has occurred in the physical server 111 to be managed. In step 1001, the failure sign management unit 104 selects a target physical server 111. In step 1002, the failure sign management unit 104 accesses the BMC 206 of the target physical server 111, and checks whether a hardware failure or a failure sign has occurred.

Here, the BMC 206 of each physical server 111 can monitor the hardware state, for example the processor 303, the memory 301, temperatures, fans, and the power supply. For example, when the BMC 206 detects an error in the processor 303 that is resolved after a number of retries, this is detected as a failure sign; when the retries do not recover it, it is detected as a failure. The same applies to other components. Likewise, when the temperature of the processor 303, the memory 301, or the chipset (not shown) exceeds a predetermined threshold, a failure sign may be detected, and if the temperature remains above the threshold, a failure may be detected. Furthermore, an importance level can be assigned to each failure sign according to the impact it would have if it developed into a failure. For example, a failure sign of the processor 303 is of high importance, because a failure of the processor 303 is highly likely to stop substantially the entire system, whereas a failure sign of the memory 301 within the range in which bit errors can be corrected is of low importance. Even for a failure sign of the same processor 303, the importance may be lowered when a plurality of processors 303 are installed. Assigning importance to failure signs in this way makes it possible to increase the options for recovery.
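The classification rules above (retry-recovered error as a failure sign, unrecoverable error as a failure, temperature above a threshold, importance by component) can be sketched as follows. This is a hypothetical illustration, not the BMC 206 interface: the component labels, the assumption that "remains above the threshold" means the last three samples all exceed it, and the "medium" default for cases the text does not cover are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    NONE = "none"
    SIGN = "failure sign"
    FAILURE = "failure"


@dataclass
class ErrorEvent:
    component: str            # hypothetical label, e.g. "processor" or "memory"
    recovered_by_retry: bool  # True if the BMC's retries cleared the error


def classify_error(event: ErrorEvent) -> Severity:
    # Cleared by retries -> failure sign; not recoverable by retry -> failure.
    return Severity.SIGN if event.recovered_by_retry else Severity.FAILURE


def classify_temperature(samples_c: list, threshold_c: float,
                         persist_samples: int = 3) -> Severity:
    # Any reading above the threshold is a sign; the last `persist_samples`
    # readings all above it (an assumed meaning of "remains") is a failure.
    if not samples_c or max(samples_c) <= threshold_c:
        return Severity.NONE
    recent = samples_c[-persist_samples:]
    if len(recent) == persist_samples and all(t > threshold_c for t in recent):
        return Severity.FAILURE
    return Severity.SIGN


def importance(component: str, correctable: bool = False,
               processor_count: int = 1) -> str:
    if component == "processor":
        # A processor sign is serious unless other processors remain installed.
        return "low" if processor_count > 1 else "high"
    if component == "memory" and correctable:
        return "low"   # bit errors still within the correctable range
    return "medium"    # hypothetical default for cases the text does not cover
```

Keeping severity and importance as separate outputs mirrors the text: severity decides whether something is reported at all, while importance widens or narrows the recovery options chosen later.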

Next, in step 1003, it is determined whether a hardware failure or failure sign was detected. If one was detected, the process proceeds to step 1006 and the detection result is reported to the failure recovery unit 102.

On the other hand, if a hardware failure or failure sign is not detected, the failure sign detection unit 310 is called in step 1004 to detect a software level failure or failure sign that cannot be detected at the hardware level. The detection result by the failure sign detection unit 310 is reported to the failure sign management unit 104 in step 1006.

In step 1007, the failure sign management unit 104 determines whether or not all the management target physical servers 111 have been inspected. If there is an uninspected physical server 111, the processing is repeated from step 1001.

Through the above processing, steps 1002 and 1004 make it possible to detect a wide range of failures and failure signs in the physical server 111, from hardware to software.
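The inspection loop of FIG. 10 can be sketched as below: a hardware check first, a software-level check only when the hardware check finds nothing, and a report for every server with a result. The two detector callables are hypothetical stand-ins for the BMC 206 query and the failure sign detection unit 310; their signatures are assumptions.

```python
from typing import Callable, List, Optional, Tuple

# A detector returns a description of the problem, or None if nothing was found.
Detector = Callable[[str], Optional[str]]


def inspect_servers(servers: List[str], check_bmc: Detector,
                    detect_software: Detector) -> List[Tuple[str, str]]:
    reports = []
    for server in servers:                       # step 1001: next target server
        result = check_bmc(server)               # step 1002: hardware check via BMC
        if result is None:                       # step 1003: nothing at hardware level
            result = detect_software(server)     # step 1004: software-level detection
        if result is not None:
            reports.append((server, result))     # step 1006: report the result
    return reports                               # step 1007: every server inspected
```

For example, `inspect_servers(["s1", "s2"], bmc, sw)` would query the hypothetical `bmc` for both servers and call `sw` only for those the hardware check left clean.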

FIG. 11 is a flowchart illustrating an example of processing performed by the virtual server recovery method selection unit 105. This process corresponds to step 904 in FIG.

The virtual server recovery method selection unit 105 determines how to recover the virtual server 109 when a failure or failure sign is detected in the physical server 111. First, in step 1101, the failure or failure sign is analyzed. For example, for a failure sign of the processor 303, the number of the core of the processor 303 in which the sign was detected and the importance of the sign are determined.

In step 1102, the virtual server recovery method selection unit 105 investigates the range affected by the failure and identifies the affected virtual servers 109, using the server management table 108 and the virtual server management table 107. For example, if a failure sign of the processor 303 is detected in the physical server 111-1, it can be determined from the virtual server management table 107 in FIG. 6 that the virtual servers with virtual server identifiers 603 “VM1” and “VM3” are affected by the failure sign.
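A minimal sketch of step 1102, assuming each row of the virtual server management table 107 records the hosting physical server alongside the virtual server identifier 603. The column name `physical_server` and the “VM2” row are made up for illustration; only the placement of “VM1” and “VM3” on 111-1 comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VmRow:
    physical_server: str  # hypothetical column: physical server hosting the VM
    vm_id: str            # virtual server identifier 603


def affected_vms(table: List[VmRow], failed_server: str) -> List[str]:
    # Every virtual server placed on the failing physical server is affected.
    return [row.vm_id for row in table if row.physical_server == failed_server]


table = [VmRow("111-1", "VM1"), VmRow("111-2", "VM2"), VmRow("111-1", "VM3")]
print(affected_vms(table, "111-1"))  # ['VM1', 'VM3']
```

A finer-grained version could also filter by the processor core found in step 1101, if the table records core assignments; the text does not say whether it does.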

In step 1103, the virtual server recovery method selection unit 105 determines whether the detected failure or failure sign makes it highly likely that the virtual server 109 will stop. This is the case, for example, when the temperature of the processor 303 of the physical server 111 exceeds a predetermined value or when the cooling fan of the processor 303 has stopped; that is, the physical server 111 is still operating but is expected to stop.

If the possibility of stopping is high, in step 1105 the virtual server recovery method selection unit 105 checks the priority of the affected virtual server 109, which can be obtained from the virtual server priority 702 in the virtual server definition information table 118.

If the priority of the virtual server 109 is high, that is, if the virtual server 109 has a large impact on the system or business, the virtual server recovery method selection unit 105 searches, in step 1105, for free resources on the other physical servers 111 and determines a migration destination candidate for the virtual server 109 that is likely to stop. If the migration destination physical server 111 has sufficient free resources, the virtual server recovery method selection unit 105 selects virtual server migration as the recovery method in step 1112.

If no physical server 111 with sufficient free resources is found as a destination, the virtual server recovery method selection unit 105 selects the definition change method in step 1113. If the possibility that the virtual server 109 will stop is low, or if only low-priority virtual servers 109 are affected, the status is reported to the administrator in step 1110 or step 1111 by displaying the virtual servers 109 affected by the failure or failure sign on a display device (not shown) of the management server 101. This processing widens the range of virtual server 109 failures that can be recovered and allows the virtual server 109 to be recovered with higher reliability than before.

Through the above processing, either virtual server migration or definition change is selected in step 1112 or step 1113 for each virtual server 109 affected by a failure or failure sign of the physical server 111. The selected processing is described in detail below together with the processing of the virtual server recovery unit 106.
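The decision flow of FIG. 11 can be summarized in a short sketch. It assumes resources are modeled as a dictionary per physical server (for example `{"cpu": cores, "mem": GiB}`); this representation and the first-fit choice of destination are assumptions, since the text does not say how candidates are ranked.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

Resources = Dict[str, float]  # e.g. {"cpu": cores, "mem": GiB} (assumed units)


class Method(Enum):
    MIGRATE = "virtual server migration"      # step 1112
    CHANGE_DEFINITION = "definition change"   # step 1113
    REPORT_ONLY = "report to administrator"   # steps 1110/1111


def fits(free: Resources, need: Resources) -> bool:
    return all(free.get(k, 0) >= v for k, v in need.items())


def select_recovery_method(likely_to_stop: bool, high_priority: bool,
                           required: Resources,
                           free_by_server: Dict[str, Resources]
                           ) -> Tuple[Method, Optional[str]]:
    if not (likely_to_stop and high_priority):
        return Method.REPORT_ONLY, None
    for server, free in free_by_server.items():  # search other servers' free resources
        if fits(free, required):
            return Method.MIGRATE, server
    return Method.CHANGE_DEFINITION, None
```

The definition change path is reached only for a high-priority virtual server that is likely to stop and has no destination with enough free resources, which matches the order of checks described above.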

FIG. 12 is a flowchart showing an outline of processing performed by the virtual server recovery unit 106. This process corresponds to the process in step 906 of FIG.

The virtual server recovery unit 106 executes the recovery process of the virtual server 109 according to the recovery method (virtual server migration or definition change) determined by the virtual server recovery method selection unit 105.

In step 1201, the virtual server recovery unit 106 determines whether the recovery method selected by the virtual server recovery method selection unit 105 is the definition change method. If it is not, that is, if sufficient free resources are available at the migration destination, the virtual server recovery unit 106 extracts the required resources from the definition information of the virtual server 109 in step 1202, by referring to the original definition information 703 in the virtual server definition information table 118 and to the contents of the corresponding definition file. In other words, the computer resources allocated on the physical server 111 where the failure or failure sign occurred are carried over unchanged to the new physical server 111.

In step 1203, the virtual server recovery unit 106 searches the server management table 108 for a physical server 111 with free resources equal to or greater than those allocated to the virtual server 109 on the physical server 111 where the failure or failure sign occurred.

In step 1204, the virtual server recovery unit 106 allocates the computer resource of the original definition information 703 to the searched physical server 111 and moves the virtual server 109.
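Steps 1202 to 1204 can be sketched as a first-fit search that reserves the full original allocation on the chosen server. The dictionary model of the server management table 108 and the first-fit order are assumptions for illustration.

```python
from typing import Dict, Optional

Resources = Dict[str, float]


def migrate_with_original_definition(original_def: Resources,
                                     free_by_server: Dict[str, Resources]
                                     ) -> Optional[str]:
    for server, free in free_by_server.items():                  # step 1203
        if all(free.get(k, 0) >= v for k, v in original_def.items()):
            for k, v in original_def.items():                    # step 1204
                free[k] -= v   # reserve the full original allocation
            return server
    return None  # no server can host the unchanged definition
```

The function updates `free_by_server` in place, so successive calls for several affected virtual servers see the resources already reserved.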

On the other hand, when the definition change method is selected as the recovery method of the virtual server 109, the virtual server recovery unit 106 calls the virtual server definition information management unit 103 in step 1207.

The called virtual server definition information management unit 103 updates the virtual server definition information 309 by reducing the allocated amount of computer resources of the migration target virtual server 109, as will be described later.

In step 1208, the virtual server 109 is moved using the virtual server definition information 309 changed by the virtual server definition information management unit 103 for use at the destination. Moving the virtual server 109 means moving it onto the destination physical server 111. When the virtual server 109 is moved while it keeps running, resources for executing it are temporarily secured at both the source and the destination: the memory 301, the processor 303, and I/O (FCA 304, NIC 305) are secured at the destination, and the memory contents and I/O state of the source virtual server 109 are then copied to the destination virtual server 109. The destination may have fewer computer resources. For example, if the allocated amount of the processor 303 is smaller, the processing performance of the virtual server 109 decreases, but no special handling is required for the migration. If, on the other hand, the destination physical server 111 has less memory, several approaches are conceivable. One is to make the memory appear to the OS 302 as if it had the same capacity as at the source. This can be achieved by not copying, when the memory 301 is copied, unused areas that do not affect the operation of the OS 302 or the cache information of the OS 302 and applications, and by having a program such as a driver running on the OS 302 appear to have reserved the missing memory area. The above processing is needed only when the virtual server is moved while running; it is not necessary when the virtual server 109 is temporarily shut down and restarted at the destination.
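The reduced-memory copy described above resembles memory ballooning: only pages in use are transferred, and a guest driver claims enough pages that the OS never touches the missing capacity. The sketch below plans such a copy; the page-state labels ("used", "unused", "cache") and the page-granular model are assumptions, not the patent's data structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryPlan:
    pages_to_copy: List[int]  # indices of in-use pages sent to the destination
    reserved_pages: int       # pages a guest driver would hold to hide the shortfall


def plan_memory_copy(page_states: List[str], dest_pages: int) -> MemoryPlan:
    # Only pages in use are copied; "unused" and "cache" pages are skipped.
    to_copy = [i for i, state in enumerate(page_states) if state == "used"]
    if len(to_copy) > dest_pages:
        raise ValueError("destination cannot hold the pages currently in use")
    # The OS keeps seeing len(page_states) pages; the missing ones are claimed
    # by a driver inside the guest so that they are never touched.
    shortfall = max(0, len(page_states) - dest_pages)
    return MemoryPlan(to_copy, shortfall)
```

Because the check requires the in-use pages to fit at the destination, the number of reserved pages never exceeds the number of pages that were skipped, so the driver only ever holds memory the guest was not using.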

The virtual server definition information management unit 103 improves reliability by making it possible to move the virtual server 109 even when no physical server 111 has enough free resources. Specifically, to move the virtual server 109 wherever possible and thereby protect it, the unit changes the definition information based on free resources, priorities, and the actual usage of computer resources so that the move becomes possible.

FIG. 13 is a flowchart showing an outline of processing performed by the virtual server definition information management unit 103. This process corresponds to the process of step 1207 in FIG.

In step 1301, the virtual server definition information management unit 103 retrieves the actual usage of computer resources of the migration target virtual server 109 from the virtual server management table 107, by referring to the processor/memory actual usage 607 stored there.

In step 1302, the virtual server definition information management unit 103 searches the virtual server management table 107, selects, for example, the physical server 111 with the most unused resources, and acquires its unused resource information.

In step 1303, the virtual server definition information management unit 103 compares the unused resource information acquired in step 1302 with the actual usage of computer resources acquired in step 1301 and determines whether the unused resources are larger than the actual usage. The intent is that, when the virtual server 109 actually uses only part of the computer resources allocated to it, a physical server 111 with at least the minimum resources needed to maintain the current service level, such as performance, is selected. If the unused resources are larger than the actual usage, that is, if a physical server 111 holding the minimum computer resources needed to execute the migration target virtual server 109 is found, the process proceeds to step 1307, where the virtual server definition information management unit 103 copies the virtual server definition information 309.

In step 1308, the virtual server definition information management unit 103 changes the virtual server definition information 309 copied in step 1307 to match the unused resources of the physical server 111 found in step 1302. As a result, the virtual server 109 to be moved can be executed with fewer computer resources than specified by the original virtual server definition information 309.

In step 1309, the virtual server definition information management unit 103 reflects the virtual server definition information 309 changed in step 1308 in the virtual server definition information table 118. That is, it stores the copied and changed virtual server definition information 309 on the definition information storage disk 115 of the disk array device 116, adds the storage location (path) of that definition information to the movement definition information 704 in the virtual server definition information table 118, and stores the current date and time in the movement date 705. This creates a movement history for the virtual server 109 being moved.
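The main path of FIG. 13 (steps 1301 to 1303 and 1307 to 1309) is sketched below. Following the wording of claim 4, the copied definition is set to the actual usage; ranking candidate servers by unused memory, storing the definition itself rather than its path in the history, and the `resources` key are all assumptions for illustration.

```python
import copy
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Resources = Dict[str, float]


def shrink_and_record(vm: dict, actual_usage: Resources,
                      free_by_server: Dict[str, Resources],
                      history: List[dict], now: Optional[datetime] = None
                      ) -> Optional[Tuple[str, dict]]:
    # Step 1302: the server with the most unused memory (ranking key is an assumption).
    server, free = max(free_by_server.items(), key=lambda kv: kv[1].get("mem", 0))
    # Step 1303: strictly larger than the actual usage (">=" is the stated variant).
    if not all(free.get(k, 0) > v for k, v in actual_usage.items()):
        return None  # continue with steps 1304-1306 instead
    new_def = copy.deepcopy(vm["definition"])     # step 1307: copy; original kept intact
    new_def["resources"] = dict(actual_usage)     # step 1308: shrink to actual usage
    history.append({                              # step 1309: record the move
        "server": server,
        "definition": new_def,
        "moved_at": now or datetime.now(),
    })
    return server, new_def
```

Copying before changing keeps the original definition available, which is what later allows the virtual server to be returned with its initial allocation.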

On the other hand, if it is determined in step 1303 that no unused resource larger than the actual usage amount is found, the process proceeds to step 1304.

In step 1304, since no physical server 111 with the minimum free resources exists, the virtual server definition information management unit 103 selects, for example, the physical server 111 with the most unused resources.

In step 1305, the virtual server definition information management unit 103 searches the virtual server definition information table 118 and selects a virtual server 109 whose priority (virtual server priority 702) is lower than that of the virtual server 109 to be moved. The physical server 111 on which this lower-priority virtual server 109 runs becomes the migration destination physical server 111.

In step 1306, resources allocated to the virtual server 109 selected in step 1305 are taken away and given to the higher-priority virtual server 109. That is, the virtual server definition information management unit 103 reduces the resources allocated to the virtual server 109 whose priority is lower than that of the virtual server 109 to be moved, adds the released resources to the unused resources of the physical server 111, and allocates the unused resources, now increased by the amount taken from the low-priority virtual server 109, to the virtual server 109 to be moved. The process then proceeds to step 1307 described above. The amount of computer resources taken from the low-priority virtual server 109 is the value obtained by subtracting its processor/memory actual usage 607 from its processor/memory allocation 605 in the virtual server management table 107. The virtual server definition information management unit 103 updates the virtual server definition information 309 of the low-priority virtual server 109 with the reduced computer resources.

When the computer resources taken from a single virtual server 109 are not enough to execute the migration target virtual server 109, the virtual server definition information management unit 103 takes computer resources from a plurality of virtual servers 109 whose priority is lower than that of the migration target virtual server 109.
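Steps 1305 and 1306, including the fallback to several donors, can be sketched for a single resource type (for example memory). Each donor gives up its allocation 605 minus its actual usage 607, as described above. The field names, the convention that a larger number means a higher priority, and the rule that nothing is changed when even all donors together are insufficient are assumptions.

```python
from typing import Dict, List, Optional


def reclaim_from_lower_priority(target_priority: int, need: float, free: float,
                                vms: List[dict]) -> Optional[Dict[str, float]]:
    plan: Dict[str, float] = {}
    available = free
    for vm in sorted(vms, key=lambda v: v["priority"]):  # lowest priority first
        if available >= need:
            break
        if vm["priority"] >= target_priority:
            break  # remaining VMs are not lower priority than the target
        spare = vm["alloc"] - vm["used"]   # allocation 605 minus actual usage 607
        if spare > 0:
            plan[vm["name"]] = spare
            available += spare
    if available < need:
        return None  # not enough even from every lower-priority VM; change nothing
    for vm in vms:
        if vm["name"] in plan:
            vm["alloc"] = vm["used"]  # donor's definition shrunk to its actual usage
    return plan
```

Computing the plan before touching any allocation keeps the low-priority virtual servers unchanged when the migration cannot be made to fit anyway.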

The above processing makes it possible to recover a virtual server 109 affected by a failure or failure sign in stages, according to the availability of free resources. In other words, by reducing the computer resources allocated to the migration target virtual server 109 to the computer resources it actually uses and allocating unused resources of the migration destination physical server 111 to it, the minimum computer resources needed to guarantee the operation of the virtual server 109 can be secured on the migration destination physical server 111.

In step 1303, it is determined whether the unused resources are larger than the actual usage; alternatively, it may be determined whether they are equal to or greater than the actual usage.

FIG. 14 is a flowchart showing an outline of processing performed by the virtual server return unit 119. This process is executed when an administrator or the like instructs activation from a console (not shown) of the management server 101.

In step 1401, the virtual server return unit 119 selects the virtual server 109 to be returned. The administrator may explicitly select the virtual server 109 to be returned, or, when an event is received indicating that the physical server 111 has been restored to a normal state (for example, by replacement), the virtual servers 109 that were moved to the standby physical server 111 may be selected automatically.

In step 1402, the virtual server return unit 119 determines whether the virtual server 109 selected in step 1401 is in a moved state, which can be determined from whether definition information is recorded in the movement definition information 704 of the virtual server definition information table 118. If the virtual server 109 has been moved, the process proceeds to step 1403. If the virtual server 109 is in the original state, the process proceeds to step 1403.

In step 1403, the virtual server return unit 119 extracts the movement definition information 704 of the virtual server 109 selected in step 1401 from the virtual server definition information table 118.

In step 1404, when the extracted movement definition information 704 shows that the virtual server 109 has been moved a plurality of times, the virtual server return unit 119 selects the physical server 111 to move it to. Whether the virtual server 109 has been moved a plurality of times can be determined from whether a plurality of movement histories are recorded in the movement definition information 704. Several methods are conceivable for selecting the destination physical server 111. If the physical server 111 on which the failure sign was detected has been replaced with a new physical server 111 and repaired, the replaced physical server 111 may be selected. If none of the physical servers 111 involved in the repeated moves has been repaired, a new physical server 111 different from those may be selected. In either case, high reliability can be maintained by selecting, as far as possible, a physical server 111 on which no failure or failure sign has occurred.

In step 1405, the virtual server return unit 119 acquires, from the movement definition information 704 in the virtual server definition information table 118, the definition information of the virtual server 109 to be used at the destination. Specifically, it takes the entry corresponding to the physical server 111 selected as the destination from the movement definition information 704 extracted in step 1403. When moving to a physical server 111 other than those used in the previous moves, the original virtual server definition information may be used. The original (initially allocated) virtual server definition information 309 can be obtained from the original definition information 703 of the virtual server 109 in the virtual server definition information table 118.

In step 1406, the virtual server return unit 119 searches for free resources on the destination physical server 111. In general, it is desirable to secure computer resources according to the same definition information 309 with which the virtual server 109 originally operated. However, when the virtual server 109 is moved repeatedly, the same computer resources of the physical server 111 as before the move cannot always be secured, so this confirmation step is necessary.

In step 1407, the virtual server return unit 119 determines whether the free resources acquired in step 1406 satisfy the virtual server definition information 309 of the virtual server 109 acquired in step 1405. If they do not, the administrator is notified of the resource shortage in step 1409. Alternatively, the processing of the virtual server definition information management unit 103 may be executed to change the virtual server definition information 309 of the virtual server 109 within a range that still satisfies the required performance, and the move may then continue.

In step 1408, the virtual server return unit 119 moves the virtual server 109 using the virtual server definition information 309 to be used at the destination.
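Steps 1402 and 1405 to 1409 of the return processing can be sketched as below. The record layout (an `original` definition standing in for the original definition information 703, plus a `history` list standing in for the movement definition information 704) is hypothetical; the rule of using the latest history entry for the chosen server, and otherwise the original definition, follows the description of step 1405.

```python
from typing import Dict, Optional, Tuple

Resources = Dict[str, float]


def plan_return(record: dict, target_server: str,
                free: Resources) -> Tuple[str, Optional[Resources]]:
    if not record["history"]:                    # step 1402: never moved
        return "not moved", None
    definition = record["original"]              # default: original definition
    for entry in reversed(record["history"]):    # step 1405: latest entry for this target
        if entry["server"] == target_server:
            definition = entry["definition"]
            break
    if all(free.get(k, 0) >= v for k, v in definition.items()):  # steps 1406-1407
        return "move", definition                # step 1408
    return "insufficient resources", definition  # step 1409: notify administrator
```

Returning to the repaired active server, which does not appear as a destination in the history, naturally falls back to the original definition, which is the easy return path the text describes.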

Through the above processing, the virtual server 109 can be returned to the computer resources it had before the move without degrading service levels such as performance. In particular, when a failure or failure sign occurs in the active physical server 111, the virtual server 109 is moved to the standby physical server 111, and the active physical server 111 is later restored, the virtual server 109 can be returned to the active physical server 111 very easily by referring to the original definition information 703.

In the embodiment described above, the virtual server 109 is moved upon detection of a failure or failure sign, but the present invention can also be applied to other triggers. For example, in an environment where a plurality of physical servers 111 are operating, when the load on the processors 303 is concentrated on a specific physical server 111, the present invention may be used to move virtual servers 109 and distribute the load. That is, when the load of a physical server 111 (for example, its processor usage rate) exceeds a predetermined threshold, a virtual server 109 can be moved to another physical server 111 whose load is below the predetermined threshold.
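A minimal load-distribution sketch under stated assumptions: the load of a physical server is the sum of its virtual servers' processor usage, the lightest virtual server is moved first, and a destination qualifies only if it stays below the threshold after the move. That last rule is slightly stricter than "a server whose load is below the threshold" and is chosen here so that a move cannot simply shift the overload elsewhere.

```python
from typing import Dict, List, Tuple

Hosts = Dict[str, Dict[str, float]]  # server -> {vm: processor usage in %}


def host_load(hosts: Hosts, server: str) -> float:
    return sum(hosts[server].values())


def plan_load_balancing(hosts: Hosts, threshold: float) -> List[Tuple[str, str, str]]:
    moves = []
    for src in list(hosts):
        while host_load(hosts, src) > threshold and hosts[src]:
            vm, vm_load = min(hosts[src].items(), key=lambda kv: kv[1])
            # Only servers that stay below the threshold after the move qualify.
            candidates = [d for d in hosts
                          if d != src and host_load(hosts, d) + vm_load < threshold]
            if not candidates:
                break
            dst = min(candidates, key=lambda d: host_load(hosts, d))
            hosts[dst][vm] = hosts[src].pop(vm)
            moves.append((vm, src, dst))
    return moves
```

`hosts` is updated in place to reflect the planned placement, and the returned list gives the moves in the order they were decided.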

Similarly, to reduce the power consumption of the physical servers 111, virtual servers 109 with a low load (below a predetermined threshold) can be moved and consolidated onto a specific physical server 111 using the present invention, and the power supply of physical servers 111 that no longer run any virtual server 109 can be shut off, reducing the power consumption of the entire system.
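The consolidation idea can be sketched as follows, assuming the consolidation target server and its capacity are given (the text does not say how the specific physical server is chosen) and that load is measured in the same units as capacity. Low-load virtual servers are moved lightest first while they fit, and any other server left empty becomes a power-off candidate.

```python
from typing import Dict, List, Tuple

Hosts = Dict[str, Dict[str, float]]  # server -> {vm: load}


def plan_consolidation(hosts: Hosts, target: str, low_load: float,
                       capacity: float) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    moves = []
    for src in list(hosts):
        if src == target:
            continue
        # sorted() returns a new list, so popping from hosts[src] below is safe.
        for vm, vm_load in sorted(hosts[src].items(), key=lambda kv: kv[1]):
            if vm_load < low_load and sum(hosts[target].values()) + vm_load <= capacity:
                hosts[target][vm] = hosts[src].pop(vm)
                moves.append((vm, src, target))
    # Servers left without any virtual server can have their power shut off.
    power_off = [s for s in hosts if s != target and not hosts[s]]
    return moves, power_off
```

A server that still hosts a high-load virtual server stays powered on, which matches the text's restriction to low-load virtual servers.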

As described above, the present invention can move virtual servers 109 not only when a failure or failure sign occurs, but also, in an environment where a plurality of physical servers 111 operate, whenever a predetermined condition on load, power, or another index is met, for purposes such as load distribution and optimization of power consumption.

Although the present invention has been described in detail with reference to the accompanying drawings, the present invention is not limited to these specific configurations and includes various modifications and equivalent configurations within the spirit of the appended claims.

As described above, the present invention can be applied to a virtual computer system that operates a plurality of virtual servers on a plurality of physical servers.

Claims

A method of moving a virtual computer in a virtual computer system having a plurality of physical computers, each comprising a processor and a memory and having a virtualization unit that constructs a plurality of virtual computers, and a management computer comprising a processor and a memory and connected to the physical computers by a network, the method comprising:
A virtual computer operating step in which the management computer allocates computer resources of a first physical computer among the plurality of physical computers to the virtual computer and operates the virtual computer;
A holding step in which the management computer holds information on the computer resources allocated to the virtual computer as definition information of the virtual computer;
A resource usage acquisition step in which the management computer acquires, as an actual usage amount, the computer resources of the first physical computer actually used by the virtual computer;
A determination step in which the management computer determines whether a predetermined condition is satisfied;
A selection step in which, when the predetermined condition is satisfied, the management computer selects, from among the plurality of physical computers, a physical computer capable of securing computer resources equal to or greater than the actual usage amount as a second physical computer serving as the migration destination of the virtual computer; and
A moving step in which the management computer updates the definition information to the actual usage amount and moves the virtual computer to the selected second physical computer.
A method of moving a virtual machine according to claim 1,
Further comprising a step in which the management computer detects that the physical computer has a failure or a failure sign,
In the determination step, the management computer determines that the predetermined condition is satisfied when the failure or the failure sign is detected.
A method of moving a virtual machine according to claim 2,
In the virtual machine operation step, the management computer assigns a plurality of virtual machines to a plurality of physical computers, and holds the relationship between the computer resources assigned to each virtual machine and the physical computer in the virtual machine management information,
The selection step includes
A step in which the management computer searches the virtual computer management information to identify a virtual computer affected by the failure or failure sign; and
A step in which the management computer selects a physical computer capable of securing computer resources equal to or greater than the actual usage amount of the identified virtual computer as the second physical computer to which the identified virtual computer is moved.
A method of moving a virtual machine according to claim 1,
In the moving step, the management computer duplicates the definition information, changes the duplicated definition information so as to correspond to the actual usage amount, and moves the virtual computer to the second physical computer based on the changed definition information.
A method of moving a virtual machine according to claim 1,
In the virtual machine operation step, the management computer assigns a plurality of virtual machines to a plurality of physical computers and holds, in virtual machine management information, the relationship between the computer resources assigned to each virtual machine and the physical computer, together with a priority set for each virtual machine,
The selection step includes
A step in which, when there is no physical computer capable of securing computer resources equal to or greater than the actual usage amount, the management computer searches the virtual computer management information and selects, as the second physical computer to which the virtual computer is moved, a physical computer that operates a virtual computer having a lower priority than the virtual computer; and
A step in which the management computer takes computer resources corresponding to the actual usage amount from the virtual computer having the lower priority than the virtual computer.
The virtual computer migration method according to claim 5,
Further comprising a step in which the management computer detects that the physical computer has a failure or a failure sign;
The determination step determines that the predetermined condition is satisfied when the failure or failure sign is detected,
When it is determined that the predetermined condition is satisfied, the selection step includes:
A step in which, when the operating state of the physical computer operating the virtual computer is in a predetermined state and there is no physical computer capable of securing computer resources equal to or greater than the actual usage amount, the management computer searches the virtual computer management information and selects, as the second physical computer to which the virtual computer is moved, a physical computer that operates a virtual computer having a lower priority than the virtual computer; and
A step of taking computer resources corresponding to the actual usage amount from the virtual computer having the lower priority than the virtual computer.
The virtual machine migration method according to claim 1, wherein
the method further comprises a return step in which the management computer returns each virtual machine that has been moved to the second physical computer to the first physical computer;
in the virtual machine operation step, the management computer holds information on the computer resources initially allocated to the virtual machine as the original definition information of the virtual machine;
in the migration step, the management computer duplicates the definition information, changes the duplicated definition information so that it corresponds to the actual usage amount, moves the virtual machine to the second physical computer based on the changed definition information, and holds the changed definition information as a migration history; and
in the return step, the management computer moves the virtual machine back to the first physical computer using the original definition information of the virtual machine.
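The duplicate-then-change handling of definition information, with the original kept for the return step, can be sketched as below. `Definition`, `DefinitionStore`, and their fields are hypothetical names; `dataclasses.replace` is used because it returns a modified copy and leaves the original object untouched.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class Definition:
    vm_name: str
    cpus: int
    memory_mb: int
    host: str


@dataclass
class DefinitionStore:
    originals: Dict[str, Definition] = field(default_factory=dict)
    history: List[Definition] = field(default_factory=list)  # migration history

    def register(self, definition: Definition) -> None:
        self.originals[definition.vm_name] = definition

    def migrate(self, vm_name: str, cpus: int, memory_mb: int, dest: str) -> Definition:
        # replace() builds a new object: the duplicate is changed, the original is kept.
        changed = replace(self.originals[vm_name], cpus=cpus, memory_mb=memory_mb, host=dest)
        self.history.append(changed)
        return changed

    def restore(self, vm_name: str) -> Definition:
        # Returning to the first physical computer uses the original definition.
        return self.originals[vm_name]
```

Keeping the original definition separate means the virtual machine regains its full initial allocation when it returns, even though it ran with a reduced one on the second physical computer.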
A virtual machine system comprising:
a plurality of physical computers, each comprising a processor and a memory and having a virtualization unit that constructs a plurality of virtual machines; and
a management computer comprising a processor and a memory and connected to the physical computers via a network,
wherein the management computer comprises:
virtual machine definition information that holds, as the definition information of a virtual machine, information on the computer resources of a first physical computer among the plurality of physical computers that are allocated to the virtual machine in order to run it;
virtual machine management information that holds the relationship between the computer resources allocated to the virtual machine and the actual usage amount, which is the amount of computer resources of the first physical computer that the virtual machine actually used; and
a recovery unit that, when a predetermined condition is satisfied, selects from among the plurality of physical computers, as a second physical computer serving as the migration destination of the virtual machine, a physical computer capable of securing computer resources equal to or greater than the actual usage amount, updates the definition information to the actual usage amount, and moves the virtual machine to the selected second physical computer.
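The core selection rule, choosing a destination whose free resources cover the actual usage amount rather than the allocated amount, might look like this for two resource dimensions. The `Resources` and `PhysicalComputer` types are assumptions, and the best-fit tie-break is an illustrative choice that the claims do not require.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Resources:
    cpus: int
    memory_mb: int

    def covers(self, need: "Resources") -> bool:
        return self.cpus >= need.cpus and self.memory_mb >= need.memory_mb


@dataclass
class PhysicalComputer:
    name: str
    free: Resources


def select_second_computer(candidates: Sequence[PhysicalComputer],
                           actual: Resources) -> Optional[PhysicalComputer]:
    """Pick a computer whose free resources cover the actual usage amount."""
    fitting = [pc for pc in candidates if pc.free.covers(actual)]
    if not fitting:
        return None
    # Hypothetical tie-break (not in the claims): least memory left over, i.e. best fit.
    return min(fitting, key=lambda pc: pc.free.memory_mb - actual.memory_mb)
```

With hypothetical hosts `pm2` (1 CPU, 16384 MB free), `pm3` (4 CPUs, 4096 MB) and `pm4` (8 CPUs, 32768 MB), a virtual machine allocated 4 CPUs and 8192 MB fits only on `pm4`; sized to an actual usage of 2 CPUs and 3072 MB, it can also go to `pm3`, which the best-fit rule prefers.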
The virtual machine system according to claim 8, wherein
the management computer further comprises a failure sign management unit that detects that the physical computer has a failure or a sign of failure; and
the recovery unit determines that the predetermined condition is satisfied when the failure or the sign of failure is detected.
The virtual machine system according to claim 9, wherein
the virtual machine management information holds, for the plurality of virtual machines allocated to the plurality of physical computers, the relationship between the allocated computer resources and the physical computers; and
the recovery unit:
searches the virtual machine management information to identify the virtual machines affected by the failure or the sign of failure; and
selects, as the second physical computer to which each identified virtual machine is to be moved, a physical computer capable of securing computer resources equal to or greater than the actual usage amount of that virtual machine.
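Identifying the virtual machines affected by a failure and planning a destination for each could be sketched as follows. `MgmtRow` is a hypothetical flat view of the virtual machine management information; placing the largest actual usage first, and decrementing a working copy of free capacity so that two evacuated machines are not booked into the same space, are assumptions beyond the claim text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MgmtRow:
    vm: str
    host: str
    allocated_mb: int
    actual_mb: int


def affected_vms(table: List[MgmtRow], failed_host: str) -> List[MgmtRow]:
    """Rows for every VM running on the host with a failure or failure sign."""
    return [row for row in table if row.host == failed_host]


def plan_evacuation(table: List[MgmtRow], failed_host: str,
                    free_mb: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Map each affected VM to a destination host, or None if nothing fits."""
    remaining = dict(free_mb)         # working copy; the caller's dict is not mutated
    remaining.pop(failed_host, None)  # never place a VM back on the failing host
    plan: Dict[str, Optional[str]] = {}
    for row in sorted(affected_vms(table, failed_host),
                      key=lambda r: r.actual_mb, reverse=True):
        dest = next((h for h, free in remaining.items() if free >= row.actual_mb), None)
        plan[row.vm] = dest
        if dest is not None:
            remaining[dest] -= row.actual_mb
    return plan
```

For example, if `pm1` fails while running `web` (actual 1024 MB) and `db` (actual 3072 MB), and `pm2` and `pm3` have 2048 MB and 3072 MB free, the plan sends `db` to `pm3` and `web` to `pm2`. Sized by their allocated amounts (4096 MB and 8192 MB), neither would fit.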
The virtual machine system according to claim 8, wherein the recovery unit:
duplicates the definition information corresponding to the virtual machine;
changes the duplicated definition information so that it corresponds to the actual usage amount; and
moves the virtual machine to the second physical computer based on the changed definition information.
The virtual machine system according to claim 8, wherein
the virtual machine management information holds, for the plurality of virtual machines allocated to the plurality of physical computers, the relationship between the allocated computer resources and the physical computers together with a priority set for each virtual machine; and
the recovery unit:
when there is no physical computer that can secure computer resources equal to or greater than the actual usage amount, searches the virtual machine management information and selects, as the second physical computer to which the virtual machine is to be moved, a physical computer that runs a virtual machine having a lower priority than the virtual machine; and
takes computer resources corresponding to the actual usage amount away from the lower-priority virtual machine.
The virtual machine system according to claim 12, wherein
the management computer further comprises a failure sign management unit that detects that the physical computer has a failure or a sign of failure; and
the recovery unit:
determines that the predetermined condition is satisfied when the failure or the sign of failure is detected;
when the predetermined condition is satisfied, the operating state of the physical computer running the virtual machine is a predetermined operating state, and there is no physical computer that can secure computer resources equal to or greater than the actual usage amount, searches the virtual machine management information and selects, as the second physical computer to which the virtual machine is to be moved, a physical computer that runs a virtual machine having a lower priority than the virtual machine; and
takes computer resources corresponding to the actual usage amount away from the lower-priority virtual machine.
The virtual machine system according to claim 8, wherein
the management computer further comprises a virtual machine restoration unit that returns each virtual machine that has been moved to the second physical computer to the first physical computer;
the virtual machine definition information holds information on the computer resources initially allocated to the virtual machine as the original definition information of the virtual machine;
the recovery unit:
duplicates the definition information;
changes the duplicated definition information so that it corresponds to the actual usage amount;
moves the virtual machine to the second physical computer based on the changed definition information; and
holds the changed definition information as a migration history; and
the virtual machine restoration unit moves the virtual machine back to the first physical computer using the original definition information of the virtual machine.
A storage medium storing a program for controlling a management computer in a virtual machine system having a plurality of physical computers, each comprising a processor and a memory and having a virtualization unit that constructs a plurality of virtual machines, and the management computer, which comprises a processor and a memory and is connected to the physical computers via a network, the program causing the management computer to execute:
a procedure of allocating computer resources of a first physical computer among the plurality of physical computers to a virtual machine and running the virtual machine;
a procedure of holding information on the computer resources allocated to the virtual machine as definition information of the virtual machine;
a procedure of acquiring, as an actual usage amount, the computer resources of the first physical computer actually used by the virtual machine;
a procedure of determining whether a predetermined condition is satisfied;
a procedure of selecting, when the predetermined condition is satisfied, a physical computer capable of securing computer resources equal to or greater than the actual usage amount from among the plurality of physical computers as a second physical computer serving as the migration destination of the virtual machine; and
a procedure of updating the definition information to the actual usage amount and moving the virtual machine to the selected second physical computer.