KR20170041557A

KR20170041557A - Apparatus and method for determining failover in virtual system

Info

Publication number: KR20170041557A
Application number: KR1020150141161A
Authority: KR
Inventors: 장동수; 주성완; 한원탁; 유창완
Original assignee: 주식회사 엘지유플러스
Priority date: 2015-10-07
Filing date: 2015-10-07
Publication date: 2017-04-17
Also published as: KR101883251B1

Abstract

The present invention relates to a method for determining failover in a virtual system including a management server, a host server, and a virtual server. The method comprises: a first step of determining whether the host server has failed by exchanging a first heartbeat message between the management server and the host server; a second step of determining, if the host server is determined to have failed in the first step, whether the virtual server has failed by exchanging a second heartbeat message between the management server and the virtual server operating on the host server; and a step of performing a high availability function depending on the results of the first step and the second step.

Description

APPARATUS AND METHOD FOR DETERMINING FAILURE IN A VIRTUAL SYSTEM

The present invention relates to an apparatus and method for determining a failure in a virtual system, and more particularly, to an apparatus and method that determine whether a failure has occurred in a virtual system so that a high availability (HA) function can be performed efficiently.

Cloud computing is a system that utilizes Internet technology to provide high-level scalable IT resources to a large number of customers as a service. Users can be assigned a virtualized resource and use it as a personal desktop environment.

One concern when introducing virtualization technology is system failure. With, say, 20 physical servers (machines), a failure affects only the server on which it occurs. With virtualization, however, one physical server hosts multiple virtual servers (machines), so a single failure can have a more complicated impact.

In a computing environment that provides a service, various high availability (HA) technologies exist to reduce the downtime during which the service is unavailable because of a physical failure. For example, a cluster system of communicably connected server computers may include an active server that provides a given service and a standby server that restarts the service if the active server fails. While the standby server performs the failover, the system administrator can identify the cause of the failure and recover the active server or replace it with a new one.

When a system that manages multiple servers determines over a network whether a host has failed, a network connection error between the management server and a host server (e.g., a physical server) leads the management server to conclude that the host server has failed and to perform the high availability (HA) function. In this case, if the host server is only temporarily unresponsive, the management server performs the high availability function unnecessarily even though the virtual servers can operate normally.

KR 10-2015-0029181 A

An object of the present invention is to provide a method and apparatus that reduce unnecessary high availability (HA) operations in a virtual system including a management server, a host server, and a virtual server by checking, through a plurality of paths and against a plurality of criteria, whether the host server is operating normally.

Another object of the present invention is to provide a method and apparatus that reduce the system load caused by unnecessary operations, so that the extended system provided for high availability (HA) operation can be used efficiently.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention as defined by the appended claims.

The present invention can provide a method and apparatus for efficiently performing a high availability function through failover by determining whether the elements constituting a virtual system are operated by a plurality of methods.

A method for determining failover in a virtual system according to an embodiment of the present invention, the virtual system including a management server, a host server, and a virtual server, includes: a first step of exchanging a first heartbeat message between the management server and the host server to determine whether the host server has failed; a second step of exchanging, if the host server is determined to have failed in the first step, a second heartbeat message between the management server and the virtual server operating on the host server to determine whether the virtual server has failed; and performing a high availability function according to the results of the first step and the second step.

The method may further include confirming whether a virtual server operating on the host server exists when the host server is determined to have failed in the first step.

In addition, if there is no virtual server operating on the host server determined to be a failure, the high availability function may not be performed.

In addition, if the host server is determined to be in failure in the first step and the virtual server is not in failure in the second step, the high availability function may not be performed.

In addition, when both the host server and the virtual server are determined to have failed in the first step and the second step, the high availability function can be performed.

The method may further include a third step of, when the host server and the virtual server are determined to have failed in the first step and the second step, checking whether the host server is normally interworking with the shared storage in which the virtual server image of the host server is stored, and thereby determining whether the host server has failed.

In addition, when the host server and the virtual server are determined to be in a failure state in the first to third steps, the high availability function may be performed.

In addition, the management server may interwork with a plurality of host servers and a plurality of virtual servers, and at least one virtual server may operate on each host server.

In addition, when the high availability function is performed, the virtual server operating on the failed host server may be transferred to another host server in which the failure has not occurred.

Also, when the virtual server is transferred, the destination host server can be determined by reflecting the resource utilization rate of the other host server.

An apparatus for determining failover in a virtual system according to another embodiment of the present invention is provided in a management server that interworks with a host server and a virtual server included in the virtual system, and includes: a first determination unit that determines whether the host server has failed by exchanging a first heartbeat message with the host server; a second determination unit that, when the first determination unit determines that the host server has failed, determines whether the virtual server has failed by exchanging a second heartbeat message with the virtual server operating on the host server; and a transfer unit that performs a high availability function according to the results of the first determination unit and the second determination unit.

In addition, when the first determination unit determines that the host server has failed, the second determination unit can confirm whether a virtual server operating on the host server exists.

The second determination unit may prevent the transfer unit from performing the high availability function when no virtual server is operating on the host server that the first determination unit determined to have failed.

In addition, if the first determination unit determines that the host server has failed and the second determination unit determines that the virtual server has not failed, the transfer unit may not perform the high availability function.

In addition, when the first determination unit and the second determination unit determine that both the host server and the virtual server have failed, the transfer unit can perform the high availability function.

The apparatus may further include a third determination unit that, when the first determination unit and the second determination unit determine that the host server and the virtual server have failed, checks whether the host server is normally interworking with the shared storage in which the virtual server image of the host server is stored, and thereby determines whether the host server has failed.

In addition, if the first to third determination units determine that the host server and the virtual server have failed, the transfer unit may perform the high availability function.

Also, when the virtual server is transferred, the transfer unit may determine the destination host server by reflecting the resource utilization rate of the other host server.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments.

Effects of the method and apparatus according to the present invention will be described as follows.

The present invention accurately judges, according to a plurality of criteria, whether the host server in the virtual system has failed, thereby preventing the high availability (HA) function from operating erroneously and reducing how often such erroneous operation occurs.

In addition, the present invention prevents the new failures that can arise when a high availability (HA) function triggered by a false judgment transfers a virtual server to another host server and thereby increases the resource utilization of a host that was operating normally.

The effects obtainable by the present invention are not limited to the effects mentioned above, and other effects not mentioned can be clearly understood by those skilled in the art from the following description.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. It is to be understood, however, that the technical features of the present invention are not limited to the specific drawings, and the features disclosed in the drawings may be combined with each other to constitute a new embodiment.
Figure 1 illustrates a method for determining failover in a virtual system.
Figure 2 illustrates a system structure for determining failover in a virtual system.
Figure 3 illustrates High Availability (HA) functionality in a virtual system.
Figure 4 illustrates an apparatus for determining failover in a virtual system.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, an apparatus and various methods to which embodiments of the present invention are applied will be described in detail with reference to the drawings. The suffixes "module" and "part" for the components used in the following description are given or used interchangeably only for ease of drafting the specification, and do not by themselves have distinct meanings or roles.

In the description of the embodiments, when an element is described as being formed "on (above)" or "under (below)" another element, this includes both the case where the two elements are in direct contact with each other and the case where at least one further element is formed and arranged between the two elements. Also, the expression "upward" or "downward" may include not only an upward direction but also a downward direction with respect to one element.

A virtual server refers to a computing environment created in software through a hypervisor on a host. When virtualization is introduced into a service environment over a real network, high availability (HA) must accompany it. The reason for high availability is to prepare for downtime, of which there are two types. The first, planned downtime, is scheduled downtime such as server maintenance or hardware replacement; because the work is carried out after prior notice or after business hours, its impact on service provision and user inconvenience is not significant. The second, unplanned downtime, is unexpected downtime in which the service becomes unavailable because of a hardware problem in the server or a problem with the operating system or other software, and it disrupts the service and inconveniences users. High availability (HA) is a technology that minimizes downtime with the aim of keeping the user's server in continuous operation, and measures for handling a system stoppage can be included in the high availability function.

Sudden downtime can be detected through system hangs. A system hang, sometimes called a freeze, is a state in which the system does not respond and cannot be operated. In this state, the system accepts no input and can be judged unable to process anything. The causes of a hang can be divided into hardware factors and software factors.

For example, hardware factors may include mismatched hardware, hardware thermal damage due to operating environment factors, memory shortage, hardware defects, and so on. On the other hand, the software factor may include an infinite loop or a race condition, a deadlock, a spyware or a computer virus.

To identify the cause of such a hang, resources basically need to be monitored. Monitoring methods include monitoring of system resources and behavior, such as the central processing unit (CPU), memory, process status, and heat; log monitoring, which checks the logs of programs (kernel, applications); and heartbeat monitoring, which checks whether a response arrives by sending a simple message between system components.

Commercial services on the Internet are increasing rapidly, and it is important to keep providing them without interruption. Multiple hardware or software resources can be deployed for high availability (HA). Normally, these system resources periodically exchange messages. If such a message does not arrive within a predetermined time, one of the system resources can be considered to have failed. The messages exchanged in this way are called heartbeat messages.
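By way of illustration only, heartbeat monitoring of this kind could be sketched as follows. The class name `HeartbeatTracker`, the timeout value, and the injectable clock are assumptions introduced for this sketch and do not appear in the specification, which does not fix a message format or a timeout.

```python
import time


class HeartbeatTracker:
    """Hypothetical helper: remembers when each peer last sent a heartbeat."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed timeout; not specified in the source
        self.clock = clock          # injectable clock so the sketch can be tested
        self.last_seen = {}

    def record(self, peer_id):
        """Call whenever a heartbeat message arrives from peer_id."""
        self.last_seen[peer_id] = self.clock()

    def is_failed(self, peer_id):
        """Treat a peer as failed if no heartbeat arrived within timeout_s."""
        seen = self.last_seen.get(peer_id)
        if seen is None:
            return True  # assumption: a peer never heard from counts as failed
        return self.clock() - seen > self.timeout_s
```

A management server could call `record` each time an agent answers and poll `is_failed` periodically; a monotonic clock is used so that wall-clock adjustments do not cause false timeouts.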

Figure 1 illustrates a method for determining failover in a virtual system.

As shown in the figure, in a virtual system including a management server, a host server, and a virtual server, the method for determining failover may include: a first step (10) of exchanging a first heartbeat message between the management server and the host server to determine whether the host server has failed; a second step (12) of exchanging, when the host server is determined to have failed in the first step (10), a second heartbeat message between the management server and the virtual server operating on the host server to determine whether the virtual server has failed; and a step (16) of performing a high availability function according to the results of the first step (10) and the second step (12).

In a virtual system including a management server, a host server, and a virtual server, the management server can interwork with a plurality of host servers and a plurality of virtual servers. At least one virtual server may operate on each host server. If the high availability (HA) function is performed, a virtual server operating on the failed host server may be transferred to another host server that has not failed. When the virtual server is transferred, the destination host server can be determined by reflecting the resource utilization rate of the other host servers included in the system.

The method may further include confirming whether a virtual server running on the host server exists when the host server is determined to have failed in the first step (10). If no virtual server is running on the host server determined to have failed, the high availability (HA) function may not be performed.

If the host server is determined to have failed in the first step (10) but the virtual server is determined not to have failed in the second step (12), the high availability (HA) function may not be performed. A host server judged as failed in the first step (10) may have stopped only temporarily for various reasons and may resume normal operation right away. In such a situation, performing the high availability function would be inefficient. Therefore, the high availability (HA) function can be performed when both the host server and the virtual server are determined to have failed in the first and second steps (10, 12).

The method may further include a third step (14) of, when the host server and the virtual server are determined to have failed in the first step (10) and the second step (12), checking whether the host server is normally interworking with the shared storage in which the virtual server image of the host server is stored, and thereby determining whether the host server has failed. In this case, the high availability (HA) function can be performed when the host server and the virtual server are determined to have failed in the first to third steps (10 to 14).
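For illustration, the step sequence of Figure 1 could be expressed as a single decision function. This is a minimal sketch under stated assumptions: the names `Decision` and `decide_failover` and the callback parameters are hypothetical, and treating any responding virtual server as evidence that the host is still working is an assumption, since the specification does not say how several virtual servers on one host are combined.

```python
from enum import Enum


class Decision(Enum):
    NO_ACTION = "no_action"
    RUN_HA = "run_ha"


def decide_failover(host_alive, vms_on_host, vm_alive, storage_linked=None):
    """Hypothetical sketch of steps 10, 12, 14 and 16.

    host_alive:     () -> bool, result of the first heartbeat exchange
    vms_on_host:    list of virtual server ids running on the host
    vm_alive:       (vm_id) -> bool, result of the second heartbeat exchange
    storage_linked: optional () -> bool, shared-storage interworking check
    """
    if host_alive():                    # step 10: host answered its heartbeat
        return Decision.NO_ACTION
    if not vms_on_host:                 # no virtual server to protect
        return Decision.NO_ACTION
    if any(vm_alive(vm) for vm in vms_on_host):  # step 12 (assumed "any" rule)
        return Decision.NO_ACTION
    if storage_linked is not None and storage_linked():  # optional step 14
        return Decision.NO_ACTION
    return Decision.RUN_HA              # step 16: every check reported failure
```

Each check runs only if the previous one reported a failure, so the later, more specific checks are skipped whenever an earlier one already shows the host to be healthy.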

Figure 2 illustrates a system structure for determining failover in a virtual system.

As shown, the virtual system may include a management server 20, host servers 24A and 24B, and virtual servers 22A and 22B. To determine failover in the virtual system and decide whether to operate the HA function, the heartbeat message between the management server and the host is checked first, the heartbeat message between the management server and the virtual server is checked second, and the failure of the host can be judged more precisely by additionally checking whether the connection between the shared storage and the host is normal.

Specifically, the management server 20 can determine whether the host is malfunctioning by exchanging a heartbeat message with an agent installed in the host servers 24A and 24B.

Thereafter, if there is a host server (assumed to be 24A in this case) checked as a failure, the presence or absence of the virtual server 22A being operated on the host server 24A can be confirmed.

The management server 20 then determines whether the virtual server has failed by exchanging a heartbeat message with the agent installed in the virtual server 22A on the host server 24A that was judged as failed in the first determination.

Even if the host server 24A is judged as failed in the first determination, the high availability (HA) function may not operate if the virtual server 22A is judged as normal in the second determination.

When the virtual server 22A is also identified as failed in the second determination, whether the connection between the shared storage 26 and the failed host server 24A is normal can be checked.

When the connection with the shared storage 26 is also determined to have failed, the host server 24A is finally judged as failed and the high availability (HA) function is operated.

As described above, judging the failure of the host server 24A against three criteria makes it possible to prevent unnecessary operation of the high availability (HA) function.

Figure 3 illustrates High Availability (HA) functionality in a virtual system.

As shown, a plurality of host servers 50A, 50B, and 50C are connected to each other through a network. Each of the host servers 50A, 50B, and 50C may include virtual machine drivers 54A, 54B, and 54C for operating virtual machines, and resources 52A, 52B, and 52C such as memory.

For example, assume that three virtual servers 56A_1, 56A_2, and 56A_3 operate on the first host server 50A, two virtual servers 56B_1 and 56B_2 operate on the second host server 50B, and one virtual server 56C_1 operates on the third host server 50C.

If the second host server 50B is determined to have failed in the manner described with reference to FIG. 2, the virtual servers 56B_1 and 56B_2 operating on the second host server 50B can be transferred to the first or third host server 50A or 50C through the high availability (HA) function. Here, because fewer virtual servers are operating on the third host server 50C than on the first host server 50A, the virtual servers 56B_1 and 56B_2 can be transferred to the third host server 50C in consideration of the system load.
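As a hedged illustration of the destination choice in this example, the sketch below picks the surviving host with the lowest load. The function name `pick_target_host` and the use of a single load number per host are assumptions; the load could be a virtual server count, as in Figure 3, or a resource utilization rate.

```python
def pick_target_host(load_by_host, failed_host):
    """Hypothetical helper: return the least-loaded host other than failed_host.

    load_by_host maps a host id to one load figure (for example the number of
    virtual servers it runs, or a utilization rate). Returns None when no
    other host is available.
    """
    candidates = {h: load for h, load in load_by_host.items() if h != failed_host}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Virtual server counts taken from the Figure 3 example.
vm_counts = {"50A": 3, "50B": 2, "50C": 1}
target = pick_target_host(vm_counts, failed_host="50B")  # "50C"
```

With the counts from Figure 3, the third host server 50C is selected because it runs fewer virtual servers than the first host server 50A.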

When the high availability (HA) function described above is executed, virtual servers are transferred to another host, so the resource utilization rate of a normally operating host may rise sharply or exceed its capacity.

Figure 4 illustrates an apparatus for determining failover in a virtual system.

As shown in the figure, the management server (not shown) interworks with the host server 34 and the virtual server 32 included in the virtual system, and may interwork with a plurality of host servers 34 and a plurality of virtual servers 32. Also, at least one virtual server 32 may operate on each host server 34.

The device 40 for determining failover in the virtual system may include: a first determination unit 42 that exchanges a first heartbeat message with the host server 34 to determine whether the host server 34 has failed; a second determination unit 44 that, if at least one of the host servers 34 is determined to have failed, exchanges a second heartbeat message with the virtual server 32 operating on that host server 34 to determine whether the virtual server 32 has failed; and a transfer unit 48 that performs a high availability (HA) function according to the results of the first determination unit 42 and the second determination unit 44.

If the first determination unit 42 determines that at least one of the host servers 34 has failed, the second determination unit 44 can confirm whether a virtual server 32 is operating on that host server 34. If the second determination unit 44 finds that no virtual server 32 is operating on the host server 34 that the first determination unit 42 determined to have failed, the transfer unit 48 need not perform the high availability (HA) function.

On the other hand, if the first determination unit 42 determines that the host server 34 has failed but the second determination unit 44 determines that the virtual server 32 operating on the host server 34 has not failed, the transfer unit 48 may not perform the high availability (HA) function.

When the first determination unit 42 and the second determination unit 44 determine that both the host server 34 and the corresponding virtual server 32 have failed, the transfer unit 48 can perform the high availability (HA) function.

The device 40 may further include a third determination unit 46 that, when the first determination unit 42 and the second determination unit 44 determine that the host server 34 and the corresponding virtual server 32 have failed, checks whether the shared storage 36 in which the virtual server image of the host server 34 is stored is normally interworking with the host server 34, and thereby determines whether the host server 34 has failed. In this case, if the first determination unit 42, the second determination unit 44, and the third determination unit 46 all determine that the host server 34 and the virtual server 32 have failed, the transfer unit 48 can perform the high availability (HA) function.

If the high availability (HA) function is performed, a virtual server operating on the failed host server may be transferred to another host server that has not failed. When the virtual server is transferred, the destination host server can be determined by reflecting the resource utilization rate of the other host servers included in the system.
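The arrangement of units in the device 40 could, purely as an illustration, be modeled as one object that holds the determination units and the transfer unit as injected callables. The class name `FailoverDevice`, the method `evaluate`, and the callable signatures are assumptions made for this sketch; the specification does not prescribe a software structure.

```python
class FailoverDevice:
    """Hypothetical model of device 40 built from pluggable units."""

    def __init__(self, host_failed, vm_failed, storage_failed, transfer):
        self.host_failed = host_failed        # first determination unit (42)
        self.vm_failed = vm_failed            # second determination unit (44)
        self.storage_failed = storage_failed  # third determination unit (46)
        self.transfer = transfer              # transfer unit (48)

    def evaluate(self, host_id, vm_ids):
        """Return True if the transfer unit was invoked for host_id."""
        if not self.host_failed(host_id):
            return False  # host heartbeat is fine
        if not vm_ids:
            return False  # nothing runs on the host, so nothing to transfer
        if not all(self.vm_failed(vm) for vm in vm_ids):
            return False  # assumption: one responsive virtual server is enough
        if not self.storage_failed(host_id):
            return False  # shared storage still interworks with the host
        self.transfer(host_id, vm_ids)
        return True
```

Passing the units in as callables keeps the decision order separate from how heartbeats or storage links are actually probed, so each unit can be replaced or tested on its own.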

The method according to the above-described embodiments may be implemented as a program to be executed by a computer and stored in a computer-readable recording medium. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a floppy disk, an optical data storage device, and the like, and the medium may also be implemented in the form of a carrier wave (for example, transmission over the Internet).

The computer readable recording medium may be distributed over a networked computer system so that computer readable code can be stored and executed in a distributed manner. And, functional program, code, and code segments for implementing the above-described method can be easily inferred by programmers in the technical field to which the embodiment belongs.

It will be apparent to those skilled in the art that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.

Accordingly, the above description should not be construed in a limiting sense in all respects and should be considered illustrative. The scope of the present invention should be determined by rational interpretation of the appended claims, and all changes within the scope of equivalents of the present invention are included in the scope of the present invention.

20: Management server
22A, 22B, 32, 56A_1, 56A_2, 56A_3, 56B_1, 56B_2, 56C_1: virtual server
24A, 24B, 34, 50A, 50B, 50C: host server
26, 36: Shared storage
54A, 54B, 54C: virtual machine driver
52A, 52B, 52C: resources
40: Device for determining failover in a virtual system
42: first determination unit
44: second determination unit
46: third determination unit
48: transfer unit

Claims

A method for determining failover in a virtual system including a management server, a host server, and a virtual server, the method comprising:
A first step of exchanging a first heartbeat message between the management server and the host server to determine whether the host server has failed;
A second step of exchanging, if the host server is determined to have failed in the first step, a second heartbeat message between the management server and the virtual server operating on the host server to determine whether the virtual server has failed; And
Performing a high availability function according to results of the first step and the second step.

The method according to claim 1,
Further comprising confirming, if the host server is determined to have failed in the first step, whether the virtual server operating on the host server exists.

The method according to claim 2,
Wherein the high availability function is not performed when there is no virtual server operating on the host server determined to have failed.

The method according to claim 1,
Wherein the high availability function is not performed when the host server is determined to have failed in the first step and the virtual server is determined not to have failed in the second step.

The method according to claim 1,
Wherein the high availability function is performed when both the host server and the virtual server are determined to have failed in the first step and the second step.

The method according to claim 1,
Further comprising a third step of, if the host server and the virtual server are determined to have failed in the first and second steps, checking whether the host server is normally interworking with the shared storage in which the virtual server image of the host server is stored, and thereby determining whether the host server has failed.

The method according to claim 6,
Wherein the high availability function is performed when the host server and the virtual server are determined to have failed in the first to third steps.

The method according to claim 1,
Wherein the management server interworks with a plurality of host servers and a plurality of virtual servers, and at least one virtual server operates on each of the host servers.

The method according to claim 1,
Wherein the virtual server operating on the failed host server is migrated to another host server in which the failure has not occurred when the high availability function is performed.

The method according to claim 9,
Wherein, when the virtual server is transferred, the destination host server is determined by reflecting the resource utilization rate of the other host server.

An apparatus for determining failover in a virtual system, provided in a management server interworking with a host server and a virtual server included in the virtual system, the apparatus comprising:
A first determination unit for determining whether the host server has failed by exchanging a first heartbeat message with the host server;
A second determination unit for determining, when the first determination unit determines that the host server has failed, whether the virtual server has failed by exchanging a second heartbeat message with the virtual server operating on the host server; And
A transfer unit for performing a high availability function according to results of the first determination unit and the second determination unit.

The apparatus according to claim 11,
Wherein the second determination unit confirms whether the virtual server operating on the host server exists when the first determination unit determines that the host server has failed.

The apparatus according to claim 12,
Wherein the second determination unit keeps the transfer unit from performing the high availability function when no virtual server is operating on the host server that the first determination unit determined to have failed.

The apparatus according to claim 11,
Wherein, when the first determination unit determines that the host server has failed and the second determination unit determines that the virtual server has not failed, the transfer unit does not perform the high availability function.

The apparatus according to claim 11,
Wherein, when the first determination unit and the second determination unit determine that both the host server and the virtual server have failed, the transfer unit performs the high availability function.

The apparatus according to claim 11,
Further comprising a third determination unit for, if the first determination unit and the second determination unit determine that the host server and the virtual server have failed, checking whether the host server is normally interworking with the shared storage in which the virtual server image of the host server is stored, and thereby determining whether the host server has failed.

The apparatus according to claim 16,
Wherein the transfer unit performs the high availability function when the first to third determination units determine that the host server and the virtual server have failed.

The apparatus according to claim 11,
Wherein the management server interworks with a plurality of host servers and a plurality of virtual servers, and at least one virtual server operates on each of the host servers.

The apparatus according to claim 11,
Wherein, when the high availability function is performed, the virtual server operating on the failed host server is transferred to another host server in which no failure has occurred.

The apparatus according to claim 19,
Wherein, when the virtual server is transferred, the transfer unit determines the destination host server by reflecting the resource utilization rate of the other host server.