KR20170041557A - Apparatus and method for determining failover in virtual system - Google Patents
- Publication number
- KR20170041557A (application KR1020150141161A)
- Authority
- KR
- South Korea
- Prior art keywords
- server
- virtual
- host server
- host
- determination unit
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0663—Performing the actions predefined by failover planning, e.g. switching to standby network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/10—Active monitoring, e.g. heartbeat, ping or trace-route
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Health & Medical Sciences (AREA)
- Cardiology (AREA)
- General Health & Medical Sciences (AREA)
- Hardware Redundancy (AREA)
Abstract
Description
The present invention relates to an apparatus and method for determining a failure in a virtual system, and more particularly, to an apparatus and method that determine whether a failure has occurred in a virtual system so that a high availability (HA) function can be performed efficiently.
Cloud computing is a system that utilizes Internet technology to provide high-level scalable IT resources to a large number of customers as a service. Users can be assigned a virtualized resource and use it as a personal desktop environment.
One concern when introducing virtualization technology is system failure. With, say, 20 physical servers (machines), a failure affects only the server on which it occurs. With virtualization, however, one physical server hosts multiple virtual servers (machines), which can complicate the problem.
In a computing environment that provides a service, various high availability (HA) technologies exist to reduce the downtime during which the service is unavailable because of a physical failure. For example, a cluster system of communicably connected server computers may include an active server that provides a given service and a standby server that restarts the service if the active server fails. While the standby server performs the failover, the system administrator can identify the cause of the failure and recover the active server or replace it with a new one.
When a system that manages multiple servers judges host failures over the network, a network connection error between the management server and a host server (e.g., a physical server) leads the management server to conclude that the host server has failed and to perform the high availability (HA) function. If the host server is only temporarily unresponsive, the management server thus performs the high availability function unnecessarily, even though the virtual servers can still operate normally.
The present invention can provide a method and apparatus that reduce unnecessary high availability (HA) operations in a virtual system including a management server, a host server, and a virtual server by checking, through a plurality of paths and against a plurality of criteria, whether the host server is operating normally.
It is another object of the present invention to provide a method and apparatus that reduce the system load caused by unnecessary operations, so that the extended system provided for high availability (HA) operation can be used efficiently.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory, and do not restrict the invention as defined by the appended claims.
The present invention can provide a method and apparatus for efficiently performing a high availability function through failover by determining, through a plurality of methods, whether the elements constituting a virtual system are operating.
A method for determining a failover in a virtual system according to an embodiment of the present invention is performed in a virtual system including a management server, a host server, and a virtual server, and includes: a first step of exchanging a first heartbeat message between the management server and the host server to determine whether the host server has failed; a second step of exchanging a second heartbeat message between the virtual server operating on the host server and the management server to determine whether the virtual server has failed, if the host server is determined to have failed in the first step; and a step of performing a high availability function according to the results of the first step and the second step.
The method of determining a failover in a virtual system may further include confirming whether a virtual server is operating on the host server when the host server is determined to have failed in the first step.
In addition, if no virtual server is operating on the host server determined to have failed, the high availability function may not be performed.
In addition, if the host server is determined to have failed in the first step but the virtual server is determined not to have failed in the second step, the high availability function may not be performed.
In addition, when both the host server and the virtual server are determined to have failed in the first step and the second step, the high availability function can be performed.
The method for determining a failover in a virtual system may further include a third step of checking, when the host server and the virtual server are determined to have failed in the first step and the second step, whether the host server is normally interworking with the shared storage in which the virtual server image of the host server is stored, and thereby determining whether the host server has failed.
In addition, when the host server and the virtual server are determined to have failed in the first to third steps, the high availability function may be performed.
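The way these checks combine can be summarized as a small decision function. The following is only an illustrative sketch, not the patented implementation: the names `Verdict` and `decide` are hypothetical, a single boolean stands in for the second heartbeat result of all virtual servers on the host, and it is assumed that a working shared-storage link in the third step means the host is not treated as failed.

```python
from enum import Enum


class Verdict(Enum):
    NO_ACTION = "no_action"   # do not perform the HA function
    FAILOVER = "failover"     # perform the HA function


def decide(host_heartbeat_ok: bool,
           vm_count: int,
           vm_heartbeat_ok: bool,
           storage_link_ok: bool) -> Verdict:
    """Combine the first to third steps into one verdict (hypothetical helper)."""
    if host_heartbeat_ok:
        return Verdict.NO_ACTION  # first step: the host server answered
    if vm_count == 0:
        return Verdict.NO_ACTION  # no virtual server to transfer
    if vm_heartbeat_ok:
        return Verdict.NO_ACTION  # second step: the virtual server answered
    if storage_link_ok:
        # Assumption: the host still reaches shared storage, so it is not
        # treated as failed in the third step.
        return Verdict.NO_ACTION
    return Verdict.FAILOVER       # every check indicates a failure
```

Under these assumptions the HA function runs only when every check points to a failure, which is the behavior the paragraphs above describe.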
In addition, the management server interworks with a plurality of host servers and a plurality of virtual servers, and one or more virtual servers may operate on each host server.
In addition, when the high availability function is performed, the virtual server operating on the failed host server may be transferred to another host server in which the failure has not occurred.
Also, when the virtual server is transferred, the destination host server can be determined by reflecting the resource usage rate of the other host server.
An apparatus for determining a failover in a virtual system according to another embodiment of the present invention is provided in a management server that interworks with a host server and a virtual server included in the virtual system, and includes: a first determination unit that exchanges a first heartbeat message with the host server to determine whether the host server has failed; a second determination unit that, when the first determination unit determines that the host server has failed, exchanges a second heartbeat message with the virtual server operating on the host server to determine whether the virtual server has failed; and a transfer unit that performs a high availability function according to the results of the first determination unit and the second determination unit.
In addition, when the first determination unit determines that the host server has failed, the second determination unit can confirm whether a virtual server is operating on the host server.
The second determination unit may prevent the transfer unit from performing the high availability function when it is determined that no virtual server is operating on the host server that the first determination unit determined to have failed.
In addition, if the first determination unit determines that the host server has failed and the second determination unit determines that the virtual server has not failed, the transfer unit may not perform the high availability function.
In addition, when the first determination unit and the second determination unit determine that both the host server and the virtual server have failed, the transfer unit can perform the high availability function.
The apparatus for determining a failover in a virtual system may further include a third determination unit that, when the first determination unit and the second determination unit determine that the host server and the virtual server have failed, checks whether the host server is normally interworking with the shared storage in which the virtual server image of the host server is stored, and thereby determines whether the host server has failed.
In addition, if the first to third determination units determine that the host server and the virtual server have failed, the transfer unit may perform the high availability function.
In addition, the management server interworks with a plurality of host servers and a plurality of virtual servers, and one or more virtual servers may operate on each host server.
In addition, when the high availability function is performed, the virtual server operating on the failed host server may be transferred to another host server in which the failure has not occurred.
Also, when the virtual server is transferred, the transfer unit may determine the destination by reflecting the resource utilization rate of the other host server.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments, and that other embodiments can be derived and understood from the detailed description below.
Effects of the method and apparatus according to the present invention will be described as follows.
By accurately judging, against a plurality of criteria, whether a host server in the virtual system has failed, the present invention can reduce cases in which the high availability (HA) function operates abnormally.
In addition, the present invention can prevent the new failures that may occur when a high availability (HA) function performed on the basis of a false judgment transfers virtual servers to another host server and thereby raises the resource utilization of a host that was operating normally.
The effects obtainable by the present invention are not limited to the effects mentioned above, and other effects not mentioned can be clearly understood by those skilled in the art from the following description.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain its principles. The technical features of the present invention are not limited to any specific drawing, and the features disclosed in the drawings may be combined with each other to form a new embodiment.
Figure 1 illustrates a method for determining failover in a virtual system.
Figure 2 illustrates a system structure for determining a failover in a virtual system.
Figure 3 illustrates High Availability (HA) functionality in a virtual system.
Figure 4 illustrates an apparatus for determining a failover in a virtual system.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Hereinafter, an apparatus and various methods to which embodiments of the present invention are applied will be described in detail with reference to the drawings. The suffixes "module" and "part" for the components used in the following description are given or used interchangeably only for ease of drafting the specification, and do not by themselves have distinct meanings or roles.
In the description of the embodiments, when an element is described as being formed "on (above)" or "under (below)" another element, this includes both the case where the two elements are in direct contact and the case where one or more further elements are formed and arranged between them. Also, the expression "on" or "under" may cover not only the upward direction but also the downward direction with respect to one element.
A virtual server refers to a computing environment created through software by a hypervisor on a host. When virtualization is introduced into a service environment on a real network, high availability (HA) must accompany it. High availability is needed to prepare for downtime, of which there are two types. Planned downtime is scheduled downtime for server maintenance, hardware replacement, and the like; because it takes place after prior notice or after business hours, its impact on service provision and on users is not significant. Unplanned downtime is unexpected: it is caused by hardware problems in the server, or by problems with the operating system or other software that make services unavailable, and it disrupts service and inconveniences users. High availability (HA) is a technology that minimizes downtime with the aim of keeping the user's servers in continuous operation. Actions taken when the system is stopped can be included in the high availability function.
Unplanned downtime can be detected through system hangs. A system hang, sometimes called a freeze, is a state in which the system does not respond and cannot be operated. In this state, the system can be judged unable to process work whether or not any input is given. The causes of such a hang can be divided into hardware and software factors.
For example, hardware factors may include incompatible hardware, thermal damage to hardware caused by the operating environment, memory shortage, hardware defects, and so on. Software factors may include an infinite loop, a race condition, a deadlock, spyware, or a computer virus.
To understand the cause of such a hang, resources must first be monitored. Monitoring methods include resource monitoring, which watches system resources and behavior such as the central processing unit (CPU), memory, process status, and heat; log monitoring, which checks the logs written by programs (kernel, applications); and heartbeat monitoring, which sends a simple message between system components and checks whether a response comes back.
There is a rapid increase in commercial services on the Internet, and it is important to continue to provide these services without interruption. Multiple hardware or software can be deployed for high availability (HA). Normally, multiple system resources can periodically exchange messages. If this message does not arrive for more than an hour, one of the system resources can be considered as failed. A message sent and received at this time can be called a heartbeat message.
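The heartbeat idea above can be sketched as a small bookkeeping class on the receiving side. This is a minimal sketch under stated assumptions: `HeartbeatMonitor`, `record`, and `suspected_failed` are hypothetical names, the timeout is a free parameter rather than a value taken from the patent, and the network transport is left out entirely.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks when each peer was last heard from (hypothetical helper)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def record(self, peer: str, now: Optional[float] = None) -> None:
        # Called whenever a heartbeat message from `peer` arrives.
        self._last_seen[peer] = time.monotonic() if now is None else now

    def suspected_failed(self, now: Optional[float] = None) -> List[str]:
        # A peer is suspected once its last heartbeat is older than the timeout.
        current = time.monotonic() if now is None else now
        return sorted(peer for peer, seen in self._last_seen.items()
                      if current - seen > self.timeout_s)
```

For example, with a 5-second timeout, a peer last heard from at t = 0 is reported as suspected at t = 6, while a peer heard from at t = 4 is not. The `now` argument exists only so the behavior can be checked without waiting on a real clock.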
Figure 1 illustrates a method for determining failover in a virtual system.
As shown in the figure, in a virtual system including a management server, a host server, and a virtual server, the method of determining a failover includes: a first step (10) of exchanging a first heartbeat message between the management server and the host server to determine whether the host server has failed; a second step (12) of exchanging a second heartbeat message between the virtual server operating on the host server and the management server to determine whether the virtual server has failed, when the host server is determined to have failed in the first step (10); and a step (16) of performing a high availability function according to the results of the first step (10) and the second step (12).
In a virtual system including a management server, a host server, and a virtual server, the management server can interwork with a plurality of host servers and a plurality of virtual servers. Also, at least one virtual server may operate on each host server. If a high availability (HA) function is performed, a virtual server operating on the failed host server may be transferred to another host server that has not failed. When the virtual server is transferred, the destination host server can be chosen by taking into account the resource utilization rates of the other host servers in the system.
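One way to take resource utilization into account when choosing destinations is a greedy placement. The sketch below is an assumption-laden illustration, not the method claimed in the patent: `plan_transfers` is a hypothetical name, utilization and demand are expressed as integer percentages, and the 80% ceiling is an arbitrary example value.

```python
from typing import Dict, Optional


def plan_transfers(vm_demand: Dict[str, int],
                   host_usage: Dict[str, int],
                   max_usage: int = 80) -> Dict[str, Optional[str]]:
    """Assign each virtual server to the least-loaded surviving host.

    vm_demand:  percent of a host's resources each virtual server needs.
    host_usage: current utilization (percent) of each surviving host.
    Returns a mapping of virtual server to host, or None if no host fits.
    """
    usage = dict(host_usage)  # work on a copy; the caller's data is untouched
    plan: Dict[str, Optional[str]] = {}
    # Place the most demanding virtual servers first.
    for vm, demand in sorted(vm_demand.items(), key=lambda kv: (-kv[1], kv[0])):
        candidates = [h for h in usage if usage[h] + demand <= max_usage]
        if not candidates:
            plan[vm] = None  # transferring would overload every host
            continue
        target = min(candidates, key=lambda h: (usage[h], h))
        usage[target] += demand
        plan[vm] = target
    return plan
```

Updating the running utilization after each placement is the design choice that matters here: it keeps a batch of transfers from piling onto a single host that merely looked idle before the failover began.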
The method for determining a failover in the virtual system may further include confirming whether a virtual server is running on the host server when the host server is determined to have failed in the first step (10). If no virtual server is running on the host server determined to have failed, the high availability (HA) function may not be performed.
If the host server is determined to have failed in the first step (10) but the virtual server is not in failure in the second step (12), the high availability (HA) function may not be performed. In the
The method of determining a failover in a virtual system may further include a third step (14) of checking, when the host server and the virtual server are determined to have failed in the first step (10) and the second step (12), whether the host server is normally interworking with the shared storage in which the virtual server image of the host server is stored, and thereby determining whether the host server has failed. In this case, when the host server and the virtual server are judged to have failed in the first to third steps (10) to (14), a high availability (HA) function can be performed.
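Because each later step runs only when the earlier one reports a failure, the checks can be issued lazily. The following sketch illustrates that ordering only; `evaluate` and the probe names are hypothetical, each probe is assumed to be a zero-argument callable returning a boolean, and treating a healthy shared-storage link as "host not failed" is an interpretation of the third step (14).

```python
from typing import Callable, List, Tuple

Probe = Callable[[], bool]


def evaluate(host_heartbeat: Probe,
             has_virtual_servers: Probe,
             vm_heartbeat: Probe,
             storage_link: Probe) -> Tuple[bool, List[str]]:
    """Run the probes in order and stop as soon as HA is ruled out.

    Returns (perform_ha, names of the probes that were actually run).
    """
    # Each stage: (name, probe, result that rules out the HA function).
    stages = [
        ("host_heartbeat", host_heartbeat, True),            # step (10)
        ("has_virtual_servers", has_virtual_servers, False), # existence check
        ("vm_heartbeat", vm_heartbeat, True),                # step (12)
        ("storage_link", storage_link, True),                # step (14)
    ]
    ran: List[str] = []
    for name, probe, stop_on in stages:
        ran.append(name)
        if bool(probe()) == stop_on:
            return False, ran
    return True, ran
```

In this sketch a host that answers its first heartbeat costs the management server exactly one probe; the second heartbeat and the storage check are sent only for hosts already under suspicion.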
Figure 2 illustrates a system structure for determining a failover in a virtual system.
As shown, the virtual system may include a
Specifically, the
Thereafter, if there is a host server (assumed to be 24A in this case) checked as a failure, the presence or absence of the
The
If the
When the
When it is determined that all of the connections with the shared
As described above, it is possible to prevent the unnecessary operation of the high availability (HA) function by judging the failure of the
Figure 3 illustrates High Availability (HA) functionality in a virtual system.
As shown, a plurality of
For example, three virtual servers 56A_1, 56A_2 and 56A_3 operate on the
If it is determined that the
When the high availability (HA) function is executed, a virtual server can be transferred to another host, and the resource utilization of a host under normal operation may as a result exceed its limit or increase rapidly.
Figure 4 illustrates an apparatus for determining a failover in a virtual system.
As shown in the figure, in the management server (not shown) interlocked with the
The
If the
On the other hand, if the
When the
In the case where the
If a high availability (HA) function is performed, a virtual server operating on the failed host server may be transferred to another host server that has not failed. When the virtual server is transferred, the destination host server can be chosen by taking into account the resource utilization rates of the other host servers in the system.
The method according to the above-described embodiments may be implemented as a program to be executed by a computer and stored in a computer-readable recording medium. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a floppy disk, an optical data storage device, and the like, and the medium may also be implemented in the form of a carrier wave (for example, transmission over the Internet).
The computer-readable recording medium may be distributed over networked computer systems so that computer-readable code can be stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the above-described method can be easily inferred by programmers skilled in the art to which the embodiments belong.
It will be apparent to those skilled in the art that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
Accordingly, the above description should not be construed in a limiting sense in all respects and should be considered illustrative. The scope of the present invention should be determined by rational interpretation of the appended claims, and all changes within the scope of equivalents of the present invention are included in the scope of the present invention.
20: Management server
22A, 22B, 32, 56A_1, 56A_2, 56A_3, 56B_1, 56B_2, 56C_1: Virtual server
24A, 24B, 34, 50A, 50B, 50C: Host server
26, 36: Shared storage
54A, 54B, 54C: Virtual machine driver
52A, 52B, 52C: Resources
40: Apparatus for determining failover in a virtual system
42: First determination unit
44: Second determination unit
46: Third determination unit
48: Transfer unit
Claims (20)
A first step of exchanging a first heartbeat message between the management server and the host server to determine whether the host server has failed;
A second step of exchanging a second heartbeat message between the virtual server operating on the host server and the management server to determine whether the virtual server has failed if the host server is determined to have failed in the first step ; And
Performing a high availability function according to a result of the first step and the second step
And determining a failover in the virtual system.
If it is determined that the host server has failed in the first step, confirming existence of the virtual server operating on the host server
Further comprising: determining a failover condition in the virtual system.
Wherein the high availability function is not performed when there is no virtual server operating on the host server determined to be a failure.
Wherein the high availability function is not performed when the host server is determined to be a failure in the first step and the virtual server is not a failure in the second step.
Wherein the high availability function is performed when both the host server and the virtual server are determined to have failed in the first step and the second step.
If the host server and the virtual server are determined to be in failure in the first and second steps, checking whether the shared storage in which the virtual server image of the host server is stored and the host server are normally interworked, The third step of determining whether the host server has failed
And determining a failover in the virtual system.
Wherein the high availability function is performed when the host server and the virtual server are determined as a failure in the first step to the third step.
Wherein the management server interlocks with the plurality of host servers and the plurality of virtual servers, and at least one or more virtual servers can operate on each of the host servers.
Wherein the virtual server operating on the failed host server is migrated to another host server in which the failure has not occurred when the high availability function is performed.
Wherein, when the virtual server is transferred, the host server to receive it is determined based on the resource utilization rate of the other host server.
A first determination unit for determining whether the host server has failed by exchanging a first heartbeat message with the host server;
A second determination unit for determining whether the virtual server has failed by exchanging a second heartbeat message with the virtual server operating on the host server when the first determination unit determines that the host server is a failure; And
A transfer unit that performs a high availability function according to the results of the first determination unit and the second determination unit,
And determining a failover in the virtual system.
Wherein the second determination unit determines existence of the virtual server operating on the host server when the first determination unit determines that the host server has failed.
Wherein the second determination unit prevents the transfer unit from performing the high availability function when it is determined that no virtual server is operating on the host server that the first determination unit determined to have failed.
Wherein the transfer unit does not perform the high availability function when the first determination unit determines that the host server has failed and the second determination unit determines that the virtual server has not failed.
Wherein when the first determination unit and the second determination unit determine that both the host server and the virtual server have failed, the transfer unit performs the high availability function.
If the host server and the virtual server are determined to be in failure by the first determination unit and the second determination unit, whether the shared storage in which the virtual server image of the host server is stored and the host server are normally interworked A third determination unit for determining whether the host server has failed,
And determining a failover in the virtual system.
Wherein the transfer unit performs the high availability function when the first to third determination units determine that the host server and the virtual server have failed.
Wherein the management server interlocks with a plurality of the host servers and the plurality of virtual servers, and at least one or more virtual servers can operate on each of the host servers.
Wherein when the high availability function is performed, the virtual server operating on the failed host server is transferred to another host server in which the failure has not occurred.
Wherein, when the virtual server is transferred, the transfer unit determines the destination based on the resource utilization rate of the other host server.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150141161A KR101883251B1 (en) | 2015-10-07 | 2015-10-07 | Apparatus and method for determining failover in virtual system |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170041557A (en) | 2017-04-17 |
KR101883251B1 (en) | 2018-07-31 |
Family
ID=58703004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150141161A KR101883251B1 (en) | 2015-10-07 | 2015-10-07 | Apparatus and method for determining failover in virtual system |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101883251B1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113810227A (en) * | 2021-09-13 | 2021-12-17 | 阳光新能源开发有限公司 | Main and standby machine switching method and power station |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20060031538A (en) * | 2004-10-08 | 2006-04-12 | (주)제너시스템즈 | System and method for detecting error state which occurs in internet protocol communication system replicated in active-standby mode |
KR20140030557A (en) * | 2012-08-31 | 2014-03-12 | 주식회사 포스코아이씨티 | Virtualized home network system and operating method for thereof |
KR20150029181A (en) | 2013-09-09 | 2015-03-18 | 삼성에스디에스 주식회사 | Cluster system and method for providing service availbility in cluster system |
-
2015
- 2015-10-07 KR KR1020150141161A patent/KR101883251B1/en active IP Right Grant
Also Published As
Publication number | Publication date |
---|---|
KR101883251B1 (en) | 2018-07-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101888029B1 (en) | Method and system for monitoring virtual machine cluster | |
JP5851503B2 (en) | Providing high availability for applications in highly available virtual machine environments | |
EP2518627B1 (en) | Partial fault processing method in computer system | |
US11330071B2 (en) | Inter-process communication fault detection and recovery system | |
US20110191627A1 (en) | System And Method for Handling a Failover Event | |
CN104408071A (en) | Distributive database high-availability method and system based on cluster manager | |
CN106980529B (en) | Computer system for managing resources of baseboard management controller | |
US20130227359A1 (en) | Managing failover in clustered systems | |
US20080288812A1 (en) | Cluster system and an error recovery method thereof | |
US10331472B2 (en) | Virtual machine service availability | |
US20090164565A1 (en) | Redundant systems management frameworks for network environments | |
WO2015058711A1 (en) | Rapid fault detection method and device | |
CN101442437B (en) | Method, system and equipment for implementing high availability | |
US8370897B1 (en) | Configurable redundant security device failover | |
CN110046064B (en) | Cloud server disaster tolerance implementation method based on fault drift | |
JP5712714B2 (en) | Cluster system, virtual machine server, virtual machine failover method, virtual machine failover program | |
KR101883251B1 (en) | Apparatus and method for determining failover in virtual system | |
JP2012014674A (en) | Failure recovery method, server, and program in virtual environment | |
US20180107502A1 (en) | Application continuous high availability solution | |
JP2011203941A (en) | Information processing apparatus, monitoring method and monitoring program | |
JP2007280155A (en) | Reliability improving method in dispersion system | |
JP7474168B2 (en) | Monitoring system and fault monitoring method | |
US11954509B2 (en) | Service continuation system and service continuation method between active and standby virtual servers | |
US11947431B1 (en) | Replication data facility failure detection and failover automation | |
Kitamura | Configuration of a Power-saving High-availability Server System Incorporating a Hybrid Operation Method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant |