KR100264896B1 - Apparatus and method for detecting cluster node failure of the heterogeneous cluster system

Info

Publication number: KR100264896B1
Application number: KR1019980030112A
Authority: KR
Inventors: 이준수
Original assignee: 윤종용; 삼성전자주식회사
Priority date: 1998-07-27
Filing date: 1998-07-27
Publication date: 2000-09-01
Also published as: KR20000009580A

Abstract

An apparatus and method for detecting cluster node failure in a multi-cluster system environment that uses several operating systems are disclosed. In such a system, the method comprises: an input step in which a cluster failure manager residing in a client connected to the cluster systems receives the names of the cluster system nodes and virtual servers to be monitored; a storage step of storing the name, IP address, and MAC address of each node and virtual server; a display step of displaying the failure status of a server if the server fails after the storage step; a request step in which re-registration is requested after the server marked as failed has been repaired; a reconfiguration step in which the cluster failure manager reconfigures the server's information in response to the request; an address check step of checking the address status of systems that have not failed; a step of returning to the display step if the checked address of a system matches the stored address; and a step of checking the address status of the other systems and then returning to the display step if the checked address differs from the stored address.

Description

Apparatus and method for detecting cluster node failure in multi-cluster system

The present invention relates to an apparatus and method for detecting cluster node failure in a multi-cluster system environment that uses several operating systems (OS). More particularly, it relates to an apparatus and method that manage cluster nodes in an integrated manner in an environment where several cluster systems coexist, detect a node failure or a failover caused by an application problem through a common protocol regardless of the cluster's operating system, and notify the administrator so that recovery can be carried out quickly.

The configuration of a cluster system is described below with reference to FIG. 2, a configuration diagram of a prior-art cluster system. As is well known to those skilled in the art, a cluster system is a technology that makes several computers appear as a single computer and distributes the work requested by users across them. Compared with a supercomputer, a cluster system offers a better price-to-performance ratio as well as good scalability and availability. Cluster systems can be divided by operating system into several types, such as clusters that run on UNIX-family operating systems and clusters used on Windows NT. For each cluster system (100, 110), a separate cluster manager (130, 140) runs on the client (120) and manages failures. The cluster manager can be used to create virtual IP (Internet Protocol) addresses, virtual network names, resources, and so on in the cluster, and to create cluster groups so that when a node fails the group's resources run on another node.

The operation of the prior-art cluster system is described with reference to FIG. 1, a flowchart of prior-art cluster system failure detection. As shown, a different cluster manager is run on the client for each of the several cluster systems (150). Each cluster manager receives a cluster name and shows the cluster's status to the user (160). If a cluster node or server subsequently fails, the corresponding cluster manager displays the failure to the user (170).

However, this conventional method has the problem that, when cluster systems with different operating systems coexist, there is no way to manage failures in an integrated manner.

Accordingly, the present invention was devised to solve the problem described above. Its object is to provide an apparatus and method for detecting cluster node failure in a multi-cluster system environment using several operating systems, in which cluster nodes are managed in an integrated manner in an environment where several cluster systems coexist, and a node failure or a failover caused by an application problem is detected through a common protocol regardless of the cluster's operating system and reported to the administrator so that rapid recovery is possible.

Other objects and advantages of the present invention will become more apparent from the following detailed description and the accompanying drawings.

FIG. 1 is a flowchart of a cluster system failure detection method according to the prior art.

FIG. 2 is a configuration diagram of a cluster system according to the prior art.

FIG. 3 is a flowchart of a cluster system failure detection method according to the present invention.

FIG. 4 is an address table of the cluster systems used in the present invention.

FIG. 5 is a configuration diagram of a cluster system according to the present invention.

<Explanation of reference numerals for the main parts of the drawings>

100, 110, 300, 310: Cluster system

120, 320: Client

130, 140: Cluster manager

330: Integrated cluster manager

To achieve the above object, a preferred embodiment of the apparatus for detecting cluster node failure in a multi-cluster system according to the present invention comprises:

a plurality of cluster systems using different operating systems; and

a client including a cluster failure manager with which the cluster systems and virtual servers are registered by MAC (Medium Access Control) address and which diagnoses failures using the TCP (Transmission Control Protocol)/IP protocol.

In an embodiment of the present invention, the cluster failure manager preferably further includes a storage device that stores the name, IP address, and MAC address of each node and virtual server of the cluster systems to be monitored.
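The stored information can be pictured as a small table with one row per node or virtual server. The following is a minimal sketch only: the class names, field names, and the `is_virtual` flag are illustrative assumptions, not taken from the patent, which specifies only that the name, IP address, MAC address, and failure status are kept.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServerRecord:
    """One row of the address table. All field names are hypothetical."""
    name: str              # node, cluster, or virtual server name
    ip: str                # IP address resolved from the name
    mac: str               # MAC address registered for that IP address
    is_virtual: bool       # True for a virtual server, False for a physical node
    failed: bool = False   # failure status shown to the administrator


class AddressTable:
    """In-memory table keyed by server name."""

    def __init__(self) -> None:
        self._rows: Dict[str, ServerRecord] = {}

    def register(self, record: ServerRecord) -> None:
        # Registering an existing name again replaces the old row.
        self._rows[record.name] = record

    def get(self, name: str) -> Optional[ServerRecord]:
        return self._rows.get(name)

    def mark_failed(self, name: str) -> None:
        self._rows[name].failed = True

    def active(self) -> List[ServerRecord]:
        # Rows that are not currently marked as failed.
        return [row for row in self._rows.values() if not row.failed]
```

Replacing the row on re-registration is one simple way to model the reconfiguration that follows a repair; a persistent store would work equally well.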

A preferred embodiment of the method for detecting cluster node failure in a multi-cluster system according to the present invention, in a multi-cluster system using several operating systems, comprises:

a registration step of registering cluster systems and virtual servers;

a monitoring step of monitoring the registered systems and application programs; and

a reporting step of reporting any cluster failure or application failure that occurs during the monitoring step.

In another embodiment of the present invention, the registration step preferably comprises:

an input step in which a cluster failure manager residing in a client connected to the cluster systems receives the names of the cluster system nodes and virtual servers to be monitored; and

a storage step of storing the name, IP address, and MAC address of each node and virtual server of the cluster systems.

In the storage step, the IP address and MAC address are preferably determined using ICMP (Internet Control Message Protocol) and ARP (Address Resolution Protocol).

The monitoring step preferably comprises:

a display step of displaying the failure status of a server if the server fails after the storage step;

a request step in which re-registration is requested after the server marked as failed has been repaired;

a reconfiguration step in which the cluster failure manager reconfigures the server's information in response to the request; and

an address check step of checking the address status of systems that have not failed.

The address check step is preferably performed by periodic polling.

The reporting step preferably comprises:

a step of returning to the display step if the address check step finds that a system's address matches the stored address; and

a return step of checking the address status of the other systems and then returning to the display step if the address check step finds that the system's address differs from the stored address.
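The address check and the two reporting branches can be sketched as a polling pass over the stored table. This is an illustration under stated assumptions, not the patented implementation: the function names, the `lookup_mac` callback (which stands in for whatever ICMP/ARP query the client performs and returns `None` when the address cannot be reached), and the 5-second default interval are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional, Set

# Hypothetical callback: returns the MAC address currently answering for an
# IP address, or None if the address cannot be reached.
MacLookup = Callable[[str], Optional[str]]


def check_addresses(stored: Dict[str, str], failed: Set[str],
                    lookup_mac: MacLookup) -> List[str]:
    """Return the IP addresses whose current MAC differs from the stored one.

    `stored` maps IP address -> registered MAC address. Systems already
    marked as failed are skipped, since only non-failed systems are checked.
    """
    changed = []
    for ip, registered in stored.items():
        if ip in failed:
            continue
        current = lookup_mac(ip)
        if current is None or current.lower() != registered.lower():
            changed.append(ip)
    return changed


def poll(stored: Dict[str, str], failed: Set[str], lookup_mac: MacLookup,
         rounds: int, interval_s: float = 5.0,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Run `rounds` polling passes, marking mismatching systems as failed."""
    for i in range(rounds):
        for ip in check_addresses(stored, failed, lookup_mac):
            failed.add(ip)  # corresponds to showing the failure status
        if i + 1 < rounds:
            sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a system stays in `failed` until it is registered again.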

In the following description, many specific details, such as the functions of the cluster manager, are set forth with reference to the accompanying drawings in order to provide a more thorough understanding of the system and method for detecting cluster node failure in a multi-cluster system environment using several operating systems according to a preferred embodiment of the present invention. However, it will be apparent to those skilled in the art that the present invention can be practiced on the basis of its technical idea without these specific details.

In addition, well-known features and functions of cluster systems are not described in detail so as not to obscure the present invention, and for ease of explanation and understanding, English abbreviations and their Korean equivalents are used interchangeably for the same terms.

FIG. 5 is a configuration diagram of a cluster system according to the present invention. Referring to the figure, the system for detecting cluster node failure in an environment of multiple cluster systems (300, 310) using several operating systems comprises a plurality of cluster systems using different operating systems, and a client (320) including a cluster failure manager (330) with which the cluster systems and virtual servers are registered by MAC address and which diagnoses failures using the TCP/IP protocol.
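The patent does not say how the client reads the MAC address that belongs to an IP address. One common approach, shown here purely as an assumption, is to contact the address (for example with an ICMP echo) so that the operating system resolves it through ARP, and then read the operating system's ARP cache. The sketch below parses the text format of the Linux `/proc/net/arp` file; the function name is hypothetical, and a client on another operating system would need a different source for the same information.

```python
from typing import Dict

ATF_COM = 0x02  # Linux ARP flag: the entry is complete (MAC address resolved)


def parse_proc_net_arp(text: str) -> Dict[str, str]:
    """Map IP address -> MAC address from the contents of /proc/net/arp.

    Expected columns: IP address, HW type, Flags, HW address, Mask, Device.
    Incomplete entries (no resolved MAC address) are skipped.
    """
    table: Dict[str, str] = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, flags, mac = fields[0], fields[2], fields[3]
        if int(flags, 16) & ATF_COM == 0:
            continue
        table[ip] = mac.lower()
    return table
```

On a Linux client the input would be obtained with `open("/proc/net/arp").read()`. Lower-casing the MAC address makes later comparisons against registered addresses independent of letter case.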

FIG. 3 is a flowchart of cluster system failure detection according to the present invention, and FIG. 4 is the system address table used in the present invention. As shown, a failure manager that implements a remote cluster monitoring function runs on the client. Once the clusters and virtual servers have been installed, they must first be registered (200). Registration is done by entering a server's node name or IP address. Taking one cluster as an example, each physical node name, the cluster name, and the virtual server name are entered. If a name was entered, the cluster failure manager uses it to find the IP address, then finds the MAC address of the system corresponding to that IP address, and manages the failure status, server name, IP address, and MAC address in a table of the form shown in FIG. 4 (210). The MAC address is found using the ICMP and ARP protocols in an environment where the TCP/IP protocol is installed. The manager then periodically polls each IP address and compares the result with the previously registered MAC address (230). If the IP address of a node cannot be reached, the node is considered to have failed; if the MAC address of a virtual server has changed, the application is considered to have failed over, and an audible alert indicates that the node on which it previously ran has a problem (240). A server with a problem is marked as failed, and monitoring resumes when the user requests server reconfiguration (220).
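The decision rule in this paragraph can be written as a small classification function. This is a hedged sketch: the names `Event` and `classify` are hypothetical, and two cases the text does not address are resolved by assumption, namely that an unreachable virtual server is reported like an unreachable node, and that a MAC address change on a physical node is not reported.

```python
from enum import Enum
from typing import Optional


class Event(Enum):
    OK = "ok"
    NODE_FAILURE = "node failure"
    FAILOVER = "application failover"


def classify(is_virtual: bool, registered_mac: str,
             observed_mac: Optional[str]) -> Event:
    """Classify one polling result for a registered node or virtual server.

    `observed_mac` is None when the IP address could not be reached.
    """
    if observed_mac is None:
        # Unreachable IP address: treated as a node failure.
        return Event.NODE_FAILURE
    if is_virtual and observed_mac.lower() != registered_mac.lower():
        # The virtual server now answers from another machine, so the
        # application has moved and the previous node needs attention.
        return Event.FAILOVER
    return Event.OK
```

A caller would raise the audible alert and mark the server as failed for any result other than `Event.OK`.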

Although the present invention has been described above by way of example with reference to the drawings, it will be apparent to those of ordinary skill in the art that various changes and modifications are possible without departing from the technical idea of the invention.

Therefore, the present invention should not be understood as limited to the specific forms mentioned in the specification. Rather, it should be understood to include all modifications, equivalents, and alternatives that fall within the spirit and scope of the invention as defined by the appended claims.

The present invention, operating as described above, can improve availability when clusters running different operating systems are used together, by immediately reporting the failure of any cluster.

Claims

1. An apparatus for detecting cluster node failure in a multi-cluster system, comprising:

a plurality of cluster systems using different operating systems; and

a client including a cluster failure manager with which the cluster systems and virtual servers are registered by MAC address and which diagnoses failures using the TCP/IP protocol.

2. The apparatus of claim 1, wherein the cluster failure manager further comprises a storage device that stores the name, IP address, and MAC address of each node and virtual server of the cluster systems to be monitored.

3. A method for detecting cluster node failure in a multi-cluster system using several operating systems, comprising:

a registration step of registering cluster systems and virtual servers;

a monitoring step of monitoring the registered systems and application programs; and

a reporting step of reporting any cluster failure or application failure that occurs during the monitoring step.

4. The method of claim 3, wherein the registration step comprises:

an input step in which a cluster failure manager residing in a client connected to the cluster systems receives the names of the cluster system nodes and virtual servers to be monitored; and

a storage step of storing the name, IP address, and MAC address of each node and virtual server of the cluster systems.

5. The method of claim 4, wherein the storage step determines the IP address and MAC address using the ICMP and ARP protocols.

6. The method of claim 3, wherein the monitoring step comprises:

a display step of displaying the failure status of a server if the server fails after the storage step;

a request step in which re-registration is requested after the server marked as failed has been repaired;

a reconfiguration step in which the cluster failure manager reconfigures the server's information in response to the request; and

an address check step of checking the address status of systems that have not failed.

7. The method of claim 6, wherein the address check step checks the address by periodic polling.

8. The method of claim 3, wherein the reporting step comprises:

a step of returning to the display step if the address check step finds that a system's address matches the stored address; and

a step of checking the address status of the other systems and then returning to the display step if the address check step finds that the system's address differs from the stored address.