KR100235570B1

KR100235570B1 - Cluster management method for the cluster management master system of parallel-connected TICOM main computers

Info

Publication number: KR100235570B1
Application number: KR1019960067592A
Authority: KR
Inventors: Kim Jae-min (김재민)
Original assignee: Koo Ja-hong (구자홍); LG Electronics Inc. (엘지전자주식회사)
Priority date: 1996-12-18
Filing date: 1996-12-18
Publication date: 1999-12-15
Also published as: KR19980048937A

Abstract

The present invention relates to a cluster management method for the cluster management master of a system in which two or more main computers are connected in parallel. More specifically, packet information is transmitted to each system registered in a cluster management environment table, and each system provides its system information to the cluster management master. When a fault occurs in one system, the cluster servers are notified, so that the fault can be recovered and the application that was running on the faulty system continues to run on another system.

The invention comprises: a node state detection process that, at fixed time intervals, transmits a packet containing system information to the node of each system registered in the cluster management environment table in order to detect the state of each node; a normal-state handling process that, when message detection in the packet shows the node to be in a normal state, updates the cluster management environment table and notifies the user; and a fault handling process that, when message detection in the packet shows the node to be abnormal, updates the cluster management environment table and notifies the user.

Description

Cluster management method of a system that connects two or more main computers in parallel.

An object of the present invention is to provide a cluster management method for the cluster management master of a system in which two or more main computers are connected in parallel, in which packet information is transmitted to each system registered in the cluster management environment table, each system provides its system information to the cluster management master, faults in other systems are detected and passed to the cluster servers so that they can be recovered, and the application that was running on the faulty system is executed on a normal system.

The present invention relates to clustering of the Main Computer III, and in particular to a cluster management method for the cluster management master of a system in which two or more main computers are connected in parallel, which monitors the system configuration and network status within the cluster system, detects and recovers from faults when they occur, and manages the resources of each node.

As shown in FIG. 1, the conventional Main Computer III system comprises: a processor board 10 that shares the system bus and contains a plurality of processors; a memory board 20; a system control module (hereinafter "SCM") board 30 for system control; an input/output processor (hereinafter "IOP") board 40 with an I/O processor for driving a disk 41, a cartridge tape, and the like; and a LAN (local area network) control board 50 connected to and operating on the VME bus provided by the SCM board 30.

In the conventional Main Computer III system configured as above, the SCM board 30 provides shared system resources, takes over part of the processor load, and buffers data passing between the system bus and the VME bus; because the two buses operate independently, overall system performance improves.

The IOP board 40 drives the disk 41 or the cartridge tape and controls the writing and reading of data transferred over the system bus.

The LAN control board 50 resides on the VME bus, provides network services, and provides functions for transferring data to other systems (telnet, ftp, smtp, etc.).

In the conventional Main Computer III system, even when two or more systems are connected by a local area network (Ethernet), there is no function for managing the cluster as a whole. When a fault occurs in one system, the other systems are completely unaware of it, so the fault goes unrecognized and the failed system is simply left unusable.

Accordingly, to overcome this problem of the prior art, an object of the present invention is to provide a cluster management method for the cluster management master of a system in which two or more main computers are connected in parallel, in which the cluster management master transmits packet information to each system registered in the cluster management environment table; the system information each system returns to the cluster management master is checked to detect faults in that system; the cluster management master hands work over to the cluster server of a system that is operating normally; and the application that was running on the faulty system is executed on the normal system.

FIG. 1 is a schematic configuration diagram of a conventional Main Computer III system.

FIG. 2 is a network diagram of two Main Computer III systems connected to implement the present invention.

FIG. 3 is a flowchart of the cluster management process performed by the cluster management master of a system in which two or more main computers are connected in parallel, according to the present invention.

FIG. 4 is a control flowchart for each Main Computer III system according to the present invention.

<Description of reference numerals for the main parts of the drawings>

100, 200: first and second Main Computer III systems

300: cluster management master

A method for achieving the object of the present invention comprises: a node state detection process of transmitting packet information, at regular intervals, to the node of each system registered in the cluster management environment table to detect the state of each node; a normal-state handling process of updating the cluster management environment table and notifying the user when message detection in the packet shows the node to be in a normal state; and a fault handling process of updating the cluster management environment table and notifying the terminals using that system when message detection in the packet shows the node to be abnormal.
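The node state detection process can be pictured as one polling round over the table. The following is a minimal Python sketch, not the patented implementation: `NodeEntry`, `detect_node_states` and the `probe` callable are hypothetical names, and it assumes a node counts as abnormal when no valid reply to the information request arrives before a timeout.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class NodeEntry:
    """One row of a hypothetical cluster management environment table."""
    address: str
    state: str = "unknown"  # later set to "normal" or "abnormal"
    info: Dict[str, float] = field(default_factory=dict)


# Hypothetical probe: sends an information request to one node and returns
# the decoded reply, or None if no valid reply arrived before the timeout.
Probe = Callable[[str], Optional[Dict[str, float]]]


def detect_node_states(table: Dict[str, NodeEntry], probe: Probe) -> Dict[str, str]:
    """Run one detection round over every registered node.

    Returns node name -> "normal" / "abnormal"; the caller then runs the
    normal-state or fault handling process for each node.
    """
    results = {}
    for name, entry in table.items():
        reply = probe(entry.address)
        if reply is None:
            results[name] = "abnormal"  # assumption: silence means a fault
        else:
            entry.info = reply  # keep the latest system information
            results[name] = "normal"
    return results


if __name__ == "__main__":
    table = {"node1": NodeEntry("10.0.0.1"), "node2": NodeEntry("10.0.0.2")}
    fake_probe = lambda addr: {"load": 0.3} if addr == "10.0.0.1" else None
    print(detect_node_states(table, fake_probe))
    # prints {'node1': 'normal', 'node2': 'abnormal'}
```

Injecting the probe keeps the classification logic independent of the transport, which the patent does not specify beyond Ethernet.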

Here, the normal-state handling process comprises registering the current state in the cluster management environment table, and displaying the current state on the monitor of the cluster server of the corresponding system.
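The two normal-state steps map onto a table write followed by a display update. This sketch assumes the table is a plain dictionary keyed by node name and uses a text status line as a stand-in for the monitor display; `handle_normal` is a hypothetical name.

```python
import time
from typing import Dict


def handle_normal(table: Dict[str, dict], node: str, info: dict) -> str:
    """Normal-state handling for one node (illustrative sketch)."""
    # Step 1: register the current state in the cluster management
    # environment table, with a timestamp of the update.
    table[node] = {"state": "normal", "info": info, "updated": time.time()}
    # Step 2: build and show the status line; a text stand-in for the
    # cluster server's monitor display.
    details = ", ".join(f"{k}={v}" for k, v in sorted(info.items()))
    line = f"[{node}] NORMAL  {details}"
    print(line)
    return line


if __name__ == "__main__":
    table: Dict[str, dict] = {}
    handle_normal(table, "node1", {"load": 0.5, "disk": 40})
    # prints [node1] NORMAL  disk=40, load=0.5
```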

The fault handling process comprises registering the fault state in the cluster management environment table, displaying the fault state on the monitor of the cluster server, and transmitting a fault-status message to the other systems that are operating normally.
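A sketch of the three fault handling steps follows, assuming the same dictionary-shaped table as above and a hypothetical `send(node, message)` callable for the transport. Only nodes currently marked normal receive the fault-status message, matching "the other systems that are operating normally".

```python
from typing import Callable, Dict, List


def handle_fault(table: Dict[str, dict], failed: str,
                 send: Callable[[str, dict], None]) -> List[str]:
    """Fault handling for one node (illustrative sketch)."""
    # 1. Register the fault state in the cluster management environment table.
    table.setdefault(failed, {})["state"] = "abnormal"
    # 2. Display the fault state (text stand-in for the monitor display).
    print(f"[{failed}] FAULT")
    # 3. Send a fault-status message to every other node in normal operation.
    targets = [name for name, row in table.items()
               if name != failed and row.get("state") == "normal"]
    for name in targets:
        send(name, {"type": "fault", "node": failed})
    return targets


if __name__ == "__main__":
    table = {"node1": {"state": "normal"}, "node2": {"state": "normal"}}
    print(handle_fault(table, "node2", lambda n, m: None))
    # prints [node2] FAULT, then ['node1']
```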

To achieve another object of the present invention, a Cluster Management Daemon information (hereinafter "cmdinfo") command is executed to store the information needed by the cluster in an information table.
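The patent does not say what cmdinfo records, so the fields below (host name, collection time, disk usage, one-minute load average) are assumptions chosen to match the quantities the master later monitors. `collect_cmdinfo`, `run_cmdinfo` and `INFO_TABLE` are hypothetical names; the sketch only uses standard-library calls.

```python
import os
import shutil
import socket
import time


def collect_cmdinfo(path: str = "/") -> dict:
    """Gather node information for the info table (fields are hypothetical)."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    info = {
        "host": socket.gethostname(),
        "collected_at": time.time(),
        "disk_used_pct": round(used_pct, 1),
    }
    # os.getloadavg() exists only on Unix-like systems and may raise OSError.
    try:
        info["load_1min"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass
    return info


# Node information table, refreshed each time the command runs.
INFO_TABLE: dict = {}


def run_cmdinfo() -> dict:
    """Refresh the information table and return it."""
    INFO_TABLE.update(collect_cmdinfo())
    return INFO_TABLE
```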

To achieve yet another object, the processing method on each main computer system comprises: a fault handling process that, when inspection of the packet information sent from the cluster management master shows fault information about another system, executes the application that was running on the faulty system; a command message handling process that, when the packet information sent from the cluster management master is a command message, extracts the command and executes it in a shell; and an information message handling process that, when the packet information sent from the cluster management master is an information message, puts the information obtained by the cmdinfo command into a packet and transmits it to the cluster management master.
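On the cluster server, the three message kinds suggest a dispatch table keyed by message type. In this sketch the packet is assumed to be already decoded into a dictionary with a `"type"` field; the handler functions are injected and hypothetical, and unknown types produce an error reply rather than an exception.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def make_dispatcher(on_fault: Handler, on_command: Handler, on_info: Handler):
    """Build a cluster-server dispatcher for the three message kinds."""
    handlers: Dict[str, Handler] = {
        "fault": on_fault,      # another node failed: take over its application
        "command": on_command,  # run the carried command in a shell
        "info": on_info,        # reply with the cmdinfo data
    }

    def dispatch(packet: dict) -> dict:
        handler = handlers.get(packet.get("type"))
        if handler is None:
            return {"type": "error", "reason": "unknown message type"}
        return handler(packet)

    return dispatch
```

A new message kind then only needs one more entry in `handlers`, without touching the receive loop.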

One embodiment of a Main Computer III system for implementing the present invention is configured as follows.

FIG. 2 is a block diagram of two Main Computer III systems connected to each other by a network. As shown, it comprises the two networked Main Computer III systems 100 and 200, and a cluster management master 300, implemented on a workstation, which manages the Main Computer III systems 100 and 200 over the Ethernet that carries data between them, and detects and handles faults.

The whole cluster system is formed through the LAN control boards 50, and the disk 41 is shared by the first and second Main Computer III systems 100 and 200.

Since the first and second Main Computer III systems 100 and 200 are the same as in the prior art, their blocks carry the same reference numerals as in FIG. 1 and their description is omitted.

A preferred embodiment of the present invention configured as above is described in detail below with reference to the accompanying FIGS. 2 to 4.

FIG. 3 is a flowchart of the cluster management process of the Main Computer III. As shown, the cluster management master process runs on the workstation, and each Main Computer III system 100, 200 runs a cluster server, which communicates with the cluster management master 300 to provide system information and takes action to handle the situation when a fault occurs in another system.

The cluster management master communicates with the cluster server of each Main Computer III system 100, 200 to determine whether that system is running and, if it is, receives its system information as a packet.
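The patent does not define the packet layout, so the following is only one possible encoding, assumed for illustration: a 1-byte message type code, a 4-byte big-endian payload length, then a UTF-8 JSON payload. The type codes and function names are hypothetical.

```python
import json
import struct

# Hypothetical wire format: type code (1 byte) + payload length (4 bytes,
# network byte order) + UTF-8 JSON payload.
TYPES = {"info": 1, "command": 2, "fault": 3}
NAMES = {code: name for name, code in TYPES.items()}
HEADER = struct.Struct("!BI")  # 5 bytes, no padding in network order


def encode(msg_type: str, payload: dict) -> bytes:
    """Pack a message type and payload into one packet."""
    body = json.dumps(payload).encode("utf-8")
    return HEADER.pack(TYPES[msg_type], len(body)) + body


def decode(packet: bytes):
    """Unpack a packet into (message type, payload), validating lengths."""
    if len(packet) < HEADER.size:
        raise ValueError("truncated header")
    code, length = HEADER.unpack_from(packet)
    if code not in NAMES:
        raise ValueError(f"unknown message type code {code}")
    body = packet[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated payload")
    return NAMES[code], json.loads(body.decode("utf-8"))
```

The explicit length field lets a receiver detect a truncated packet instead of silently parsing partial system information.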

In more detail, the cluster management master 300 sends an information message in packet form to each node registered in the cluster environment table in order to determine the node state of each cluster system.

Through such packets, information is collected at regular intervals; the results of the information messages are analyzed and written to the cluster management environment table, and the state of each node is then displayed graphically in a window.
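Collecting "at regular intervals" is easiest to keep regular if each round is scheduled against a monotonic clock rather than sleeping a fixed time after each round, which would let slow rounds accumulate drift. A minimal sketch, where `round_fn` stands for one hypothetical detect, update and display cycle:

```python
import time
from typing import Callable


def run_periodically(round_fn: Callable[[], None], interval: float,
                     rounds: int) -> int:
    """Run round_fn `rounds` times, round i starting at start + i * interval.

    Deadlines come from time.monotonic(), so a slow round delays only
    itself instead of shifting every later round (no cumulative drift).
    """
    start = time.monotonic()
    for i in range(rounds):
        delay = start + i * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        round_fn()
    return rounds
```

A real master would loop indefinitely; the `rounds` bound simply makes the sketch terminate.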

If analysis of the information message in the transmitted packet shows that a node is not normal, a fault message is sent to the cluster server of a normal system so that it executes the corresponding application.

FIG. 4 is a control flowchart of the first and second Main Computer III systems managed by the cluster management master. As shown, the Main Computer III systems 100, 200 execute the cluster management daemon information (cmdinfo) command to store the information each cluster needs in an information table.

Each system also checks, at regular intervals, the packets sent from the cluster management master. If the message carried in a packet is a fault message, it clears the lock information associated with the file system and then runs the application that was running on the system that can no longer operate normally because of the fault.
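The takeover step can be sketched as two bookkeeping operations on the surviving node. The representation is an assumption: file-system lock information is modelled as a dictionary from shared-disk path to the node holding the lock, and running applications as a per-node list; `take_over` is a hypothetical name, and actually starting the applications is left to the caller.

```python
from typing import Dict, List


def take_over(failed: str, me: str, locks: Dict[str, str],
              apps_by_node: Dict[str, List[str]]) -> List[str]:
    """Handle a fault message about `failed` on surviving node `me`."""
    # 1. Clear every lock the failed node held, so its files on the
    #    shared disk become usable again.
    for path in [p for p, owner in locks.items() if owner == failed]:
        del locks[path]
    # 2. Reassign the failed node's applications to this node and return
    #    them so the caller can start them.
    moved = apps_by_node.pop(failed, [])
    apps_by_node.setdefault(me, []).extend(moved)
    return moved
```

Clearing the locks first matters: an application restarted on the survivor would otherwise block on locks that the failed node can never release.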

If the message carried in the packet is a command message for managing the cluster system, the command is extracted from the packet and executed in shell mode.
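Executing a command message "in shell mode" corresponds to running the string through the system shell and capturing its result for the reply. This sketch assumes the decoded packet carries the command under a hypothetical `"command"` key and adds a timeout so a hung command cannot stall the cluster server. Because `shell=True` runs arbitrary text, only packets from the authenticated management master should reach this path.

```python
import subprocess


def run_command_message(packet: dict, timeout: float = 30.0) -> dict:
    """Execute the command carried in a command message through the shell."""
    cmd = packet["command"]
    try:
        proc = subprocess.run(cmd, shell=True, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"type": "result", "command": cmd, "status": None,
                "output": "", "error": "timed out"}
    return {"type": "result", "command": cmd, "status": proc.returncode,
            "output": proc.stdout, "error": proc.stderr}
```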

If the message carried in the packet is an information message, the information obtained by the cmdinfo command is put into a packet and transmitted to the cluster management master 300.

In addition, to manage each cluster system, the cluster management master 300 further provides functions for diagnosing and monitoring processor load, memory usage, and disk usage.
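The patent names the monitored quantities but gives no limits, so the thresholds below are purely hypothetical. The sketch compares each reported metric with its limit and returns one warning per violation; metrics a node did not report are skipped rather than treated as faults.

```python
from typing import Dict, List

# Hypothetical alarm thresholds for the three monitored quantities.
THRESHOLDS = {"cpu_load": 0.90, "mem_used_pct": 85.0, "disk_used_pct": 90.0}


def diagnose(node: str, info: Dict[str, float],
             thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return one warning per reported metric that exceeds its threshold."""
    warnings = []
    for metric, limit in thresholds.items():
        value = info.get(metric)
        if value is not None and value > limit:
            warnings.append(f"{node}: {metric}={value} exceeds {limit}")
    return warnings


if __name__ == "__main__":
    print(diagnose("node1", {"cpu_load": 0.95, "mem_used_pct": 50.0}))
    # prints ['node1: cpu_load=0.95 exceeds 0.9']
```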

As described above, the present invention transmits packet information to each system registered in the cluster management environment table, and each system provides its system information to the cluster management master. When a fault occurs in one system, the cluster servers are notified, so that the fault can be recovered and the application that was running on the faulty system can be executed on another system that is operating normally.

Claims

A cluster management method for the cluster management master of a system in which two or more main computers are connected in parallel, comprising: a node state detection process of transmitting a packet containing system information, at regular intervals, to the node of each system registered in a cluster management environment table to detect the state of each node; a normal-state handling process of updating the cluster management environment table when message detection in the packet shows the node to be in a normal state; and a fault handling process of, when message detection in the packet shows the node to be abnormal, updating the cluster management environment table and notifying the terminals connected to and using the system to which the node belongs.

The method of claim 1, wherein the normal-state handling process comprises: registering the current state in the cluster management environment table; and displaying the current state on the monitor of the cluster server of the corresponding system.

The method of claim 1, wherein the fault handling process comprises: registering the fault state in the cluster management environment table; and displaying the fault state in graphical form in a window of the corresponding cluster server.

A cluster management method for the cluster management master of a system in which two or more main computers are connected in parallel, comprising: a fault handling process of, when the packet information sent from the cluster management master is found to be fault information about another system, executing the application that was running on the faulty system; a command message handling process of, when the packet information sent from the cluster management master is a command message, extracting the command and executing it in shell mode; and an information message handling process of, when the packet information sent from the cluster management master is an information message, putting the information obtained by the cluster management daemon information command into a packet and transmitting it to the cluster management master.