KR20000038701A

KR20000038701A - Method for duplicating error monitoring of distribution processor using internet control message protocol

Info

Publication number: KR20000038701A
Application number: KR1019980053782A
Authority: KR
Inventors: 홍성주; 권은희; 김상현
Original assignee: 이계철; 한국전기통신공사
Priority date: 1998-12-08
Filing date: 1998-12-08
Publication date: 2000-07-05
Also published as: KR100279660B1

Abstract

PURPOSE: A method for making the fault monitoring of a distributed processing apparatus redundant using the Internet Control Message Protocol (ICMP) is provided. It monitors the operation of the active processor accurately and increases the reliability of fault monitoring by having a standby processor use the ICMP echo and echo reply messages as heart-beat messages. CONSTITUTION: An initial communication environment is set up for an active processor and a standby processor (S1). The standby processor performs an accept function to monitor the operating state of the active processor, and sets an alarm signal to prevent the accept function from waiting indefinitely (S2). If the accept function succeeds (S3), the active processor continues to perform fault monitoring (S4-S6). If the accept function fails (S3), a server of the standby processor communicates directly with the ICMP of the active processor to re-check the operating state of the active processor, and then performs the handling appropriate to that state (S8-S12).

Description

Fault Monitoring Redundancy Method of Distributed Processing Unit Using Internet Control Message Protocol (ICMP)

The present invention relates to a method for making fault monitoring redundant in a distributed processing apparatus using the Internet Control Message Protocol (hereinafter, ICMP). In a distributed processing apparatus in which multiple processors are connected by a redundant local area network (LAN) to form a single system that provides services, the fault monitoring function is made redundant in an active/standby arrangement. When the active processor fails and can no longer perform fault monitoring, the standby processor automatically detects this, takes over the fault monitoring function of the active processor, and continues fault monitoring without interruption. When the standby processor detects that the active processor has restarted, it stops the fault monitoring it was performing and returns to its original state. The invention thereby improves the reliability of the distributed processing apparatus.

In general, each processor in a distributed processing apparatus must know the operating state of the other processors accurately in order to keep providing service reliably, without data loss in inter-processor communication, even when a failure occurs. For a system used by many subscribers, such as an intelligent network system, redundancy of the fault monitoring function is therefore essential.

Redundant fault monitoring of this kind usually duplicates the fault monitoring function in an active/standby arrangement, which imposes two requirements. First, because it cannot be known when the active processor will fail, the active processor and the standby processor must always hold a consistent view of the monitored objects of the distributed processing apparatus (LAN faults, process faults, processor faults, service faults, and so on).

To this end, the active processor must hand over the fault messages it has collected and analyzed, together with the configuration table (a table representing the current state of the monitored objects), to the standby processor in real time. Only then can the standby processor take over and continue fault monitoring whenever the active processor fails.

Second, the standby processor must monitor the operating state of the active processor accurately and in real time.

If the standby processor misjudges the state of the active processor, fault-monitoring processes may end up running on both the active processor and the standby processor.

Under these requirements, the redundancy scheme currently in use places a daemon process on both the active processor and the standby processor so that the two application processes exchange messages and monitor each other's state. Because it is the peer application process that responds, however, this scheme wastes processor resources. In addition, when the peer process goes down, this is treated as a processor failure, which lowers the reliability of fault monitoring.

In view of the above problems and requirements, the object of the present invention is to improve the reliability of fault monitoring. So that the standby processor can monitor the operating state of the active processor accurately, fault monitoring is performed using the echo message (hereinafter, ICMP_ECHO) and the echo reply message (hereinafter, ICMP_ECHOREPLY) of the ICMP protocol provided by the Transmission Control Protocol/Internet Protocol (TCP/IP) suite as heart-beat messages.

That is, when the ICMP protocol is used, the ICMP_ECHO message sent by the application on the standby processor is answered not by an application on the active processor (A) but by the ICMP protocol itself, with an ICMP_ECHOREPLY message. Processor resources are therefore not wasted, and the reliability of fault monitoring is increased.

Here, the ICMP_ECHO and ICMP_ECHOREPLY messages are among the messages of the ICMP protocol, which a processor or router that detects an error during Internet Protocol communication uses to notify the processor that sent the offending packet that the error occurred. A heart-beat message is a message used to check the operating state of some target periodically.
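As an illustration of the two heart-beat messages, the following minimal sketch builds an ICMP echo request and recognizes the matching echo reply. It is not taken from the patent: the type values 8 and 0 and the checksum rule follow the ICMP specification (RFC 792), while the function names, the `ident`/`seq` parameters, and the payload are hypothetical. Sending the packet would need a raw ICMP socket, which typically requires elevated privileges, so that part is left out.

```python
import struct

ICMP_ECHO = 8        # ICMP type of an echo request (RFC 792)
ICMP_ECHOREPLY = 0   # ICMP type of an echo reply (RFC 792)


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP_ECHO packet (hypothetical helper, not from the patent)."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", ICMP_ECHO, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO, 0, checksum, ident, seq) + payload


def is_echo_reply(packet: bytes, ident: int, seq: int) -> bool:
    """Check whether an ICMP packet is the ICMP_ECHOREPLY for our request."""
    if len(packet) < 8:
        return False
    icmp_type, code, _cksum, r_ident, r_seq = struct.unpack("!BBHHH", packet[:8])
    return (icmp_type, code, r_ident, r_seq) == (ICMP_ECHOREPLY, 0, ident, seq)
```

Because the reply is produced by the ICMP layer of the peer's protocol stack, no application process on the active processor has to be running for `is_echo_reply` to succeed, which is the property the description relies on.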

FIG. 1 is a block diagram showing the schematic configuration of a load-sharing processing apparatus to which the present invention is applied.

FIG. 2 is a block diagram showing active/standby fault monitoring redundancy according to the present invention in the load-sharing processing apparatus.

FIG. 3 is a flowchart showing fault monitoring and its handling in the fault-monitoring standby processor according to the present invention.

<Explanation of reference numerals for the main parts of the drawings>

1: active processor for fault monitoring

11: standby processor for fault monitoring

2, 21, 3, 31: processors

a: server

b: client

To achieve the above object, a fault monitoring redundancy method for a distributed processing apparatus having an active processor and a standby processor comprises:

a first process of setting up the initial communication environment of the active processor and the standby processor;

a second process in which the standby processor performs an accept function in order to monitor the operating state of the active processor, and sets an alarm signal to prevent the accept function from waiting indefinitely;

a third process of determining whether the accept function completed successfully. Successful completion means that the active processor is operating normally, so the standby processor checks whether fault-monitoring daemon processes are running on the standby processor. If they are running, the active processor has returned from a failed state to a normal state, so the fault-monitoring daemon processes running on the standby processor are stopped and the routine that collects and analyzes the fault data gathered by the active processor is then performed. If they are not running, the active processor was already operating normally, so only the fault message collection/analysis routine is performed, which keeps the configuration of the active processor and the standby processor consistent while the active processor continues to carry out the primary fault monitoring duty; and

a fourth process in which, if the accept function fails, the client of the standby processor communicates directly with the Internet Control Message Protocol (ICMP) of the active processor to re-check the operating state of the active processor, and then performs the handling appropriate to that operating state.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the schematic configuration of a load-sharing processing apparatus to which the present invention is applied. The apparatus comprises processors (B, B', C, C') (2, 21, 3, 31) that are duplicated in a load-sharing arrangement and process services;

an active processor (A) (1) and a standby processor (A') (11) that are responsible for fault monitoring; and

a redundant local area network (LAN) (4) connecting these processors.

Here, processor (A) (1) is the active processor that performs the fault monitoring function: it collects and analyzes the fault messages generated by each processor in the distributed processing apparatus. Processor (A') (11) waits on standby to back up the function of the active processor (A) when the active processor (A) fails.

FIG. 2 shows the fault monitoring traversal under normal conditions in the load-sharing processing apparatus configured as in FIG. 1. Each of the processors (2, 21, 3, 31) that process the various services contains a server (a). The fault-monitoring active processor (1) contains a client (b). The fault-monitoring standby processor (11) contains both a server (a) and a client (b), the latter operating only when the active processor has failed.

In this state, the client (b) process running on the fault-monitoring active processor (A) (1) periodically connects, in turn, to the server (a) process running on each of the processors (B, B', C, C') (2, 21, 3, 31) of the distributed processing apparatus, collects and analyzes the fault messages, and then updates the configuration table.

It then connects to the server (a) running on the standby processor (A') (11) and transmits the collected fault messages and the configuration table, so that the configuration held by the two processors (A, A') (1, 11) is consistent.
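One pass of this traversal, followed by the hand-over to the standby processor, might be sketched as follows. This is only an illustration under stated assumptions: the patent does not define a message format, so fault messages are modeled as hypothetical `(object, state)` pairs, the network access is injected through the hypothetical `fetch` and `send` callables, and recording an unreachable server as a fault is an assumption of this sketch rather than something the description specifies.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Address = Tuple[str, int]        # (IP address, port number) of a server (a)
FaultMessage = Tuple[str, str]   # hypothetical format: (monitored object, state)


def run_traversal(
    servers: Iterable[Address],
    fetch: Callable[[Address], List[FaultMessage]],
    config_table: Dict[str, str],
) -> List[FaultMessage]:
    """Visit each server (a) in turn, collect faults, update the table."""
    collected: List[FaultMessage] = []
    for addr in servers:
        try:
            messages = fetch(addr)  # connect and read the fault messages
        except OSError:
            # Assumption of this sketch: a failed connection is itself
            # recorded as a fault of that processor.
            messages = [(f"{addr[0]}:{addr[1]}", "unreachable")]
        for obj, state in messages:
            config_table[obj] = state
            collected.append((obj, state))
    return collected


def sync_to_standby(
    send: Callable[[dict], None],
    collected: List[FaultMessage],
    config_table: Dict[str, str],
) -> None:
    """Hand the collected faults and the table to the standby's server (a)."""
    send({"faults": list(collected), "config": dict(config_table)})
```

Injecting `fetch` and `send` keeps the sketch independent of any particular wire protocol; in a real deployment they would wrap the socket connections to the servers (a).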

If the configuration held by the two processors (A, A') (1, 11) is not consistent, the standby processor (A') (11) will hold an incorrect view of the distributed processing apparatus when the active processor (A) (1) fails, which can seriously affect service processing.

If the server (a) process running on the standby processor (A') (11) receives no connection request from the active processor (A) (1) for a certain period, it presumes that the active processor (A) (1) has failed, sends an ICMP_ECHO message to the ICMP protocol of the active processor, and waits to receive the ICMP_ECHOREPLY message sent in response.

In other words, the ICMP_ECHO and ICMP_ECHOREPLY messages are used as a kind of heart-beat message to check the operating state of the active processor (A) (1).

If, in this state, the active processor (A) (1) does not return a response to the ICMP_ECHO message, the server (a) process of the fault-monitoring standby processor (A') (11) concludes that the fault-monitoring active processor (A) (1) has failed. It then starts the fault-monitoring daemon processes on the standby processor (A') (11), so that the standby processor (A') (11) carries out the primary fault monitoring duty.

Fault monitoring can be made redundant in this way because the process that collects and analyzes fault messages on the fault-monitoring active processor (A) (1) operates as a client (b). As long as it knows the Internet Protocol address (IP address) and port number of the processors (B, B', C, C') (2, 21, 3, 31) on which the server (a) processes run, the client (b) process can connect to the server (a) processes and collect and analyze fault messages regardless of which processor it runs on.

Note that although the present invention installs dedicated fault-monitoring processors (A, A') (1, 11) for the fault monitoring function, the fault monitoring function may equally be implemented on the processors that process services, without using dedicated processors.

When the fault-monitoring active processor (A) (1) fails and the client (b) on the fault-monitoring standby processor (A') (11) runs to perform the fault monitoring function, it differs from the client (b) running on the active processor (A) (1) in one respect: it collects fault messages and stores them in a file, and when processor (A) operates again it transmits them to processor (A) so that the fault history can be managed.
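The store-and-forward behavior of the standby client (b) could look roughly like the sketch below. The patent only says that fault messages are stored in a file and later transmitted to processor (A); the JSON-lines file format, the function names, and the injected `send` callable are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def spool_fault(path: Path, obj: str, state: str) -> None:
    """Append one fault message to the spool file (hypothetical format)."""
    with path.open("a", encoding="utf-8") as spool:
        spool.write(json.dumps({"object": obj, "state": state}) + "\n")


def replay_spool(path: Path, send: Callable[[dict], None]) -> int:
    """Send every spooled fault to processor (A), then remove the file."""
    if not path.exists():
        return 0
    sent = 0
    with path.open(encoding="utf-8") as spool:
        for line in spool:
            send(json.loads(line))
            sent += 1
    path.unlink()  # the history now lives on processor (A)
    return sent
```

`replay_spool` would be called once the active processor is seen to be operating again, so that the fault history accumulated during the outage ends up on processor (A).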

The role of the server (a) process in the fault-monitoring standby processor (A') (11), which performs the fault monitoring switchover described above, is now described in detail with reference to FIG. 3.

The fault-monitoring standby processor (A') (11) monitors the operating state of the fault-monitoring active processor (A) (1) while receiving the fault messages and configuration table that the client (b) of the active processor (A) (1) has collected and analyzed from the processors (2, 21, 3, 31) of the distributed processing apparatus. When the active processor (A) fails, it starts the fault-monitoring daemon processes that had been running on the active processor (A), thereby making the fault monitoring function redundant. The procedure is as follows. First, the server (a) of the standby processor (A') (11) sets up an environment in which it can send and receive the ICMP_ECHO message, used as the heart-beat message for monitoring whether the active processor (A) (1) is operating, and its response, the ICMP_ECHOREPLY message. It also sets up the socket communication environment so that the accept function for the client (b) of the active processor (A) (1) can be performed in blocking mode (step 1 (S1)).

Next, before calling the accept function (accept()) to receive a connection request (①) from the client (b) in the active processor (A) (1), the server (a) of the standby processor (A') (11) sets an alarm signal (SIGALRM, a signal provided by the UNIX system) to be raised after a set time, and then calls the accept function.

The time at which the alarm signal is raised is a parameter that depends on how long the client (b) running on the active processor (A) (1) takes to perform one pass of the fault monitoring traversal routine over all the processors (2, 21, 3, 31) of the distributed processing apparatus. It is set to twice the time required for one pass of the traversal routine (step 2 (S2)).
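The bounded wait of step 2 can be sketched as below. Note the substitution: the description arms a SIGALRM alarm before a blocking accept(), whereas this sketch uses a socket timeout, which gives the same "give up after twice the traversal time" behavior without signal handling. The function names and the `traversal_seconds` parameter are hypothetical.

```python
import socket
from typing import Optional


def accept_timeout(traversal_seconds: float) -> float:
    """Timeout for accept(): twice the time of one traversal pass (S2)."""
    return 2.0 * traversal_seconds


def wait_for_active(
    listener: socket.socket, traversal_seconds: float
) -> Optional[socket.socket]:
    """Wait for the active processor's client (b) to connect.

    Returns the connected socket, or None if nothing arrived in time,
    which corresponds to the accept call failing in step 7.
    """
    listener.settimeout(accept_timeout(traversal_seconds))
    try:
        conn, _peer = listener.accept()
    except socket.timeout:
        return None
    conn.settimeout(None)  # plain blocking mode for the transfer that follows
    return conn
```

Doubling the traversal time leaves room for one slow pass before the standby concludes that the active client has stopped connecting.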

In this state, the server determines whether the accept function returns successfully. If the accept function call succeeds within the set time, the active processor (A) (1) is considered to be operating normally, and the server (a) in the standby processor (A') (11) cancels the alarm signal that was set (step 3 (S3)).

The server (a) in the standby processor (A') (11) then determines whether fault-monitoring processes are running on the standby processor (A') (11). If they are, it stops the fault-monitoring processes running on the standby processor, because in that case the success of the accept function call means that the active processor has returned from a failed state to a normal state (step 4 (S4)).

After step 4, the server process running on the standby processor (A') (11) receives the fault messages transmitted by the client (b) of the active processor (A) (1) (step 5 (S5)).

It then checks whether the received message is a null message. If it is not, the server analyzes the received fault message, updates the state table, and performs the fault message reception routine again (step 5 (S5)). If the received message is a null message, there are no further fault messages, so the server releases the connection and repeats the procedure from step 2 (S2) (step 6 (S6)).
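The receive loop of steps 5 and 6 can be illustrated with the short sketch below. It assumes, for illustration only, that a fault message is an `(object, state)` pair and that the null message is represented by `None`; the patent does not specify either encoding. The `recv_message` callable is a hypothetical stand-in for reading one message from the accepted connection.

```python
from typing import Callable, Dict, Optional, Tuple

FaultMessage = Tuple[str, str]  # hypothetical format: (monitored object, state)


def drain_fault_messages(
    recv_message: Callable[[], Optional[FaultMessage]],
    state_table: Dict[str, str],
) -> int:
    """Receive fault messages until the null message arrives (S5, S6).

    Each message updates the state table so that the standby processor
    stays consistent with the active processor. Returns the number of
    fault messages processed.
    """
    count = 0
    while True:
        message = recv_message()
        if message is None:  # null message: nothing more to receive
            return count
        obj, state = message
        state_table[obj] = state
        count += 1
```

After the function returns, the caller would release the connection and go back to the bounded accept of step 2.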

If, on the other hand, the check in step 3 (S3) finds that the server (a) in the standby processor (A') (11) has been blocked in the accept function call for longer than the set time, a timeout occurs: the alarm signal that was set is delivered to the server process (a), and the accept function call fails (step 7 (S7)).

When the accept attempt fails, the server (a) of the standby processor (A') (11), in order to determine the state of the active processor (A) (1) accurately, performs the following routine twice: it sends the ICMP_ECHO message, used as the heart-beat message, to the ICMP level of the active processor (A) (1) and waits for the ICMP_ECHOREPLY message sent in response.

To repeat the routine, a variable counting the repetitions ( i = 0 ) and a variable counting the number of times no ICMP_ECHOREPLY message was received ( fail = 0 ) are initialized. If fail equals 2 when i reaches 2, the active processor is judged to have failed. The ICMP_ECHO message is sent twice in order to make an accurate judgment (step 8 (S8)).

Because the server cannot wait indefinitely for the ICMP_ECHOREPLY message in step 8, the alarm signal is set to 1 second before the accept function call is performed, and the ICMP_ECHO message is then sent (step 9 (S9)).

The server then checks whether an ICMP_ECHOREPLY message has been received from the active processor (A) (1). If it has, the server cancels the alarm signal that was set to 1 second. In this case the active processor (A) (1) is operating normally and only the client process running on the active processor (A) (1) has failed, which is why no connection request reached the server (a) of the standby processor (A') (11). The server therefore generates a fault message reporting a client process failure on the active processor (A) (1) (step 10 (S10)).

If the check finds that no ICMP_ECHOREPLY message was received, the server increments the count of missed replies by one (fail++) and then checks whether the message check routine has been repeated twice (i == 2 ?). If the repetition is not complete, the routine is repeated; if it is complete, the server checks whether the reply was missed both times (fail == 2 ?).

If the check finds that the message was received at least once, the server cancels the alarm signal and generates the fault message reporting a client process failure in the active processor (A) (1), as in step 10 (S10) (step 11 (S11)).

Conversely, if the ICMP_ECHOREPLY message was not received on either attempt, the active processor (A) (1) is considered to be down, and the fault-monitoring daemon processes that run on the active processor (A) (1) are started on the standby processor (A') (11) so that the standby processor (A') (11) carries out the primary fault monitoring duty (step 12 (S12)).
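The decision logic of steps 8 to 12 can be sketched as follows, using the counters `i` and `fail` named in the description. The `probe` callable is a hypothetical stand-in for "send ICMP_ECHO and wait up to the given number of seconds for ICMP_ECHOREPLY"; the `Verdict` names are likewise invented for this illustration.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    CLIENT_PROCESS_FAILED = "client_process_failed"   # S10 / S11
    ACTIVE_PROCESSOR_DOWN = "active_processor_down"   # S12


def check_active(
    probe: Callable[[float], bool],
    attempts: int = 2,
    reply_timeout: float = 1.0,
) -> Verdict:
    """Probe the active processor after the accept call has timed out.

    probe(timeout) returns True when an echo reply arrived in time.
    """
    i = 0      # number of times the routine has run (S8)
    fail = 0   # number of probes that got no reply (S8)
    while i < attempts:
        i += 1
        if probe(reply_timeout):      # reply received (S10)
            return Verdict.CLIENT_PROCESS_FAILED
        fail += 1                     # fail++
    if fail == attempts:              # both probes unanswered (S12)
        return Verdict.ACTIVE_PROCESSOR_DOWN
    return Verdict.CLIENT_PROCESS_FAILED  # at least one reply (S11)
```

A reply on either attempt means the processor itself is up and only its client process has failed, so the standby does not take over; only two consecutive silent probes lead to `ACTIVE_PROCESSOR_DOWN`.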

The server (a) in the standby processor (A') (11) sends the ICMP_ECHO message to the active processor (A) (1) twice in order to determine the operating state of the active processor (A) accurately.

Even while processor (A') (11) is carrying out the primary fault monitoring duty, the server of the standby processor (A') (11) continues to watch for a connection request from the client (b) of the failed active processor (A) (1).

If a connection request from the client (b) of the active processor (A) (1) succeeds again, the active processor (A) (1) has restarted. The server therefore stops the daemon processes that are carrying out the primary fault monitoring duty on the standby processor (A') (11), so that the active processor (A) (1) can carry out that duty again.
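The takeover and failback behavior amounts to a small two-state machine on the standby side, sketched below. The class and method names and the injected `start_daemons`/`stop_daemons` callables are hypothetical; how the daemon processes are actually launched and stopped is not described in the patent.

```python
from typing import Callable


class StandbyMonitor:
    """Tracks whether the standby is carrying the primary monitoring duty."""

    def __init__(
        self,
        start_daemons: Callable[[], None],
        stop_daemons: Callable[[], None],
    ) -> None:
        self._start = start_daemons
        self._stop = stop_daemons
        self.daemons_running = False

    def on_active_down(self) -> None:
        """Both echo probes went unanswered: take over (S12)."""
        if not self.daemons_running:
            self._start()
            self.daemons_running = True

    def on_active_connected(self) -> None:
        """The active client (b) connected again: hand the duty back (S4)."""
        if self.daemons_running:
            self._stop()
            self.daemons_running = False
```

Guarding both transitions with `daemons_running` keeps them idempotent, which helps avoid the situation the description warns about, where fault-monitoring processes run on both processors.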

As described in detail above, the present invention makes the fault monitoring function of a distributed processing apparatus redundant in an active/standby arrangement: when the active processor fails, the standby processor detects this and automatically runs, on the standby processor, the fault-monitoring daemon processes that had been running on the active processor, which improves the reliability of the distributed processing apparatus. In addition, in an apparatus operating in active/standby mode, the standby processor monitors the operating state of the active processor using the ICMP_ECHO and ICMP_ECHOREPLY messages of the ICMP protocol as heart-beat messages, which improves the accuracy with which the standby processor monitors the operating state of the active processor.

The preferred embodiments of the present invention are disclosed for purposes of illustration. Those skilled in the art will be able to make various modifications, changes, and additions within the spirit and scope of the present invention, and such modifications and changes should be regarded as falling within the scope of the following claims.

Claims

A fault monitoring redundancy method for a distributed processing apparatus having an active processor and a standby processor, using the Internet Control Message Protocol (ICMP), the method comprising:

a first process of setting up an initial communication environment for the active processor and the standby processor;

a second process in which the standby processor performs an accept function in order to monitor the operating state of the active processor, and sets an alarm signal to prevent the accept function from waiting indefinitely;

a third process of determining whether the accept function was performed successfully and, if it was, having the active processor perform the fault monitoring function; and

a fourth process in which, if the accept function fails, the server of the standby processor communicates directly with the Internet Control Message Protocol (ICMP) of the active processor to re-check the operating state of the active processor, and then performs the handling appropriate to that operating state.

The method of claim 1,

wherein, in the first process, the server of the standby processor sets up an environment for sending and receiving the messages used to monitor the operation of the active processor, and sets up a socket communication environment so that communication with the client of the active processor is performed in blocking mode.

The method of claim 1,

wherein, in the second process, the alarm signal is set to twice the time that the client operating in the active processor requires to perform the fault monitoring traversal routine once over all the processors constituting the distributed processing apparatus.

The method of claim 1,

wherein the third process comprises:

a first step of cancelling the set alarm signal when the accept function completes successfully;

a second step in which the server process in the standby processor determines whether the standby processor is currently performing the fault monitoring function and, if so, stops it;

a third step of receiving the fault data transmitted by the client of the active processor, either when it is determined that the standby processor is not performing the fault monitoring function or after the fault monitoring function of the standby processor has been stopped;

a fourth step of checking whether the received message is a null message and, if it is not, analyzing the received fault message, updating the state table so that it stays consistent with the active processor, and then performing the fault message reception routine again; and

a fifth step of releasing the connection if the received message is a null message.

The method of claim 1,

wherein, in the fourth process, the client of the standby processor uses the ICMP echo (ICMP_ECHO) message and its response, the ICMP echo reply (ICMP_ECHOREPLY) message, as heart-beat messages for communicating directly with the Internet Control Message Protocol (ICMP) of the active processor.

The method of claim 5,

wherein, when the ICMP echo message is transmitted, an alarm signal is set to prevent waiting indefinitely for the reply.

The method of claim 1,

wherein, in the fourth process, the routine of receiving the ICMP echo reply message used to determine the operating state of the active processor is repeated twice, and if the message is received at least once, a fault message indicating that only the client process in the active processor has failed is generated and the method is repeated from the second process.

The method of claim 7, wherein

if the message is not received on either of the two repetitions, the active processor is regarded as having failed, and the standby processor starts the fault-monitoring daemon processes that ran on the active processor so that the standby processor performs the primary fault monitoring duty.