KR100940488B1

KR100940488B1 - Method for operating resilience system by using multi-mode

Info

Publication number: KR100940488B1
Application number: KR1020050061150A
Authority: KR
Inventors: 최윤석; 허성길
Original assignee: 삼성탈레스 주식회사
Priority date: 2005-07-07
Filing date: 2005-07-07
Publication date: 2010-02-04
Also published as: KR20070006098A

Abstract

The present invention relates to a method of operating a failure recovery (resilience) system that, for example, keeps copies of an important software component or system on a plurality of nodes so that, even if a failure occurs in one or more of the nodes, another node continues to perform the function, thereby providing users with continuous and stable service without loss of data. More specifically, it relates to a method of operating a failure recovery system using a multiplexing mode, in which a multiplexed service can be provided on many nodes without limiting the number of nodes for an important software component or system.

To this end, the present invention provides a method of operating a system that recovers from failures using a multiplexing mode in a software component or system comprising a master and slaves, the method comprising: configuring multiplexed nodes consisting of one master and a plurality of slaves; and the plurality of slaves transitioning to master by using the information about the joined group held in each daemon.

Description

Method for operating resilience system by using multi-mode

FIG. 1 is a flowchart of a method of operating a failure recovery system according to the prior art.

FIGS. 2a and 2b are a network configuration diagram and a flowchart illustrating the method of operating a failure recovery system using a multiplexing mode according to the present invention.

FIG. 3 is a flowchart of the data synchronization process in the master and the data synchronization process in the slave, in the method of operating a failure recovery system using a multiplexing mode according to the present invention.

FIG. 4 is a flowchart of the failure detection and takeover process in the method of operating a failure recovery system using a multiplexing mode according to the present invention.

The present invention relates to a method of operating a failure recovery (resilience) system that, for example, keeps copies of an important software component or system on a plurality of nodes so that, even if a failure occurs in one or more of the nodes, another node continues to perform the function, thereby providing users with continuous and stable service without loss of data. More specifically, it relates to a method of operating a failure recovery system using a multiplexing mode, in which a multiplexed service can be provided on many nodes without limiting the number of nodes for an important software component or system.

As is well known to those skilled in the art, the conventional duplexing (redundancy) algorithm for recovering from the failure of an important software component or system configures the critical system as an active system and a standby system. When a failure occurs in a software component of the active system or in the entire node, the corresponding software component or the entire node of the standby system takes over the function, so that communication remains stable without loss of data. A multiplexing algorithm is therefore needed that can apply this duplexing service to many nodes, without any limit on the number of nodes.

The conventional duplexing algorithm, however, has the problem that recovery is no longer possible if both nodes of the duplexed pair fail at the same time.

That is, referring to FIG. 1, in the conventional duplexing algorithm a user application is started as a failure recovery component (S10) and requests to join a group (S20). If the number N of users with the same name is greater than 1 (S30), the components beyond the first operate as slaves (N-1) in FIFO order (S40); if N is not greater than 1 at step S30, the component operates as the master (S50). Consequently, if the duplexed components fail at the same time, recovery is no longer possible.

Accordingly, the technical problem to be solved by the present invention is to provide a method of operating a failure recovery system using a multiplexing mode, in which a multiplexed service can be provided on many nodes without limiting the number of nodes for an important software component or system.

To achieve the above technical objective, the present invention provides a method of operating a failure recovery system using a multiplexing mode, in which failures are recovered using the multiplexing mode in a software component or system comprising a master and slaves, the method comprising: configuring multiplexed nodes consisting of one master and a plurality of slaves; and the plurality of slaves transitioning to master by using the information about the joined group held in each daemon.

In a preferred embodiment of the present invention, mode transitions are made in first-in, first-out (FIFO) order when a node is initialized, when it fails, and when it is revived (alive).

In a preferred embodiment of the present invention, the data synchronization process in the master further comprises:

receiving recovery data; notifying the standby nodes that the recovery data has been received; checking whether the recovery data has been updated and transmitting the updated recovery data to the slaves; and, after transmitting the updated recovery data to the slaves, notifying the standby nodes of the processing result once processing of the recovery data is complete.

In a preferred embodiment of the present invention, the data synchronization process in the slave further comprises:

receiving recovery data; receiving the notification message from the master; synchronizing the status with the stored recovery data; storing the updated recovery data; and making the final change to the updated recovery data.

In a preferred embodiment of the present invention, the failure detection and takeover process comprises:

checking whether the token time has elapsed when a failure occurs in the active or a slave (N) user; if the token time has elapsed, reconfiguring the network and transmitting the updated group information to the users; and determining the new modes in first-in, first-out (FIFO) order based on the group information.

Hereinafter, a preferred embodiment of the method of operating a failure recovery system using a multiplexing mode according to the present invention is described in detail with reference to the accompanying drawings. Where a detailed description of related known technology or configurations would unnecessarily obscure the subject matter of the invention, that description is omitted. The terms used below are defined in view of their functions in the present invention and may vary with the intention or practice of users and operators; their definitions should therefore be based on the content of this specification as a whole.

In the following description, where a component of the present invention is the same as a component of the prior art, the reference numeral used for the prior art is retained and its detailed description is omitted.

FIGS. 2a and 2b are a network configuration diagram and a flowchart illustrating the method of operating a failure recovery system using a multiplexing mode according to the present invention; FIG. 3 is a flowchart of the data synchronization process in the master and the data synchronization process in the slave; and FIG. 4 is a flowchart of the failure detection and takeover process.

First, referring to FIG. 2, the method of operating a failure recovery system using a multiplexing mode according to the present invention comprises: configuring multiplexed nodes consisting of one master and a plurality of slaves (S102); and the plurality of slaves transitioning to master by using the information about the joined group (G1) held in each daemon (D1, ..., D4) (S104). As described above, mode transitions are made in first-in, first-out (FIFO) order when the nodes are initialized, when they fail, and when they are revived (alive).

Referring to FIG. 3, the data synchronization process (100) in the master according to the present invention comprises: receiving recovery data (S110); notifying the standby nodes that the recovery data has been received (S120); checking whether the recovery data has been updated (S130) and transmitting the updated recovery data to the slaves (S140); and, after transmitting the updated recovery data to the slaves, notifying the standby nodes of the processing result (S160) once processing of the recovery data is complete (S150).
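The master-side steps S110 to S160 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `master_sync`, the message dictionaries, the `send` and `process` callbacks, and the rule that "updated" means "differs from the stored value" are all assumptions introduced here, since the specification does not define message formats or what happens when the data is unchanged.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]  # hypothetical wire format, not from the patent


def master_sync(
    key: str,
    value: Any,
    store: Dict[str, Any],
    send: Callable[[Message], None],
    process: Callable[[str, Any], bool],
) -> None:
    """Handle one piece of recovery data received by the master (S110)."""
    # S120: tell the standby nodes that recovery data has arrived.
    send({"type": "received", "key": key})

    # S130: assumed update check, comparing against the stored copy.
    updated = key not in store or store[key] != value
    if updated:
        store[key] = value
        # S140: forward the updated recovery data to the slaves.
        send({"type": "update", "key": key, "value": value})

    # S150/S160: once processing completes, report the result.
    if process(key, value):
        send({"type": "result", "key": key, "ok": True})


# Usage: the same data sent twice is only forwarded the first time.
log: List[Message] = []
store: Dict[str, Any] = {}
master_sync("call-1", {"state": "ringing"}, store, log.append, lambda k, v: True)
first = [m["type"] for m in log]
log.clear()
master_sync("call-1", {"state": "ringing"}, store, log.append, lambda k, v: True)
second = [m["type"] for m in log]
```

Passing `send` and `process` as callbacks keeps the sketch independent of any particular group communication layer.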

Also referring to FIG. 3, the data synchronization process (200) in the slave according to the present invention comprises: receiving recovery data (S210); receiving the notification message from the master (S220); synchronizing the status with the stored recovery data (S230); storing the updated recovery data (S240); and making the final change to the updated recovery data (S250).
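One possible reading of the slave-side steps S210 to S250 is a two-phase apply: the slave records a status when the master announces new data, holds the updated data aside, and only makes it final when the master reports its result. The sketch below follows that reading. The class `SlaveReplica`, the message types `received`, `update`, and `result`, and the status strings are hypothetical names chosen for illustration; the specification does not spell out these details.

```python
from typing import Any, Dict


class SlaveReplica:
    """Hypothetical slave-side store for recovery data."""

    def __init__(self) -> None:
        self.committed: Dict[str, Any] = {}  # data a takeover would use
        self.pending: Dict[str, Any] = {}    # stored but not yet final
        self.status: Dict[str, str] = {}     # per-key synchronization status

    def on_message(self, msg: Dict[str, Any]) -> None:
        key = msg["key"]
        if msg["type"] == "received":
            # S220/S230: the master announced new recovery data;
            # bring the local status in line with it.
            self.status[key] = "in_progress"
        elif msg["type"] == "update":
            # S210/S240: store the updated recovery data.
            self.pending[key] = msg["value"]
        elif msg["type"] == "result":
            # S250: make the final change to the updated data.
            if key in self.pending:
                self.committed[key] = self.pending.pop(key)
            self.status[key] = "done"


# Usage: replay the three messages a master might send for one update.
slave = SlaveReplica()
slave.on_message({"type": "received", "key": "call-1"})
slave.on_message({"type": "update", "key": "call-1", "value": {"state": "ringing"}})
status_before_result = slave.status["call-1"]
slave.on_message({"type": "result", "key": "call-1", "ok": True})
```

Keeping `pending` separate from `committed` means a slave that takes over mid-update can tell which data the master had finished processing.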

Referring to FIG. 4, the failure detection and takeover process (300) according to the present invention comprises: when a failure occurs in the active or a slave (N) user (S310), checking whether the token time has elapsed (S320); if the token time has elapsed, reconfiguring the network and transmitting the updated group information to the users (S330); and determining the new modes in first-in, first-out (FIFO) order based on the group information (S340).
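The detection and takeover steps S310 to S340 can be illustrated with a small sketch. It assumes that "token time" is a timeout on how long a member may go unheard before it is treated as failed; the specification does not define the term further. The function names `reconfigure` and `assign_modes`, the timestamps, and the timeout value are all hypothetical.

```python
from typing import Dict, List


def reconfigure(
    members: List[str],
    last_token_seen: Dict[str, float],
    now: float,
    token_timeout: float,
) -> List[str]:
    """S320/S330: drop members whose token time has elapsed.

    `members` is in join order (oldest first); the order is preserved.
    """
    return [m for m in members if now - last_token_seen[m] <= token_timeout]


def assign_modes(view: List[str]) -> Dict[str, str]:
    """S340: the oldest surviving member is master, the rest are slaves."""
    return {m: "master" if i == 0 else f"slave{i}" for i, m in enumerate(view)}


# Usage: the master U1#D1 has been silent for 6 s with a 2 s timeout.
members = ["U1#D1", "U1#D2", "U1#D3"]
last_seen = {"U1#D1": 4.0, "U1#D2": 9.5, "U1#D3": 9.8}
view = reconfigure(members, last_seen, now=10.0, token_timeout=2.0)
modes = assign_modes(view)
```

In this run the former slave1 (`U1#D2`) becomes the master, which is the takeover case; a failed slave would simply shift the slaves behind it up by one.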

The operation of the method of operating a failure recovery system using a multiplexing mode according to the present invention, configured as described above, is explained below with reference to FIGS. 2 to 4.

First, the basic principles and algorithm of the multiplexing scheme applied in the present invention are as follows.

- A mode, as defined in the present invention, is the state of a software component for multiplexing. The multiplexing algorithm defines the work to be performed in each mode and the mode transitions that occur when a node fails. Modes can be broadly divided into master, slave, and candidate. The master is the state that is currently active and processing data. A slave is the state that keeps a duplicate copy of the important data so that recovery is possible if the software component in the master state fails. A candidate is the state that currently does no work but can later become a master or a slave.

- The method of operating a failure recovery system using a multiplexing mode according to the present invention defines only the master and slave modes. This is because, if the software components in the master and slave states fail at the same time, candidates hold no information for data recovery and therefore cannot provide the desired service. Accordingly, in the method of the present invention the entire network consists of one master and a plurality of slaves.

- As a rule, the mode is determined in FIFO order according to the time at which each user (software component) joins. That is, the first user to join becomes the master, and those that follow become slave1, slave2, slave3, and so on. The priority of a slave indicates its place in the order of future transitions to master.

- The daemon, which manages the group (the set of users currently communicating) and the information about each user, maintains information about the currently joined group in the form user name#daemon name. This information is used to determine a user's mode when a node fails or is revived (alive).

- When any software component or system fails, the modes are determined in FIFO order, just as in the initialization process.

- When a new node enters the network (including the case where a previously failed node comes back up), it operates in the last mode of the FIFO.

The operation of the method of operating a failure recovery system using a multiplexing mode according to the present invention is described more concretely below with reference to FIG. 2.

FIG. 2a shows a network configuration in which software component U1 provides a multiplexed service on four nodes. Each daemon (D1, D2, D3, D4) maintains the G1 group information (U1#D1, U1#D2, U1#D3, U1#D4), and each user's mode is determined by the order in which it joined (U1#D1: master, U1#D2: slave1, U1#D3: slave2, U1#D4: slave3). If Node2 fails, the daemons reconfigure the network and deliver the changed G1 group information (U1#D1, U1#D3, U1#D4) to the users. Based on this information, the users' modes are determined in FIFO order (U1#D1: master, U1#D3: slave1, U1#D4: slave2). When the previously failed Node2 comes back up, it operates as slave3 based on the G1 group information.
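The FIG. 2a scenario can be replayed with a short sketch of a join-ordered group view. The class `GroupView` and its method names are hypothetical and stand in for whatever membership record the daemons keep; only the member names (`U1#D1` to `U1#D4`) and the FIFO rule come from the description.

```python
from typing import Dict, List


class GroupView:
    """Join-ordered membership list for one group (oldest join first)."""

    def __init__(self) -> None:
        self.members: List[str] = []

    def join(self, member: str) -> None:
        # New and revived members always go to the tail of the FIFO.
        if member not in self.members:
            self.members.append(member)

    def fail(self, member: str) -> None:
        # A failed member is dropped; those behind it move up one place.
        if member in self.members:
            self.members.remove(member)

    def modes(self) -> Dict[str, str]:
        # Head of the FIFO is the master; the rest are slave1, slave2, ...
        return {
            m: "master" if i == 0 else f"slave{i}"
            for i, m in enumerate(self.members)
        }


g1 = GroupView()
for name in ["U1#D1", "U1#D2", "U1#D3", "U1#D4"]:
    g1.join(name)
initial = g1.modes()

g1.fail("U1#D2")          # Node2 fails
after_failure = g1.modes()

g1.join("U1#D2")          # Node2 comes back up
after_revival = g1.modes()
```

After the failure, `U1#D3` and `U1#D4` move up to slave1 and slave2; after the revival, `U1#D2` rejoins at the tail as slave3, matching the sequence described for FIG. 2a.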

In this way, the present invention allows an important software component or system to be multiplexed without any limit on the number of copies, so that even if a failure occurs in one or more systems, the remaining systems take over the function and a stable, continuous service is provided to the application.

As described above, the method of operating a failure recovery system using a multiplexing mode according to the present invention has the advantage that a multiplexed service can be provided on many nodes without limiting the number of nodes for an important software component or system.

Although a preferred embodiment of the present invention has been described in detail above, a person of ordinary skill in the art to which the invention pertains will understand that the invention may be practiced with various modifications or changes without departing from the spirit and scope of the invention as defined in the appended claims. Accordingly, modifications of future embodiments of the present invention will not depart from the technology of the present invention.

Claims

A method of operating a failure recovery system using a multiplexing mode, in which failures are recovered using the multiplexing mode in a software component or system comprising a master and slaves, the method comprising:

configuring multiplexed nodes consisting of one master and a plurality of slaves;

the plurality of slaves transitioning to master by using information about the joined group held in each daemon; and

making mode transitions in first-in, first-out (FIFO) order upon initialization, failure, and revival of a node;

wherein a data synchronization process in the master comprises:

receiving recovery data;

notifying standby nodes that the recovery data has been received;

checking whether the recovery data has been updated and transmitting the updated recovery data to the slaves; and

after transmitting the updated recovery data to the slaves, notifying the standby nodes of the processing result once processing of the recovery data is complete;

and wherein a data synchronization process in the slave comprises:

receiving recovery data;

receiving a notification message from the master;

synchronizing the status with the stored recovery data;

storing the updated recovery data; and

making the final change to the updated recovery data.

(Deleted)

The method of claim 1,

further comprising, as a failure detection and takeover process:

checking whether a token time has elapsed when a failure occurs in the active or a slave (N) user;

if the token time has elapsed, reconfiguring the network and transmitting updated group information to the users; and

determining the new modes in first-in, first-out (FIFO) order based on the group information.