KR100298319B1

KR100298319B1 - Redundancy Device in Communication System

Info

Publication number: KR100298319B1
Application number: KR1019980062712A
Authority: KR
Inventors: 김상수 (Kim Sang-su)
Original assignee: 윤종용 (Yun Jong-yong); 삼성전자 주식회사 (Samsung Electronics Co., Ltd.)
Priority date: 1998-12-31
Filing date: 1998-12-31
Publication date: 2001-08-07
Also published as: KR20000046037A

Abstract

The present invention relates to a redundancy device for a communication system. In particular, it relates to a redundancy device based on Log-Based Rollback-Recovery. The device improves the reliability and availability of communication systems that require high performance, such as the next-generation mobile communication system (IMT-2000), so that they can provide continuous and stable service.

The redundancy device comprises three parts:

- An active module having:
  - a fault detection module that, when a fault of the counterpart board is detected, reports the fault to a higher-level processor and decides on a switchover accordingly;
  - a synchronization module that matches the message injection order and removes messages logged on the standby side at data backup time; and
  - a backup module that waits until the injected messages have been processed and, once all processing is finished, starts backing up the backup target area.
- A standby module having the fault detection module, the synchronization module and the backup module. It further has a recovery module that processes the input messages stored in an input queue up to the moment of switchover from the standby side to the active side. This restores the state the active side had reached before the switchover.
- A communication link providing a message transmission path between the active module and the standby module.

Description

Redundancy Device in Communication System

The present invention relates to a redundancy device for a communication system. In particular, it relates to a redundancy method based on Log-Based Rollback-Recovery. The method improves the reliability and availability of communication systems that require high performance, such as the next-generation mobile communication system (IMT-2000), so that they can provide continuous and stable service.

Log-Based Rollback-Recovery is a redundancy method commonly used in message-passing systems. It combines checkpointing with message logging. It also relies on the property that a system given the same message injection order performs the same work. In other words, the message-passing system can be implemented as a finite state machine (FSM), so its behavior is always determined by the events it receives.
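The determinism property above can be illustrated with a toy finite state machine. This is a minimal sketch, not the patent's implementation: the transition function, event names and state shape are hypothetical. Feeding the same messages in the same injection order always yields the same state. Reordering the same messages can yield a different one.

```python
from functools import reduce

# Hypothetical transition function: the next state depends only on the
# current state (total, marked value) and the injected event.
def transition(state: tuple[int, int], event: str) -> tuple[int, int]:
    total, marked = state
    if event.startswith("add:"):
        return (total + int(event[4:]), marked)
    if event == "double":
        return (total * 2, marked)
    if event == "mark":
        return (total, total)
    raise ValueError(f"unknown event {event!r}")

def run(events: list[str], start: tuple[int, int] = (0, 0)) -> tuple[int, int]:
    """Inject events into the FSM in order and return the final state."""
    return reduce(transition, events, start)

order_a = ["add:3", "double", "mark", "add:1"]
order_b = ["double", "add:3", "mark", "add:1"]  # same messages, different order

print(run(order_a))  # (7, 6)
print(run(order_a))  # (7, 6) again: same injection order, same state
print(run(order_b))  # (4, 3)
```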

In more detail, the system operates under a log-based rollback-recovery protocol during normal operation. It stores message logs and checkpoints on stable storage such as a hard disk. When a fault occurs, the system recovers by restoring the stored checkpoint and re-executing the logged messages that were saved before the fault.
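A minimal sketch of the checkpoint-plus-log cycle follows. It assumes that a local directory stands in for stable storage and that a hypothetical `apply` handler stands in for message processing. Each message is written to the log before it is processed. Recovery restores the last checkpoint and re-executes the logged messages in order. The sketch does not handle a crash between writing the checkpoint and truncating the log.

```python
import json
import os
import tempfile

class StableStorage:
    """Hypothetical stable storage: a checkpoint file plus an append-only message log."""

    def __init__(self, directory: str) -> None:
        self.ckpt_path = os.path.join(directory, "checkpoint.json")
        self.log_path = os.path.join(directory, "messages.log")

    def save_checkpoint(self, state: dict) -> None:
        with open(self.ckpt_path, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        open(self.log_path, "w").close()  # messages before the checkpoint are no longer needed

    def log_message(self, msg: dict) -> None:
        with open(self.log_path, "a") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def load(self) -> tuple[dict, list[dict]]:
        with open(self.ckpt_path) as f:
            state = json.load(f)
        with open(self.log_path) as f:
            log = [json.loads(line) for line in f if line.strip()]
        return state, log

def apply(state: dict, msg: dict) -> dict:
    # Deterministic handler: add the message amount to the named key.
    new_state = dict(state)
    new_state[msg["key"]] = new_state.get(msg["key"], 0) + msg["amount"]
    return new_state

def recover(store: StableStorage) -> dict:
    """Roll back to the last checkpoint, then re-execute the logged messages."""
    state, log = store.load()
    for msg in log:
        state = apply(state, msg)
    return state

with tempfile.TemporaryDirectory() as d:
    store = StableStorage(d)
    state = {"a": 0}
    store.save_checkpoint(state)
    for msg in [{"key": "a", "amount": 5}, {"key": "b", "amount": 2}]:
        store.log_message(msg)     # log first ...
        state = apply(state, msg)  # ... then process
    # Simulated fault: the in-memory `state` is lost; rebuild it from stable storage.
    recovered = recover(store)
    print(recovered == state, recovered)  # True {'a': 5, 'b': 2}
```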

Log-based methods are generally classified into three kinds: pessimistic logging, optimistic logging and causal logging. Table 1 below compares their characteristics.

Table 1

Property           | Pessimistic logging | Optimistic logging       | Causal logging
-------------------|---------------------|--------------------------|----------------
Overhead           | Very high           | High                     | High
Output commit      | Very fast           | Slow                     | Fast
Garbage collection | Simple              | Complex                  | Complex
Recovery           | Simple              | Complex                  | Complex
Orphans            | Impossible          | Possible                 | Impossible
Rollback extent    | Last checkpoint     | Some previous checkpoint | Last checkpoint
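The difference between pessimistic and optimistic logging in Table 1 comes down to when a log entry reaches stable storage relative to the message's delivery. The toy model below illustrates this under assumptions; the class and the batch size `flush_every` are hypothetical.

- Pessimistic logging writes every entry synchronously before delivery. This is the source of its overhead.
- Optimistic logging delivers first and flushes in batches. A crash can therefore leave processed messages that cannot be replayed, which is the situation that gives rise to orphans.

```python
class Logger:
    """Toy model of where log entries sit when a crash hits."""

    def __init__(self, pessimistic: bool, flush_every: int = 3) -> None:
        self.pessimistic = pessimistic
        self.flush_every = flush_every    # hypothetical batching parameter
        self.stable: list[str] = []       # survives a crash
        self.volatile: list[str] = []     # lost on a crash
        self.delivered: list[str] = []    # messages whose effects may already be visible

    def receive(self, msg: str) -> None:
        if self.pessimistic:
            # Pessimistic: the entry is on stable storage before the message is delivered.
            self.stable.append(msg)
        else:
            # Optimistic: deliver now, flush the log later in batches.
            self.volatile.append(msg)
            if len(self.volatile) >= self.flush_every:
                self.stable.extend(self.volatile)
                self.volatile.clear()
        self.delivered.append(msg)

    def crash(self) -> list[str]:
        """Lose volatile entries; return messages processed but no longer replayable."""
        self.volatile.clear()
        return [m for m in self.delivered if m not in self.stable]

results = {}
for pessimistic in (True, False):
    log = Logger(pessimistic=pessimistic)
    for m in ["m1", "m2", "m3", "m4", "m5"]:
        log.receive(m)
    results["pessimistic" if pessimistic else "optimistic"] = log.crash()
print(results)  # {'pessimistic': [], 'optimistic': ['m4', 'm5']}
```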

Common communication systems are also driven by messages, so a communication system can also be regarded as a message-passing system. Redundancy based on the log-based rollback-recovery method described above can therefore also be applied to communication systems.

However, a communication system has two further requirements:

- Availability: when a fault occurs in a redundant module, that module must be switched over so that service continues with little interruption.
- Reliability: the inconsistency between the service state after the switchover and the state before it must be minimized.

Availability therefore requires a fast recovery time. Minimizing state inconsistency requires output commit and garbage collection to be performed correctly. In addition, the overhead of message logging must be kept to a minimum to limit system overhead during normal operation.

As Table 1 shows, however, system redundancy based on log-based rollback-recovery also has drawbacks. These drawbacks made it difficult to apply to high-performance next-generation communication systems, which demand greater stability and reliability. Consequently, log-based rollback-recovery redundancy could not be applied appropriately to next-generation communication systems.

Accordingly, an object of the present invention is to provide a redundancy device that applies log-based rollback-recovery to redundancy in next-generation mobile communication systems. A further object is to eliminate the drawbacks of existing log-based rollback-recovery methods.

More specifically, the object is a redundancy device whose rollback-recovery method is more reliable and stable. It achieves this by appropriately combining the advantages of pessimistic message logging and optimistic message logging.

To achieve these objects, the present invention proposes a redundancy device that operates as follows:

1. The two redundant boards are built on identical hardware and receive the same input.
2. One board runs normally as the active module. The other board, as the standby module, logs the input messages.
3. When an external message arrives, the active module asynchronously sends information about it to the standby module over a separate communication link. As a result, the standby module's logging order always matches the order in which the active module processes messages.
4. The active and standby modules are guaranteed to receive the same messages without loss.
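The ordering mechanism described above can be sketched as follows. This is an illustrative model, not the patent's code. Message ids, payloads and method names are hypothetical, and only external messages are modeled. The standby side buffers copies of the shared input as they arrive. It logs them only in the order announced by the active side, and waits when an announced message has not yet reached it.

```python
from dataclasses import dataclass, field

@dataclass
class StandbyLog:
    """Hypothetical standby-side logger that follows the active side's order."""
    pending: dict[str, str] = field(default_factory=dict)  # msg_id -> payload, not yet logged
    order: list[str] = field(default_factory=list)         # ids announced by the active side
    log: list[str] = field(default_factory=list)           # payloads in active-side order

    def on_input(self, msg_id: str, payload: str) -> None:
        # Copy of an external message received directly from the shared input.
        self.pending[msg_id] = payload
        self._drain()

    def on_order_info(self, msg_id: str) -> None:
        # Message information sent asynchronously by the active side over the link.
        self.order.append(msg_id)
        self._drain()

    def _drain(self) -> None:
        # Log strictly in the announced order; stop if the next message has not arrived.
        while len(self.log) < len(self.order) and self.order[len(self.log)] in self.pending:
            self.log.append(self.pending.pop(self.order[len(self.log)]))

standby = StandbyLog()
standby.on_input("B", "payload-B")   # the standby happens to see B first
standby.on_order_info("A")           # but the active side processed A first
standby.on_input("A", "payload-A")
standby.on_order_info("B")
standby.on_order_info("C")
standby.on_input("C", "payload-C")
print(standby.log)  # ['payload-A', 'payload-B', 'payload-C']
```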

With such a redundancy device, the advantages of pessimistic message logging can be obtained without its per-input-message overhead. The result is a redundancy device suited to next-generation communication systems that demand high performance.

The technical problem to be solved in achieving the object of the invention is therefore to implement a redundancy device that operates as described above.

FIG. 1 is a block diagram showing the internal configuration of a redundancy device in a mobile communication system according to a preferred embodiment of the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In assigning reference numerals to components in the drawings, the same components are given the same reference numerals wherever possible, even when they appear in different drawings.

The following description contains many specific details, such as the components of particular circuits. These are provided only to aid a general understanding of the invention. It will be apparent to those of ordinary skill in the art that the invention can be practiced without them. Detailed descriptions of related known functions or configurations are omitted where they might unnecessarily obscure the subject matter of the invention.

FIG. 1 is a block diagram of a redundancy device in a communication system according to a preferred embodiment of the present invention.

As shown in FIG. 1, the redundancy device according to the present invention is divided into a hardware configuration and a software configuration.

The hardware configuration has four features:

- The two redundant boards share a common configuration, so that both can receive external input in common.
- Each board can nevertheless generate its own output independently.
- Hardware is provided to detect the normal operation and faults of the counterpart board.
- In addition to the existing input/output communication paths, a separate communication link connects the two boards.

In the software configuration, each board is designated as either the active side or the standby side. Service is provided only by the active side. The standby side performs message logging, checkpointing and garbage collection. When a fault is detected, the standby side performs the recovery and switchover operations and then operates as the active module.

The software configuration consists of the following modules.

- Fault Detection Module (FDM) (110-1, 110-2)
- Backup Module (BM) (120-1, 120-2)
- Synchronization Module (SM) (130-1, 130-2)
- Recovery Module (RM) (140)

First, the operation of the fault detection module (110-1, 110-2) is as follows.

Fault detection is divided into hardware detection and software detection. Hardware fault detection detects board faults and is supported by the hardware itself.

Hardware fault detection comprises two operations:

- detecting a hardware fault within the board itself that could affect system operation, and notifying the counterpart board; and
- detecting a hardware fault of the counterpart board.

When the FDM detects a fault of the counterpart board, it reports the fault to the higher-level processor and decides on a switchover accordingly.

Software fault detection uses software to detect faults of the counterpart board that cannot be detected in hardware. It reports them to the higher-level processor and decides on a switchover accordingly.
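The text does not say how software fault detection works. One common approach, assumed here purely for illustration, is a heartbeat exchanged over the inter-board link with a timeout. The class, the `timeout` value and the report strings below are hypothetical. The `reports` list stands in for reports to the higher-level processor.

```python
from dataclasses import dataclass, field

@dataclass
class SoftwareFaultDetector:
    """Hypothetical heartbeat-based detector for faults of the counterpart board."""
    timeout: float                                    # seconds of silence tolerated
    last_heartbeat: float = 0.0                       # time of the last heartbeat seen
    reports: list[str] = field(default_factory=list)  # stand-in for reports to the higher-level processor

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def check(self, now: float, is_standby: bool) -> bool:
        """Report a silent counterpart; return True if this board should take over."""
        silence = now - self.last_heartbeat
        if silence <= self.timeout:
            return False
        self.reports.append(f"counterpart silent for {silence:.1f}s")
        # Assumed policy: only the standby side switches over; the active side just reports.
        return is_standby

fdm = SoftwareFaultDetector(timeout=1.5)
fdm.on_heartbeat(now=10.0)
before = fdm.check(now=11.0, is_standby=True)  # False: the last heartbeat is recent
after = fdm.check(now=12.0, is_standby=True)   # True: 2.0 s of silence exceeds the timeout
print(before, after, fdm.reports)  # False True ['counterpart silent for 2.0s']
```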

Second, the operation of the synchronization module (130-1, 130-2) is as follows.

The synchronization module matches the order in which messages are injected into the application tasks on the active and standby sides. At data backup time, it also removes the messages logged on the standby side that the backup has made unnecessary.

The active and standby sides receive external input messages at the physical layer at almost the same time. However, because the two sides process at different speeds, inter-processor communication (IPC) cannot guarantee that both sides receive the messages simultaneously and in the same order. Moreover, internal messages are generated only on the active side. The active side therefore serializes a mix of external and internal messages before delivering them to the application tasks.

For recovery, the standby side must log messages in the same order as the active side. To this end:

- Whenever the active side receives an external or internal input message, it sends information about that message to the standby side.
- The standby side receives the messages and their information, and logs the messages in the same order in which the active side delivered them to the application programs.
- The standby side records the global order of all input messages, but logs them separately for each application task.

Per-task logging serves three purposes. It makes it easy to remove backup-related messages from a task's input messages when that task's data is backed up to the standby side. It allows the input messages to be delivered to each task in order during recovery. It also allows a message to be regenerated during recovery if a fault on the active side during internal message delivery caused it to be lost.
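The per-task logging with a remembered global order might look like the sketch below. It assumes messages are identified by a sequence number assigned in the active side's order. The class, method names and task names are hypothetical. Keeping one queue per task makes it cheap to discard one task's messages after that task's data has been backed up. Sorting by sequence number restores the global order for recovery.

```python
from collections import defaultdict, deque

class StandbyMessageLog:
    """Hypothetical standby-side log: one queue per task plus a global sequence."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.per_task: dict[str, deque] = defaultdict(deque)  # task -> deque of (seq, msg)

    def log(self, task: str, msg: str) -> int:
        # Called in the order announced by the active side, so seq records the global order.
        seq = self.next_seq
        self.next_seq += 1
        self.per_task[task].append((seq, msg))
        return seq

    def discard_through(self, task: str, last_seq: int) -> None:
        # Garbage collection after a backup: drop this task's messages up to and
        # including last_seq, whose effects are now contained in the backed-up data.
        q = self.per_task[task]
        while q and q[0][0] <= last_seq:
            q.popleft()

    def replay_order(self) -> list[tuple[str, str]]:
        # Merge the per-task queues back into the global order for recovery.
        entries = sorted((seq, task, msg) for task, q in self.per_task.items() for seq, msg in q)
        return [(task, msg) for _, task, msg in entries]

standby_log = StandbyMessageLog()
for task, msg in [("call", "setup"), ("billing", "start"), ("call", "answer"), ("billing", "tick")]:
    standby_log.log(task, msg)
standby_log.discard_through("call", 0)  # the "call" task's data was backed up after message 0
print(standby_log.replay_order())  # [('billing', 'start'), ('call', 'answer'), ('billing', 'tick')]
```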

A protocol runs on the communication link between the active and standby sides so that logging information and backup data are delivered correctly.

When the active side backs up data to the standby side, it also sends information about the last message it received and processed. This tells the standby side which messages to remove during garbage collection. Once a checkpoint has been set, the messages that contributed to it must be removed, and doing so requires knowing which message was input immediately before the backup. This message information is conveyed through existing fields such as the message header or sequence number, or through a tag such as a checksum or CRC.

The application programmer decides when to back up data and which data to back up. If a single input message updates several pieces of data in an application, each updated piece must be backed up. Checkpoint setting is therefore performed separately from backup. A checkpoint here is an operation signalling that one transaction has ended. Only backup data for which a checkpoint has been set is backed up to the standby module.
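The sketch below illustrates two ideas from the paragraph above. It uses a CRC-32 as the message tag, one of the options the text lists. It also holds backup updates until a checkpoint marks the end of a transaction. `BackupTransaction`, its methods and the data keys are hypothetical; `zlib.crc32` is the Python standard-library CRC-32.

```python
import zlib

def message_tag(raw: bytes) -> int:
    # CRC-32 over the raw message bytes, one of the tag options named in the text.
    return zlib.crc32(raw)

class BackupTransaction:
    """Hypothetical active-side buffer: updates are sent only at a checkpoint."""

    def __init__(self) -> None:
        self.staged: dict[str, object] = {}
        self.sent: list[tuple[dict, int]] = []  # (data, tag of last processed message) sent to standby

    def update(self, key: str, value: object) -> None:
        # One input message may update several data items; stage each of them.
        self.staged[key] = value

    def checkpoint(self, last_msg: bytes) -> None:
        # The transaction has ended: send all staged items together with the tag
        # of the last processed message, so the standby knows what to garbage-collect.
        self.sent.append((dict(self.staged), message_tag(last_msg)))
        self.staged.clear()

txn = BackupTransaction()
msg = b"SETUP call-id=42"
txn.update("call:42:state", "ringing")
txn.update("trunk:7:busy", True)
print(len(txn.sent))  # 0: nothing is sent before the transaction ends
txn.checkpoint(msg)
data, tag = txn.sent[0]
print(sorted(data), tag == zlib.crc32(msg))  # ['call:42:state', 'trunk:7:busy'] True
```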

Third, the operation of the data backup module (120-1, 120-2) is as follows.

Data backup occurs in two cases:

- during normal operation, when the active side backs up data to the standby side; and
- after a switchover, when the active side dumps the redundant data to the standby side.

During normal operation, each application program on the active side knows which data it must back up, and must request a backup whenever that data changes. The request is made through a provided application programming interface (API), supported at the function level. The API takes the data and the input message as parameters and posts them to the queue of the redundancy server. The redundancy server then allocates new memory for the queue entry and copies the data area into it. During the copy, the API uses a critical section. If the area is shared, this prevents interference from other tasks from leaving the copied data inconsistent.

Queue posting lets the redundancy server serialize the input message information and the backup data. When the redundancy server receives a backup request for redundant data from an application program, it generates a sequence number and sends the entry to the standby side over the communication link. The redundancy server does not distinguish backup data from input message information: both are treated as data to be transmitted and are processed identically.

Because backup requests are made at the function level, the application program bears the execution-time overhead of the redundancy functions. The data to be backed up is copied and queued whenever the application requests a backup. The application is therefore responsible for minimizing the copying overhead, for example through the design of its data structures.
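A minimal sketch of the function-level backup API and the redundancy server's serialization follows. It assumes Python threads for tasks, a `threading.Lock` for the critical section and a `queue.Queue` for queue posting. `request_backup`, `redundancy_server_drain` and the list standing in for the communication link are hypothetical names. The data is deep-copied inside the critical section, so a later change by the application does not alter what was queued.

```python
import copy
import itertools
import queue
import threading

shared_lock = threading.Lock()       # critical section guarding data areas shared between tasks
backup_queue: queue.Queue = queue.Queue()
_sequence = itertools.count()        # sequence numbers generated by the redundancy server

def request_backup(data: object, input_msg_info: str) -> None:
    """Hypothetical function-level backup API called by an application task."""
    with shared_lock:
        # Copy inside the critical section so no other task changes `data` mid-copy;
        # posting both items under the lock also keeps them adjacent in the queue.
        snapshot = copy.deepcopy(data)
        backup_queue.put(("msg-info", input_msg_info))
        backup_queue.put(("data", snapshot))

def redundancy_server_drain(link: list) -> None:
    """Assign sequence numbers and send queued items over the (simulated) link in order."""
    while True:
        try:
            kind, payload = backup_queue.get_nowait()
        except queue.Empty:
            return
        # Backup data and message information are handled identically.
        link.append((next(_sequence), kind, payload))

table = {"call:42": {"state": "active"}}
request_backup(table["call:42"], "msg#17")
table["call:42"]["state"] = "released"  # later change does not affect the queued copy
link: list = []
redundancy_server_drain(link)
print(link)  # [(0, 'msg-info', 'msg#17'), (1, 'data', {'state': 'active'})]
```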

The second case is the dump of the active side's stored data to the standby side once the standby side is ready. Here the active side knows which data areas must be backed up, because these areas are determined in advance, at initial logging. Care is needed to guarantee the integrity of the backed-up data. It is ensured as follows:

1. If the standby side has received a message from outside once its own logging is ready, it sends information about that message to the active side and requests a data backup.
2. On receiving the backup request, the active side checks whether the request identifies such a message. If it does, the active side injects messages into the application tasks up to the one immediately preceding that message. If it does not, the active side injects the messages it currently holds.
3. The active side then injects no further messages into the application tasks. It waits until all injected messages have been processed, and then starts backing up the backup target area. At that point, the active and standby sides hold the same message contents.
4. Once all the data has been backed up, both sides resume normal operation.
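The cut-point rule above can be sketched as follows, using hypothetical message names and a toy deterministic handler. The active side injects only the messages before the standby's first logged message, backs up the resulting state, and leaves the rest to the standby's log. Replaying that log on top of the dumped data then reaches the same state the active side reaches when it resumes.

```python
from functools import reduce
from typing import Optional

def plan_dump(active_pending: list[str], standby_first: Optional[str]) -> tuple[list[str], list[str]]:
    """Split the active side's pending messages at the standby's first logged message.

    Returns (inject_now, hold_back): messages to inject before freezing for the
    dump, and messages left for afterwards (the standby has already logged these).
    """
    if standby_first is not None and standby_first in active_pending:
        cut = active_pending.index(standby_first)  # inject up to the message just before it
    else:
        cut = len(active_pending)                  # no such message: inject everything held now
    return active_pending[:cut], active_pending[cut:]

def apply(state: int, msg: str) -> int:
    # Hypothetical deterministic handler: append the message number as a digit.
    return state * 10 + int(msg[1:])

inject_now, hold_back = plan_dump(["m7", "m8", "m9"], standby_first="m9")
active_state = reduce(apply, inject_now, 0)             # inject, then wait for processing to finish
dumped = active_state                                   # no more injection: back up the target area
standby_state = reduce(apply, hold_back, dumped)        # standby: dumped data plus its own log
active_state = reduce(apply, hold_back, active_state)   # active resumes normal injection
print(inject_now, hold_back, dumped)  # ['m7', 'm8'] ['m9'] 78
print(active_state == standby_state)  # True
```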

Finally, the operation of the recovery module (140) is as follows.

At the moment of switchover from the standby side to the active side, the module that has newly become active must perform recovery. Recovery means catching up to the point the previous active side had reached. The previous active side may not have completed its data backup. The newly active side therefore starts from the currently backed-up data and processes the input messages stored in its input queue up to the moment of switchover. This restores the state the previous active side had reached.

During normal operation, the synchronization module has logged messages separately for each task and has recorded the global order of all logged messages. The recovery module on the newly active side therefore injects each message into its task in that order. Because each application task runs normally during this replay, it can generate output messages. These may be messages leaving the board or internal messages. For an internal message, the recovery module checks whether it is already logged in the destination task's logged-message buffer. If it is, the message is discarded; otherwise it is appended to the end of that task's buffer.
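The replay loop with internal-message deduplication might look like the sketch below. It is illustrative only: the task names and handlers are hypothetical. For brevity, messages are compared by content, whereas the text identifies messages by header, sequence number or a checksum/CRC tag. The example assumes the old active side failed after logging `setup:2` but before the internal message `start:2` reached the billing task.

```python
from collections import deque
from typing import Callable

# A handler processes one message and returns its outputs as (destination, message);
# destination "external" means the message leaves the board.
Handler = Callable[[dict, str], list]

def recover(state: dict, logged: list[tuple[str, str]], handlers: dict[str, Handler]) -> list[str]:
    """Replay logged (task, message) pairs in global order on top of the backed-up state."""
    buffers: dict[str, list[str]] = {}
    for task, msg in logged:                  # per-task logged-message buffers
        buffers.setdefault(task, []).append(msg)
    work = deque(logged)                      # global injection order from synchronization
    external: list[str] = []
    while work:
        task, msg = work.popleft()
        for dest, out in handlers[task](state, msg):
            if dest == "external":
                external.append(out)          # handling of re-generated external output is not covered here
            elif out in buffers.setdefault(dest, []):
                continue                      # already logged for the destination task: discard
            else:
                buffers[dest].append(out)     # not logged (lost at the fault): append to the end
                work.append((dest, out))      # and deliver it after the logged messages

    return external

def call_task(state: dict, msg: str) -> list:
    n = msg.split(":")[1]
    state["calls"] += 1
    return [("billing", f"start:{n}"), ("external", f"ack:{n}")]

def billing_task(state: dict, msg: str) -> list:
    state["billed"].append(msg.split(":")[1])
    return []

backup = {"calls": 0, "billed": []}
logged = [("call", "setup:1"), ("billing", "start:1"), ("call", "setup:2")]
outputs = recover(backup, logged, {"call": call_task, "billing": billing_task})
print(backup, outputs)  # {'calls': 2, 'billed': ['1', '2']} ['ack:1', 'ack:2']
```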

As described above, the present invention implements a redundancy device for a communication system using Log-Based Rollback-Recovery. In particular, it obtains the advantages of pessimistic message logging without its per-input-message overhead.

It therefore provides a redundancy device suited to next-generation communication systems that require relatively high performance.

Claims

A redundancy device for a communication system, comprising:

an active module comprising: a fault detection module that, upon detecting a fault of the counterpart board, reports the fault to a higher-level processor and decides on a switchover accordingly;

a synchronization module that matches the message injection order and removes messages logged on the standby side at data backup time; and

a backup module that waits until processing of the injected messages has finished and, once all processing is finished, starts backing up the backup target area;

a standby module comprising the fault detection module, the synchronization module and the backup module,

and further comprising a recovery module that processes the input messages stored in an input queue up to the moment of switchover from the standby side to the active side, thereby restoring the state the active side had reached before the switchover; and

a communication link providing a message transmission path between the active module and the standby module.

The redundancy device of claim 1, wherein the active module and the standby module

receive the same message input.

The redundancy device of claim 1, wherein the active module and the standby module

each produce their own message output.

The redundancy device of claim 1,

wherein input message information is transmitted from the active module to the standby module over the communication link, and the message order of the two modules is matched by transmitting this message information.

The redundancy device of claim 1, wherein the active module and the standby module

match the message order by transmitting input message information from the active module to the standby module over a communication link installed between the two modules.