KR20030074982A

KR20030074982A - Software fault tolerance method by software

Info

Publication number: KR20030074982A
Application number: KR1020020014028A
Authority: KR
Inventors: 박경원
Original assignee: 삼성전자주식회사
Priority date: 2002-03-15
Filing date: 2002-03-15
Publication date: 2003-09-22

Abstract

PURPOSE: To provide a method for implementing software fault tolerance using a primary/backup redundancy scheme. CONSTITUTION: When a call occurs in the system, an application (102) asks the CID (Call Instance Data) table (104) of the primary (100) to assign an index ID (111). The CID table (104) creates a CID object to store information about the call (113) and searches the free ID pool for an idle index ID to assign to the call (115). The CID table (104) assigns the retrieved idle index ID to the call and returns it to the application (102) (117). The application (102) stores call processing data in the corresponding CID object of the CID table (104) (119, 121). The CID table (104) transfers the CID object to the CID table (154) of the backup (150) (123). The CID table (154) creates an application session from the information stored in the CID object (125) and passes the session to the application (152) of the backup (150) (127). The application (152) requests the call processing data from the CID table (154) (129) and uses the data it receives to process the call. When the application (102) of the primary (100) requests release of the call (131), the CID table (104) deletes the CID object for that call (133) and returns the index ID assigned to the call to the free ID pool (135). The CID table (104) notifies the CID table (154) of the backup (150) that the call has been released (137). The CID table (154) deletes the CID object for that call (139) and asks the application (152) to release the application session corresponding to the call (141).

Description

SOFTWARE FAULT TOLERANCE METHOD BY SOFTWARE

The present invention relates to a method for implementing software fault tolerance on a system, and more particularly to a method for implementing software fault tolerance using a primary/backup redundancy technique.

Fault tolerance means that even if part of a system fails, processing continues without affecting the system as a whole, so that the failed part can be repaired or replaced in the meantime. (The term can also mean the degree to which a whole system, a part of it, or a program can withstand the effects of internal failures.) Conventionally, fault-tolerant systems have been built with hardware fault tolerance techniques: hardware, and processing elements in particular, is replicated so that errors in the central processing unit (CPU) or in peripheral elements can be detected and masked. Fault masking combines a functional unit needed to perform some function with another functional unit that performs the same operation, so that when a failure occurs the resulting error does not appear in the output. The hardware fault tolerance technique has the following advantages. First, recovery time is shorter than with software duplication, so overall system down time is reduced. Second, fault detection and recovery rely on hardware, so software complexity is reduced. Because the software is less complex, software development cost is also relatively lower.

Despite these advantages, the hardware fault tolerance technique has the following disadvantages. First, hardware that meets the requirements is always needed, and in most cases that hardware is special-purpose, so development and production costs are very high. Second, the longer development period that comes with in-house development can leave the system several years behind current technology trends. Third, it may not adapt easily to changes in hardware technology.

Accordingly, an object of the present invention, which addresses the above problems, is to provide a method for implementing structurally flexible system fault tolerance.

Another object of the present invention is to provide a method for implementing system fault tolerance that adapts easily to changes in hardware technology.

Yet another object of the present invention is to provide a method that reduces the cost of implementing system fault tolerance.

To achieve the above objects, the present invention provides a method for implementing fault tolerance in a system that operates as a primary and a backup, in which the primary and the backup each have a free ID pool, a checkpointed ID pool, an equalization ID pool, and a CID table that stores information about generated objects. The method comprises: a first process of requesting, from the primary, an index ID for a call when the call occurs in the system; a second process in which the primary, on receiving the request, creates a CID object holding information about the call and stores it in the CID table; a third process of searching the free ID pool and assigning one index ID found there to the call; a fourth process of deleting the assigned ID from the free ID pool and storing it in the checkpointed ID pool; a fifth process of storing information about the call in the CID object stored in the CID table; a sixth process of transferring to the backup the CID object for the ID stored in the checkpointed ID pool; a seventh process of storing the received CID object in the CID table of the backup; an eighth process of, when release of the call is requested, deleting the CID object stored in the CID table of the primary and returning the index ID that was assigned to the call to the free ID pool; and a ninth process of requesting the backup to release the call.

FIG. 1 is a diagram illustrating the CID index ID management algorithm between the primary and the backup.

FIG. 2 is a sequence flow diagram of the CID transfer operation between the primary and the backup.

Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings. In adding reference numerals to the components of the drawings, note that the same components are given the same reference numerals wherever possible, even when they appear in different drawings.

The present invention relates to a method for implementing the fault tolerance of a system in software. Software fault tolerance on general-purpose hardware can implement system fault tolerance at lower cost than a hardware approach, and it is being applied to a growing number of fault-tolerant systems as the performance and reliability of general-purpose hardware such as workstations improve and distributed technology advances. The software fault tolerance scheme is structurally more flexible than the hardware fault tolerance scheme and, because it uses general-purpose hardware, adapts easily to changes in hardware technology. The invention described below implements software fault tolerance using, in particular, a primary/backup redundancy scheme.

In the primary/backup scheme, Call Instance Data (hereinafter "CID") is used for call processing between the primary and the backup. The CID is used to synchronize the primary and the backup efficiently.

Referring to FIG. 1, described below, the primary (100) and the backup (150) each have three ID pools for managing index IDs: a free ID pool (freeIdPool) (110, 160), a checkpointed ID pool (checkpointedIdPool) (120, 170), and an equalization ID pool (equalizationIdPool) (130, 180).
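The three pools can be pictured as sets of index IDs held by each node. The following is a minimal sketch, not the patent's implementation: the class name `IdPools`, the constant `MAX_INDEX_ID`, the method names, and the choice to hand out the lowest idle ID are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

MAX_INDEX_ID = 8  # hypothetical pool size; the source gives no value


@dataclass
class IdPools:
    """The three index-ID pools that each node (primary or backup) holds."""
    free: set = field(default_factory=lambda: set(range(MAX_INDEX_ID)))
    checkpointed: set = field(default_factory=set)
    equalization: set = field(default_factory=set)

    def allocate(self) -> int:
        """Remove an idle index ID from the free pool and return it.

        Picking the lowest ID is an assumption; the source only says
        that an idle ID is chosen.
        """
        if not self.free:
            raise RuntimeError("no idle index ID left")
        index_id = min(self.free)
        self.free.remove(index_id)
        return index_id

    def mark_checkpointed(self, index_id: int) -> None:
        """Record that the CID object for this ID was sent to the backup."""
        self.checkpointed.add(index_id)

    def release(self, index_id: int) -> None:
        """Return the ID to the free pool when its call ends."""
        self.checkpointed.discard(index_id)
        self.equalization.discard(index_id)
        self.free.add(index_id)
```

Sets are used here because the text only requires membership moves between pools; an ordered structure would serve equally well.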

First, CID collection and delivery at the primary (100) proceeds as follows. When a call occurs, each application (102) responsible for call processing on the primary (100) selects the CIDs that must be delivered to the backup (150) and stores them in the CID table that manages them. The CID information is stored in the CID object that the CID table manages for that call, indexed by the unique ID allocated by the CID table. Once all CIDs for the call have been stored in the CID table, the CID table forwards the CID and its index Id to the backup (150). This is the per-call data transfer method. If only the primary (100) is running, for example because the backup (150) has failed and has not yet recovered, the CID is only stored in the CID table of the primary (100) and is not sent to the backup (150). When the backup (150) is newly started while call processing continues, the primary (100) immediately sends the CIDs of all calls that arose in the meantime to the backup (150), synchronizing the primary (100) and the backup (150). This is the batch data transfer method. The two transfer methods may run together while synchronization is in progress.
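The difference between the per-call and the batch transfer can be sketched as follows. This is an illustration under stated assumptions: the classes `Primary` and `Backup`, their method names, and the use of `None` to model a backup that is down are hypothetical, and the network transfer is reduced to a direct method call.

```python
class Backup:
    """Holds a copy of the primary's CID table."""

    def __init__(self):
        self.cid_table = {}  # index ID -> CID data

    def receive(self, index_id, cid):
        # Copy the data so the backup does not share state with the primary.
        self.cid_table[index_id] = dict(cid)


class Primary:
    def __init__(self):
        self.cid_table = {}  # index ID -> CID data
        self.backup = None   # None models a backup that is not running

    def store_call(self, index_id, cid):
        """Store the CID locally; forward it only if a backup is running."""
        self.cid_table[index_id] = cid
        if self.backup is not None:
            self.backup.receive(index_id, cid)  # per-call transfer

    def attach_backup(self, backup):
        """A backup has just started: send every stored CID at once."""
        self.backup = backup
        for index_id, cid in self.cid_table.items():
            backup.receive(index_id, cid)       # batch transfer


primary = Primary()
primary.store_call(0, {"caller": "1001"})  # backup down: stored locally only
backup = Backup()
primary.attach_backup(backup)              # batch transfer of call 0
primary.store_call(1, {"caller": "1002"})  # per-call transfer of call 1
```

After these calls both tables hold the same two entries, which is the synchronized state the paragraph describes.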

The backup (150), for its part, must restore all CIDs received from the primary (100) into its own CID table with the same structure as on the primary (100). That is, the backup (150) uses the index Id received together with each CID to store the CID for that call in its CID table. The backup (150) also uses the mapping information between call processing application sessions, contained in the CID, to restore the sessions with the same structure as on the primary (100). Each restored application session obtains its own data from the CID and uses it for call processing on the backup (150).
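The restore step on the backup can be sketched as below. The source does not specify the layout of a CID, so the fields `session_map` (session name to data key) and `data` are hypothetical, as are the class and method names.

```python
class BackupCidTable:
    """Restores received CIDs under the same index ID used by the primary."""

    def __init__(self):
        self.objects = {}   # index ID -> CID
        self.sessions = {}  # (index ID, session name) -> data for that session

    def restore(self, index_id, cid):
        # Store the CID at the index ID that arrived with it.
        self.objects[index_id] = cid
        # Recreate each application session named in the mapping information
        # and let it pick up its own data from the CID.
        for session_name, data_key in cid["session_map"].items():
            self.sessions[(index_id, session_name)] = cid["data"][data_key]


table = BackupCidTable()
table.restore(3, {
    "session_map": {"signalling": "sig"},     # hypothetical mapping info
    "data": {"sig": "state=CONNECTED"},       # hypothetical call data
})
```

Keying sessions by the pair of index ID and session name keeps sessions of different calls apart; any other unique key would do.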

The CIDs are deleted when their call ends. The deletion process is as follows. When call processing ends, the CID information stored in the CID table must also be released. When a call release request arrives at the primary (100), the system deletes the CID for that call in the primary (100) and notifies the backup (150), using the index Id of the call, so that the CID in the backup (150) is deleted as well. Table 1 below shows an example of the CID table, which is a mapping table between index Ids and CID objects.

Index Id            CID Object Pointer
0                   CID1Object
1                   CID2Object
2                   NULL
...                 ...
Max_IndexId_Cntr    NULL

The fault tolerance method of the present invention, which operates through these index IDs and CID objects, is described below with reference to the drawings.

FIG. 1 is a diagram illustrating the CID index ID management algorithm between the primary and the backup.

As described above, the primary (100) and the backup (150) each have a free ID pool (freeIdPool) (110, 160), a checkpointed ID pool (checkpointedIdPool) (120, 170), and an equalization ID pool (equalizationIdPool) (130, 180). The primary (100) and the backup (150) can exchange roles depending on whether a failure occurs in the system; the equalizationIdPool (130, 180) is active only while its node is operating as the primary.

The CID Object, where call processing data is stored, is managed by a free Index Id allocated from the freeIdPool (110) of the primary (100), and this Index Id is in turn managed by the three Id pools. That is, the CID Table initially allocates an arbitrary idle Index Id stored in the freeIdPool (110), creates a CID Object mapped to that Index Id, and passes the Index Id and the CID Object Pointer to the Application Session.

The Application Session stores the allocated Index Id and uses the CID Object Pointer to store call processing data in the CID Object. Once all the call processing data has been stored, the CID Table sends it to the backup (150). At this point the Index Id is stored in the checkpointedIdPool (120).

On receiving the CID Object from the primary (100), the backup (150) deletes the corresponding Index Id from its freeIdPool (160) and adds it to its checkpointedIdPool (170). It also creates a CID Object for the CID information and stores the CID in it. Using the mapping information between call processing Application Sessions contained in the CID, it then creates the same Application Sessions as on the primary (100). The created Application Sessions request the call processing data from the CID Table and store it in their own area.
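The pool bookkeeping that the backup performs on receipt can be sketched in a few lines. The function name `apply_checkpoint` and the use of plain sets and a dict are assumptions for illustration only.

```python
def apply_checkpoint(free_pool, checkpointed_pool, cid_table, index_id, cid):
    """Mirror on the backup what the primary did for one call.

    free_pool and checkpointed_pool are sets of index IDs;
    cid_table maps index ID -> CID data.
    """
    free_pool.discard(index_id)       # the ID is no longer idle on the backup
    checkpointed_pool.add(index_id)   # the ID now refers to a replicated call
    cid_table[index_id] = dict(cid)   # new CID object holding the received data


# Backup state before and after receiving the CID for index ID 1.
backup_free = {0, 1, 2, 3}
backup_checkpointed = set()
backup_cid_table = {}
apply_checkpoint(backup_free, backup_checkpointed, backup_cid_table,
                 1, {"caller": "1001"})
```

Keeping the backup's pools in step with the primary's means the backup already has consistent ID bookkeeping when it takes over the primary role.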

FIG. 2 is a sequence flow diagram of the CID transfer operation between the primary and the backup.

The operation of the present invention is described in detail below with reference to FIGS. 1 and 2.

When a call occurs in the system, the application (102) requests allocation of an index ID from the CID table (104) of the primary (100) in step 111. In step 113, the CID table (104) creates a CID object to store information about the call. In step 115, the CID table (104) searches the freeIdPool (110) for an idle index ID to allocate to the call. In step 117, the CID table (104) assigns the idle index ID it found to the call and returns it to the application (102). In steps 119 and 121, the application (102) stores the call processing data generated for the call in the corresponding CID object of the CID table (104). In step 123, the CID table (104) transfers the CID object to the CID table (154) of the backup (150). The CID table (154) creates an application session from the information stored in the CID object and passes it to the application (152) of the backup (150). The application (152) requests the call processing data for the call from the CID table (154). With the call processing data received from the CID table (154), the application (152) is able to process the call.
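The call setup sequence can be sketched as one class used for both nodes. This is a simplified illustration, not the patented implementation: the class `CidTable`, its method names, and the `size` parameter are hypothetical, the transfer in step 123 is a direct method call, and the CID object is created after the ID is chosen because the sketch keys objects by ID. Step numbers 125 to 129 follow the abstract.

```python
class CidTable:
    def __init__(self, size=4):          # size is a hypothetical parameter
        self.free = list(range(size))    # freeIdPool
        self.checkpointed = set()        # checkpointedIdPool
        self.objects = {}                # index ID -> CID object (a dict)
        self.sessions = {}               # backup side: index ID -> session data
        self.peer = None                 # CID table of the backup, if running

    # Primary side
    def request_index_id(self):
        """Steps 111 to 117: pick an idle ID, create its CID object, return the ID."""
        index_id = self.free.pop(0)
        self.objects[index_id] = {}
        return index_id

    def store(self, index_id, data):
        """Steps 119 and 121: the application stores call processing data."""
        self.objects[index_id].update(data)

    def checkpoint(self, index_id):
        """Step 123: transfer the CID object to the backup's CID table."""
        self.checkpointed.add(index_id)
        if self.peer is not None:
            self.peer.receive(index_id, dict(self.objects[index_id]))

    # Backup side
    def receive(self, index_id, cid):
        """Steps 125 to 129: store the CID and create the application session."""
        self.free.remove(index_id)
        self.checkpointed.add(index_id)
        self.objects[index_id] = cid
        self.sessions[index_id] = {"call_data": cid}


primary, backup = CidTable(), CidTable()
primary.peer = backup
call_id = primary.request_index_id()
primary.store(call_id, {"caller": "1001"})
primary.checkpoint(call_id)
```

After the last call the backup holds the same CID object under the same index ID, together with a session that can read its call data.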

When the application (102) of the primary requests release of the call (step 131), the CID table (104) deletes the CID object for that call in step 133 and, in step 135, returns the index ID that was assigned to the call to the freeIdPool (110). That is, the index ID is added back to the IDs in the freeIdPool (110) and managed as an idle ID. In step 137, the CID table (104) notifies the CID table (154) of the backup (150) that the call has been released. The CID table (154) then deletes the CID object for that call in step 139 and, in step 141, asks the application (152) to release the application session corresponding to the call.
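The release sequence can be sketched as follows. The dict-based node layout and the function names are hypothetical. Returning the ID to the backup's free pool is also an assumption: the text states this explicitly only for the primary.

```python
def make_node(size=4):
    """A node's state: three of its structures plus its application sessions."""
    return {"free": set(range(size)), "checkpointed": set(),
            "objects": {}, "sessions": {}}


def release_on_node(node, index_id):
    """Delete the CID object and session for a call and free its index ID."""
    node["objects"].pop(index_id, None)
    node["sessions"].pop(index_id, None)
    node["checkpointed"].discard(index_id)
    node["free"].add(index_id)


def release_call(primary, backup, index_id):
    release_on_node(primary, index_id)     # steps 133 and 135
    if backup is not None:                 # step 137: notify the backup
        release_on_node(backup, index_id)  # steps 139 and 141


# One active call with index ID 2 on both nodes, then its release.
primary, backup = make_node(), make_node()
for node in (primary, backup):
    node["free"].discard(2)
    node["checkpointed"].add(2)
    node["objects"][2] = {"caller": "1001"}
backup["sessions"][2] = "application session"
release_call(primary, backup, 2)
```

Passing `None` as the backup models the case where only the primary is running and the release stays local.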

The above steps complete the synchronization between the primary (100) and the backup (150). As a result, even if either the primary (100) or the backup (150) later fails and is then restored, the other side still holds the data. At the moment of recovery a batch data recovery is performed: the data for calls currently being processed is sent directly from the side that is operating normally to the side that failed, which makes recovery possible.

As described above, the present invention, which uses software, can implement a fault-tolerant system at lower cost than a hardware approach. The present invention is also structurally more flexible than hardware fault tolerance and adapts easily to changes in hardware technology.

Claims

A method for implementing fault tolerance in a system that operates as a primary and a backup, the primary and the backup each having a free ID pool, a checkpointed ID pool, an equalization ID pool, and a CID table that stores information about generated objects, the method comprising:

A first process of requesting, from the primary, an index ID for a call when the call occurs in the system;

A second process in which the primary, on receiving the request, creates a CID object holding information about the call and stores the CID object in the CID table;

A third process of searching the free ID pool and assigning one index ID found in the free ID pool to the call;

A fourth process of deleting the assigned ID from the free ID pool and storing it in the checkpointed ID pool;

A fifth process of storing information about the call in the CID object stored in the CID table;

A sixth process of transferring to the backup the CID object for the ID stored in the checkpointed ID pool;

A seventh process of storing the received CID object in the CID table of the backup;

An eighth process of, when release of the call is requested, deleting the CID object stored in the CID table of the primary and returning the index ID that was assigned to the call to the free ID pool; and

A ninth process of requesting the backup to release the call.

The method of claim 1,

wherein, in the sixth process,

when the backup is not operating normally, the ID stored in the checkpointed ID pool is stored in the equalization ID pool, and is transferred to the backup once the backup operates normally.
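One possible reading of this claim can be sketched as follows: while the backup is down, IDs awaiting transfer are parked in the equalization ID pool and flushed when the backup comes back. The class `PrimaryPools`, its method names, and the `sent` list that stands in for the actual transfer are all hypothetical, and the exact movement between pools is an interpretation rather than something the text spells out.

```python
class PrimaryPools:
    def __init__(self):
        self.checkpointed = set()   # IDs whose CID objects reached the backup
        self.equalization = set()   # IDs waiting for the backup to return
        self.backup_alive = False
        self.sent = []              # stands in for transfers to the backup

    def checkpoint(self, index_id):
        """Send the CID for this ID now, or park the ID if the backup is down."""
        if self.backup_alive:
            self.sent.append(index_id)
            self.checkpointed.add(index_id)
        else:
            self.equalization.add(index_id)

    def on_backup_up(self):
        """The backup operates normally again: transfer every parked ID."""
        self.backup_alive = True
        for index_id in sorted(self.equalization):
            self.sent.append(index_id)
            self.checkpointed.add(index_id)
        self.equalization.clear()


pools = PrimaryPools()
pools.checkpoint(0)     # backup down: ID 0 is parked
pools.checkpoint(1)     # backup down: ID 1 is parked
pools.on_backup_up()    # both parked IDs are transferred
pools.checkpoint(2)     # backup up: ID 2 is transferred immediately
```

Tracking pending IDs in a separate pool lets the primary resynchronize exactly the calls the backup missed.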