JP2005056297A - Data recovery method and system therefor in duplex system

Info

Publication number: JP2005056297A
Application number: JP2003288512A
Authority: JP
Inventors: Masatsugu Kimata; 正嗣木全
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2003-08-07
Filing date: 2003-08-07
Publication date: 2005-03-03
Anticipated expiration: 2023-08-07
Also published as: JP4352224B2

Abstract

PROBLEM TO BE SOLVED: To provide a new data recovery method and system for a duplex system that can speed up the resumption of service.

SOLUTION: In a method that recovers the data of an old active system 100 at system switchover by replicating data from the active system to the standby system, management information on data whose replication has not yet completed is held in a table 190. At system switchover, this management information is replicated from the old active system 100 into a table 200 of the new active system 110 (S12), so that when the new active system 110 executes a service it can check whether the data it needs has been replicated (S22). If the required data has not been replicated (S23), the new active system 110 recovers it by having it replicated from the old active system 100 (S24-S27).

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to replication (memory synchronization) technology for a duplex system having an active system and a standby system, and in particular to a method and system for recovering data at system switchover.

In a duplex system having an active system and a standby system, processing can be switched to the standby system immediately and continued when a failure occurs in the active system. For this to work, the data held in the active system must be reflected in the standby system as is, and various data recovery schemes have been proposed.

In the "call information relief method" disclosed in Japanese Examined Patent Publication No. 6-83314, the information stored in the main memory of the active system is transferred to the standby system in response to a system switching signal from the active system, and the standby system continues processing according to the transferred information. With this scheme, however, processing cannot continue until the data transfer from the active system to the standby system has completed, so the resumption of service is delayed.

Japanese Unexamined Patent Publication No. 6-67979 discloses one way of speeding up service resumption. According to that publication, a copy-back cache is used in both the active and standby systems of a duplex processor system, and control is performed so that the contents of the two main memories always match. In addition, each system is provided with a buffer memory that saves the contents written to its main memory. When a system switchover occurs, the standby system refers to the active system's buffer memory on every memory access, and if there is data that has not been reflected in the standby system, the contents of the active system's buffer memory are written to the standby system's main memory. Because data is recovered without using the standby system's buffer memory, the switchover processing at the time of a failure can be sped up.

Patent Document 1: Japanese Examined Patent Publication No. 6-83314
Patent Document 2: Japanese Unexamined Patent Publication No. 6-67979 (paragraphs 0008, 0025)

However, the scheme disclosed in Patent Document 2 must search the active system's buffer memory on every memory access, which increases the volume of communication between the active and standby systems. This makes it difficult to speed up data recovery, particularly in systems that access memory frequently.

An object of the present invention is therefore to provide a new data recovery method and system for a duplex system that can speed up service resumption.

According to the present invention, management information on data whose replication has not completed is replicated to the new active system at system switchover, so that when the new active system executes a service it can check on its own side whether the data it needs has been replicated. If the required data has not been replicated, the new active system can recover it by having the old active system replicate it. In other words, even when data needed for processing in the new active system has not finished replicating, that data can be recovered at the request of the new active system.

Also, because the management information on data whose replication has not completed is replicated to the new active system at system switchover, the new active system can check whether the data it needs has been replicated and, if it has not, suspend the process concerned and run another process.

According to a first aspect of the present invention, there is provided a method in which one side of a duplex system serves as the active system and the other as the standby system, and the data of the old active system is recovered at system switchover by replicating data from the active system to the standby system. The method is characterized in that, during data replication from the active system to the standby system, management information on remaining data (updated data whose replication has not completed) is held in a table, and, when the active system is switched, the new active system replicates the remaining data of the old active system as needed on the basis of the management information.

Preferably, the new active system starts the service while referring to the management information, before all of the remaining data of the old active system has been replicated. In that case, the new active system preferably determines whether management information on the data needed to execute the service exists in the table; if it does, requests the old active system to replicate that data so that its replication completes; and, once the replication has completed, deletes the management information on that data from the table.

According to a second aspect of the present invention, there is provided a method in which one side of a duplex system serves as the active system and the other as the standby system, and the data of the old active system is recovered at system switchover by replicating data from the active system to the standby system. The method is characterized in that: a) during data replication from the active system to the standby system, management information on remaining data (updated data whose replication has not completed) is held in a table; b) when the active system is switched, the new active system executes a process while referring to the management information; c) if that process needs remaining data corresponding to the management information, the process is suspended for a predetermined time and another process is executed; and d) when the predetermined time has elapsed, processing returns to step b).

Thus, according to the present invention, the new active system can resume service at system switchover without waiting for replication to complete, which speeds up service resumption. Because the management information on data whose replication has not completed is held on the new active side, the new active system can track which replications are complete and which are not. When data is needed during service execution, the new active system can have it replicated from the old active system, or it can suspend the process concerned, run another process, and return to the original process after a predetermined time. Service can therefore be resumed and continued even though replication of the data to be recovered has not completed.

FIG. 1 is a schematic diagram showing the outline configuration of a duplex data recovery system according to one embodiment of the present invention. This embodiment is a duplex system consisting of an active (ACT) processor 100 and a standby (SBY) processor 110. Replication (memory synchronization) is carried out by a replication process 120 created on processor 100 and a replication process 130 created on processor 110.

More specifically, a replication process 120 and a process 140 are created on the active processor 100, which is also provided with a shared memory 160, to and from which the data needed for processing is written and read, and a table 190 that stores management information on data whose memory synchronization has not completed. Processor 100 is further provided with a queue 180, which is used for inter-process communication and for communication with processor 110.

Similarly, a replication process 130 and a process 150 are created on the standby processor 110, which is also provided with a shared memory 170, to and from which the data needed for processing is written and read, and a table 200 into which the data held in table 190 is replicated. When the service of the active processor 100 stops because of a failure or the like, the systems are switched and the standby processor 110 starts the service as the new active system.

As described in detail later, replication process 120 on processor 100 conveys changes made to shared memory 160 to the standby processor 110 and also holds in table 190 the data (management information) that identifies where in shared memory 160 the change was made. When replication of a data item completes, the management information for that item is deleted from table 190, so what remains in table 190 is the management information of data whose replication has not completed. Table 200 on processor 110 obtains the contents of table 190 on processor 100 through replication at system switchover and holds them as the management information of data in shared memory 160 whose replication has not completed.

If the systems are switched while this replication is in progress, processor 110, as the new active system, resumes the service using the data replicated from processor 100, but not all of the data has necessarily been replicated at that point. Process 150 on the new active system therefore searches table 200 to determine whether the data needed for the service has been replicated and, if it has not, obtains the needed data from the old active system through replication process 130.

In this way, even if replication has not completed, the new active system can obtain the data it needs from the old active system whenever required and can execute the service using the latest data. The new active system can therefore resume service at an early stage. The overall operation of this embodiment is described in more detail below with reference to FIGS. 2 to 4.

(Before failure)
FIG. 2 is a schematic diagram for explaining the operation of the data recovery system shown in FIG. 1 during normal operation. Here, processor 100 is the active system and processor 110 is the standby system.

First, process 140 on the active processor 100 reads from and writes to shared memory 160 as needed (step S1) and, at the same time, issues a replication request to replication process 120 so that the update to shared memory 160 is reflected in the standby system (step S2).

On receiving the replication request, replication process 120 obtains the changed data and the management information identifying the changed location from shared memory 160 or from process 140 (steps S3 and S4), executes the replication (step S5), and writes the management information identifying the changed location into table 190 (step S6).

When the data change has been reflected in the standby system's shared memory 170 by replication (step S7), completion of the replication is reported (step S8). Replication process 120 then deletes from table 190 the management information identifying the changed location for the completed replication. Consequently, for any data whose replication has not completed, the corresponding management information is present in table 190.
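The bookkeeping in steps S1 to S8 can be illustrated with a minimal Python sketch. It is not taken from the patent: the class `ActiveSide`, its attribute names, and the in-memory `outbox` list standing in for the transfer path are all assumptions made for illustration, and keys such as "call:1" are invented sample data.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveSide:
    """Hypothetical model of active processor 100 (names are illustrative)."""
    shared_memory: dict = field(default_factory=dict)  # plays the role of shared memory 160
    pending: set = field(default_factory=set)          # plays the role of table 190
    outbox: list = field(default_factory=list)         # changes sent but not yet confirmed

    def write(self, key, value):
        # S1: update shared memory. S2-S6: hand the change to replication
        # and record which location changed.
        self.shared_memory[key] = value
        self.outbox.append((key, value))
        self.pending.add(key)

    def deliver_one(self, standby_memory):
        # S7: the oldest outstanding change is applied on the standby side.
        key, value = self.outbox.pop(0)
        standby_memory[key] = value
        # S8: on completion, drop the entry, unless a newer change to the
        # same location is still outstanding.
        if all(k != key for k, _ in self.outbox):
            self.pending.discard(key)


active = ActiveSide()
standby_memory = {}  # plays the role of shared memory 170
active.write("call:1", "ringing")
active.write("call:2", "connected")
active.deliver_one(standby_memory)
remaining = sorted(active.pending)  # only "call:2" is still unreplicated
```

After one delivery, the pending set names only the location whose replication is still outstanding, which is the property the switchover procedure relies on.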

(Replication at system switchover)
FIG. 3 is a schematic diagram for explaining the replication operation at system switchover in the data recovery system shown in FIG. 1, and FIG. 4 is a flowchart showing the replication operation at system switchover.

In FIG. 4, assume that a system switchover is executed because a failure or the like has occurred in the active processor 100 (step S10), that processor 100 has become the old active system and processor 110 the new active system, and that the old active processor 100 has been restarted.

Once restarted, the old active processor 100 first instructs replication process 120 to replicate the management information stored in table 190 to table 200 on the new active processor 110. Specifically, replication process 120 checks whether any management information remains in table 190 (step S11). If it does (YES in step S11), replication process 120 copies the management information in table 190 to table 200 on the new active processor 110 (step S12) and, once this replication has completed, deletes the entire contents of table 190 (step S13).

When replication of the management information has completed, replication process 120 replicates the data whose replication had not completed before the failure occurred. Specifically, replication process 120 determines whether any such data exists (step S14). If it does (YES in step S14), the data is transferred to the new active processor 110 and stored in shared memory 170 by replication process 130 (step S15). The management information corresponding to data replicated in this way is deleted from table 200 (step S16).

Thus, in this embodiment, the management information in table 190 is replicated before the data itself. Once the management information has been replicated, the new active processor 110 can start the service, as described in detail next.
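The ordering of steps S11 to S16 (management information first, data afterwards) can be sketched as a Python generator. This is an illustrative simplification, not the patent's implementation: the function name `switch_over`, the use of plain dicts and sets for the memories and tables, and the string markers it yields are all assumptions.

```python
def switch_over(old_memory, old_table, new_memory, new_table):
    """Hypothetical sketch of S11-S16; yields after each step so that a
    caller could interleave service work between steps."""
    # S11-S13: if management information remains, copy it to the new
    # active side first, then clear the old table.
    remaining = sorted(old_table)
    if remaining:
        new_table.update(remaining)
        old_table.clear()
    yield "management-info-replicated"  # the service may start from here

    # S14-S16: replicate the outstanding data item by item and remove
    # each item's entry from the new side's table once it has arrived.
    for key in remaining:
        if key in new_table:  # skip items already fetched some other way
            new_memory[key] = old_memory[key]
            new_table.discard(key)
            yield f"replicated:{key}"


old_memory = {"a": 1, "b": 2, "c": 3}  # stands in for shared memory 160
old_table = {"b", "c"}                 # stands in for table 190
new_memory = {"a": 1}                  # stands in for shared memory 170
new_table = set()                      # stands in for table 200

steps = switch_over(old_memory, old_table, new_memory, new_table)
first = next(steps)
pending_at_service_start = set(new_table)  # what the new side knows is missing
rest = list(steps)
```

The point of the ordering is visible after the first step: the new side already knows exactly which locations are stale, even though none of the data has moved yet.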

(Service start by the new active system)
FIG. 5 is a schematic diagram for explaining the service resumption operation of the new active system in the data recovery system shown in FIG. 1, and FIG. 6 is a sequence diagram showing the replication operation at system switchover and the service resumption operation of the new active system.

In FIG. 6, steps S11 to S16 show the replication operation at system switchover and correspond to steps S11 to S16 in FIG. 4. As described above, once the management information has been replicated (step S12) and replication process 130 has reported completion (step S20), process 150 starts the service (step S21) without waiting for data replication to complete (step S17).

During service execution, whenever process 150 needs to read data from or write data to shared memory 170, it first refers to table 200 and determines whether the data belongs to a portion whose replication has not completed (step S22). That is, it searches table 200 for management information indicating that data.

If table 200 contains management information indicating the data (step S23), process 150 judges that replication has not completed and, through replication process 130, requests replication process 120 on the old active system to complete replication of the portion containing the data (step S24).

On receiving the request, replication process 120 searches queue 180 to check whether the replication is already about to be executed (step S25). If queue 180 contains no replication execution entry, replication process 120 refers to shared memory 160 to obtain the relevant data (step S26) and sends it to replication process 130 on the new active system to execute the replication.

Replication process 130 on the new active system then reflects the data in shared memory 170 to complete the replication (step S27), deletes the corresponding entry from table 200 (step S28), and notifies process 150 that replication has completed (step S29). On receiving the notification, process 150 can access shared memory 170, read the updated data, and continue processing (step S30).
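The on-demand path of steps S22 to S30 can be sketched in Python as follows. The sketch is an assumption-laden simplification: the class `NewActive` and its fields are invented for illustration, a direct dictionary read stands in for the request to replication process 120, and the queue 180 check of step S25 is omitted.

```python
class NewActive:
    """Hypothetical model of new active processor 110 (names are illustrative)."""

    def __init__(self, memory, pending, old_memory):
        self.memory = memory          # plays the role of shared memory 170
        self.pending = pending        # plays the role of table 200
        self.old_memory = old_memory  # stands in for the request path to the old active side
        self.fetches = 0              # number of on-demand requests sent

    def read(self, key):
        # S22/S23: consult the local table before touching shared memory.
        if key in self.pending:
            # S24-S27: ask the old active side for this item and store it.
            self.memory[key] = self.old_memory[key]
            self.fetches += 1
            # S28: the item is now up to date, so drop its entry.
            self.pending.discard(key)
        # S30: continue with the (now current) local copy.
        return self.memory[key]


node = NewActive(memory={"a": 1, "b": 0}, pending={"b"}, old_memory={"a": 1, "b": 2})
results = [node.read("a"), node.read("b"), node.read("b")]
```

In this sketch only the first read of "b" crosses to the old active side; reads of "a" and the repeated read of "b" are answered locally, which is the contrast with a scheme that consults the other system on every access.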

With the above procedure, the new active processor 110 can recover data whose replication has not completed even after the service has started. Because there is no need to wait until all of the data replication required for recovery has completed (step S17), service can be resumed at an early stage.

(Other embodiments)
In the embodiment above, after the service starts in FIG. 6 (step S21), when process 150 on the new active system reads or writes data in shared memory 170, it refers to table 200 to check whether replication is incomplete and, if so, issues a completion request so that the data is replicated. Replication of the incomplete data, however, begins in step S14 and continues thereafter; it simply had not yet reached the data in question at the moment process 150 accessed shared memory 170. Replication that is incomplete at that moment should therefore complete after some time, and overall efficiency can be improved by running other processes in the meantime.

From this viewpoint, in a data recovery method according to another embodiment of the present invention, if the reference to table 200 shows that replication is incomplete, processes other than process 150 are executed first, and after a certain time has elapsed, table 200 is consulted again for the data that process 150 needs to determine whether replication is still incomplete. In other words, rather than having process 150 actively request completion of replication, this embodiment waits for replication to complete through the ordinary data replication procedure and runs other processes in the meantime, improving the processing efficiency of the service as a whole.
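The defer-and-retry behaviour of this other embodiment can be sketched as a small round-robin loop in Python. Everything here is an assumption for illustration: the function `run`, the modelling of the "certain time" as one loop iteration, and the `in_flight` queue standing in for the ordinary background replication are not specified by the patent.

```python
from collections import deque


def run(processes, pending, in_flight):
    """Hypothetical scheduler sketch.

    processes: list of (name, needed_key) pairs.
    pending:   set of keys not yet replicated (plays the role of table 200).
    in_flight: deque of keys the ordinary replication will deliver, in order.
    Returns the names in the order in which the processes complete.
    """
    ready = deque(processes)
    finished = []
    while ready:
        name, key = ready.popleft()
        if key in pending:
            if not in_flight:
                raise RuntimeError(f"{key!r} will never be replicated")
            # Needed data is still missing: set this process aside so that
            # other processes can run, and check again later.
            ready.append((name, key))
        else:
            finished.append(name)
        # One time slice has passed: ordinary replication advances by one item.
        if in_flight:
            pending.discard(in_flight.popleft())
    return finished


order = run(
    processes=[("process150", "x"), ("other", "y")],
    pending={"x"},
    in_flight=deque(["x"]),
)
```

Here "process150" is set aside on its first attempt, "other" runs in the meantime, and "process150" completes on its second attempt once the background replication has delivered "x", without any explicit completion request being sent.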

FIG. 1 is a schematic diagram showing the outline configuration of a duplex data recovery system according to one embodiment of the present invention.
FIG. 2 is a schematic diagram for explaining the operation of the data recovery system shown in FIG. 1 during normal operation.
FIG. 3 is a schematic diagram for explaining the replication operation at system switchover in the data recovery system shown in FIG. 1.
FIG. 4 is a flowchart showing the replication operation at system switchover in the data recovery system shown in FIG. 1.
FIG. 5 is a schematic diagram for explaining the service resumption operation of the new active system in the data recovery system shown in FIG. 1.
FIG. 6 is a sequence diagram showing the replication operation at system switchover and the service resumption operation of the new active system in the data recovery system shown in FIG. 1.

Explanation of symbols

100 Active processor
110 Standby processor
120 Active replication process
130 Standby replication process
140 Active process
150 Standby process
160 Active shared memory
170 Standby shared memory
180 Active queue
190 Active table
200 Standby table

Claims

A data recovery method for recovering the data of an old active system at system switchover by executing data replication from an active system to a standby system, one side of a duplex system serving as the active system and the other as the standby system, characterized in that:
during data replication from the active system to the standby system, management information on remaining data, that is, updated data whose replication has not completed, is held in a table; and
when the active system is switched, the new active system replicates the remaining data of the old active system as needed on the basis of the management information.

The data recovery method according to claim 1, wherein the new active system starts a service while referring to the management information before all of the remaining data of the old active system has been replicated.

The data recovery method according to claim 2, wherein the new active system:
determines whether management information on data needed to execute the service exists in the table;
if the management information exists in the table, requests the old active system to replicate the data so that replication of the data completes; and
when replication of the data has completed, deletes the management information on the data from the table.

A data recovery method for recovering the data of an old active system at system switchover by executing data replication from an active system to a standby system, one side of a duplex system serving as the active system and the other as the standby system, characterized in that:
a) during data replication from the active system to the standby system, management information on remaining data, that is, updated data whose replication has not completed, is held in a table;
b) when the active system is switched, the new active system executes one process while referring to the management information;
c) when the one process needs remaining data corresponding to the management information, the one process is suspended for a predetermined time and another process is executed; and
d) when the predetermined time has elapsed, processing returns to step b).

A data recovery system in which one of a first processing system and a second processing system constituting a duplex system serves as the active system and the other as the standby system, and which recovers the data of the old active system at system switchover by executing data replication from the active system to the standby system, wherein each of the first processing system and the second processing system comprises:
a memory that readably and writably stores data used by processes;
a table for storing management information on remaining data, that is, updated data stored in the memory whose replication has not completed, during data replication from the active system to the standby system; and
control means that, when the processing system switches from standby to active, replicates the management information from the table of the other processing system to the table of the processing system and executes a service in the processing system on the basis of the replicated management information.

The data recovery system according to claim 5, wherein, when the processing system needs data corresponding to management information existing in the table while executing the service, the control means requests the other processing system to replicate the data.

The data recovery system according to claim 5, wherein, when one process of the processing system needs data corresponding to management information existing in the table while executing the service, the control means suspends execution of the one process for a predetermined time, executes another process, and re-executes the one process after the predetermined time has elapsed.