JP2002373084A

JP2002373084A - Method for both exchanging states and detecting failure of duplex system

Info

Publication number: JP2002373084A
Application number: JP2001179481A
Authority: JP
Inventors: Munetoshi Tsuge; 宗俊柘植
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2001-06-14
Filing date: 2001-06-14
Publication date: 2002-12-26

Abstract

PROBLEM TO BE SOLVED: To provide a method for two computer systems, each having both a local memory and a shared memory as storage, by which changes to the various states stored in the local memory of one system are reflected in the local memory of the other system through the shared memory, and by which one system detects a failure of the other by using this state reflection mechanism. SOLUTION: A ring buffer is provided in the shared memory. Each time one system changes a state in its local memory, it writes information describing the change to the ring buffer. The other system reads the information from the ring buffer and applies the state change to its own local memory. One system also detects a failure of the other system or of the shared memory mechanism by periodically monitoring the value of a pointer that the other system rewrites as it operates on the ring buffer.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to two computer systems, each having both a local storage area that only its own system can access and a shared storage area that both systems can access. One system passes the various states held in its local storage area to the other system via the shared storage area, sequentially as the states change, so that the states held in the local storage areas of both systems stay synchronized. The invention further relates to a technique by which one system uses this state transfer mechanism to detect a failure of the other system.

[0002]

[Prior Art] With the spread of computers and networks, it has become increasingly common for many users to connect from their own computers over a network to a server computer and use the services it provides. Because such services demand high reliability from server computers and networks, depending on the number of users and the nature of the service, improving the reliability of server computers, and of routers and switches (computers installed within the network to transfer data), has become an important issue.

[0003] One way to improve the reliability of these computers is duplexing. In this approach, two computer systems with identical functions (an active system and a standby system) are provided inside the computer. The active system performs the processing assigned to the computer as a whole, while the standby system waits in case the active system stops. When a failure occurs in the active system, that system becomes the standby system, and the system that had been operating as the standby becomes the new active system and continues operation, so that the computer as a whole does not stop. Swapping the active and standby systems of a duplexed computer in this way is called system switching.

[0004] When duplexing is used to improve reliability, two questions arise: how to shorten the service outage that occurs during system switching, and how to continue, after switching, the services that were being provided before it. Solving these problems requires meeting three conditions: (1) the standby system quickly detects a failure in the active system; (2) the new active system faithfully reproduces the various states that the old active system held immediately before switching; and (3) the time from the start of system switching until the new active system has reproduced those states is sufficiently short.

[0005] The failures that must be detected under the first condition fall broadly into two types: hardware failures and software failures. Hardware failures are generally detected by dedicated failure detection hardware, and are outside the scope of the present invention. Software failures are generally detected by having the software of the active system periodically perform some processing, such as communication, while the standby system monitors the result.

[0006] There are two conventional methods that satisfy the second and third conditions: sharing state between the two systems through shared memory, and having the active and standby systems perform exactly the same operations. In the former, the conventional shared-memory method, the active system places in shared memory every state that must be handed over to the new active system at system switching, and after switching the new active system continues operation using that state as is. In the latter, the identical-operation method, both systems receive in exactly the same way the data that the computer as a whole receives from outside, process it in exactly the same way, and transmit the resulting data in exactly the same way (although only the data transmitted by the active system is actually sent outside the computer), which keeps the states of the two systems identical.

[0007]

[Problems to Be Solved by the Invention] In the conventional shared-memory method, every state shared by the two systems must reside in shared memory. If storing the state requires a large amount of memory, a correspondingly large shared memory must be provided, which raises hardware manufacturing cost. In addition, the new active system takes over the state of the old active system by using, unchanged, the data structures that the old active system was using until just before the failure. If the failure of the old active system was caused by a corrupted state data structure, the new active system may stop for the same reason or may be unable to take over part of the state.

[0008] In the identical-operation method, on the other hand, both systems perform exactly the same operations. If the failure is caused by a hardware or software bug, it may occur in both systems at the same time, defeating the purpose of duplexing.

[0009] Furthermore, the state synchronization function and the software failure detection function have conventionally been implemented as entirely separate modules. This makes implementation laborious and places extra load on the active system for the sake of failure detection.

[0010] The present invention synchronizes the states of the two systems using a small amount of shared memory, without the two systems sharing state data directly, and uses the state synchronization mechanism to let one system detect a software failure in the other.

[0011]

[Means for Solving the Problems] In the present invention, the active system and the standby system each have, in addition to the shared memory, a local memory that only that system can access, and each stores in its local memory the state data used within that system. Each time the OS (Operating System) or an application running on the active system changes the state recorded in the active system's local memory, it passes state change information describing the change to the standby system via the shared memory. The standby system updates the state recorded in its own local memory based on the state change information it receives. The invention standardizes the procedure by which the OS and applications of the active system hand state change information to the OS and applications of the standby system, but it does not specify the data format or interpretation of the state change information itself; each OS and application defines these independently. This allows the limited capacity of the shared memory to be used efficiently while state change information is conveyed in a data format suited to the characteristics of each OS and application.
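As an illustration of the idea in this paragraph, the following minimal Python sketch shows an active-side application that changes its local state and emits a description of the change, and a standby-side counterpart that replays it. The class name `CounterApp`, the `(key, value)` change format, and the in-memory `channel` list standing in for the shared memory are all assumptions made for the example; the description deliberately leaves the format of state change information to each OS or application.

```python
class CounterApp:
    """Hypothetical application whose state is a dict held in local memory."""

    def __init__(self):
        self.state = {}

    def set_value(self, key, value, channel):
        # Active side: change local state, then emit only the change.
        self.state[key] = value
        channel.append((key, value))

    def apply_change(self, change):
        # Standby side: replay a received change against local state.
        key, value = change
        self.state[key] = value


channel = []                 # stand-in for the shared-memory transfer path
active, standby = CounterApp(), CounterApp()

active.set_value("sessions", 3, channel)
active.set_value("sessions", 4, channel)

while channel:
    standby.apply_change(channel.pop(0))
```

Only the changes cross the channel, so the shared area needs room for pending changes rather than for the full state, and the two systems never operate on the same state object.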

[0012] A ring buffer is provided in the shared memory and is used to pass state change information between the systems. The ring buffer consists of a buffer memory area that holds the buffered data, a head pointer that indicates the start of the range within the buffer memory area where data is actually recorded, and a tail pointer that indicates the end of that range. When state change information generated by the OS or an application of the active system is written to the shared memory, data is created by attaching to the state change information an application ID that identifies the originating OS or application, and this data is written to the buffer memory area of the ring buffer starting at the position indicated by the tail pointer. The tail pointer is then advanced by the length of the data. When the standby system reads state change information from the shared memory, it reads the buffer memory area sequentially from the position indicated by the head pointer and passes the state change information contained in the data to the OS or application identified by the application ID contained in the same data. The head pointer is then advanced by the length of the data read.
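The write and read procedure described above can be sketched as follows. This is a simplified model, not the embodiment's layout: the class name `WordRing`, the zero-based indices, and the record contents are assumptions, and overflow checking is omitted for brevity.

```python
class WordRing:
    """Toy ring buffer over a fixed list of words (assumed layout)."""

    def __init__(self, size):
        self.words = [0] * size
        self.head = 0   # first recorded word
        self.tail = 0   # next word to write

    def write(self, data):
        # Writer side: store words from the tail, then advance the tail.
        for w in data:
            self.words[self.tail] = w
            self.tail = (self.tail + 1) % len(self.words)

    def read(self, count):
        # Reader side: take words from the head, then advance the head.
        out = []
        for _ in range(count):
            out.append(self.words[self.head])
            self.head = (self.head + 1) % len(self.words)
        return out


ring = WordRing(8)
record = [7, 10, 20]          # hypothetical: app ID 7 followed by two payload words
ring.write(record)
tail_after_write = ring.tail
app_id, *info = ring.read(len(record))
```

Note that `write` touches only the tail and `read` touches only the head, so each pointer has exactly one system that modifies it.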

[0013] As the ring buffer mechanism described above shows, the value of the tail pointer changes when the active system writes state change information, and the value of the head pointer changes when the standby system reads it. The present invention uses this property to detect software failures in the other system. That is, the standby system periodically monitors the tail pointer of the ring buffer, and if its value does not change for a certain period, determines that a failure has occurred in the active system (or in the shared memory mechanism). For this detection method to work correctly, however, the active system periodically writes to the ring buffer state change information indicating that no state has changed (hereinafter called NOP information), even when it has no state change information to convey to the standby system. When the state change information read by the standby system is NOP information, the standby system discards it without passing it to any OS or application. Likewise, provided that the standby system reads state change information (including NOP information) from the ring buffer within a certain period, the active system can detect a failure of the standby system by monitoring the head pointer.
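A minimal sketch of the monitoring side of this scheme is shown below. It assumes the monitor samples the watched pointer once per period and treats a fixed number of consecutive unchanged samples as "a certain period"; the class name `PointerWatchdog` and the parameter `max_unchanged_checks` are hypothetical, and comparing successive samples is a simplification of whatever bookkeeping an implementation would use.

```python
class PointerWatchdog:
    """Flags a failure when a watched pointer stays unchanged too long."""

    def __init__(self, initial_value, max_unchanged_checks):
        self.last_value = initial_value
        self.max_unchanged_checks = max_unchanged_checks
        self.unchanged = 0

    def check(self, current_value):
        """Call once per monitoring period; returns True if a failure is suspected."""
        if current_value != self.last_value:
            self.last_value = current_value
            self.unchanged = 0
            return False
        self.unchanged += 1
        return self.unchanged >= self.max_unchanged_checks


watchdog = PointerWatchdog(initial_value=0, max_unchanged_checks=3)
# Tail pointer values sampled by the standby system; the peer stops writing after 12.
samples = [4, 8, 12, 12, 12, 12]
results = [watchdog.check(tp) for tp in samples]
```

Because the active system writes NOP information even when idle, a healthy peer always moves the tail pointer within the monitoring window, so a stalled pointer can be read as a failure rather than as a quiet period.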

[0014]

[Embodiments of the Invention] The present invention is described below with reference to the drawings.

[0015] FIG. 1 shows the system configuration of a duplexed computer.

[0016] In this system, the computer as a whole 100 contains two computer systems 110-A and 110-B with identical functions, connected to each other by a shared memory synchronization mechanism 120. This embodiment assumes that each system has its own storage device serving as the physical shared memory, that these storage devices are connected by the shared memory synchronization mechanism 120, and that the shared memory is realized by the shared memory synchronization mechanism 120 automatically copying data written to one storage device to the other. However, the present invention can also be implemented with shared memory that does not use this scheme.

[0017] In this embodiment, the two computer systems 110-A and 110-B are connected by an inter-system communication line 130 in addition to the shared memory synchronization mechanism 120. The inter-system communication line 130 is used only when one system has detected, by means of the shared memory, that a failure has occurred in either the other system or the shared memory synchronization mechanism, in order to determine which of the two has failed. The present invention can also be implemented by making this determination through other means.

[0018] In this embodiment, the two computer systems 110-A and 110-B are connected to a control terminal 150 by control terminal connection lines 140-A and 140-B, and the administrator of the computer as a whole 100 uses the control terminal 150 to operate the computer 100 and its components. However, the present invention can also be implemented when the administrator operates the computer by other means, or when no means is provided for the administrator to operate the computer.

[0019] FIG. 2 shows the hardware configuration of the computer 110 of each system.

[0020] The CPU (Central Processing Unit) 200 is a processor that executes the programs stored in the local memory 210. The local memory 210 stores an OS 213 that controls the entire apparatus, state update information exchange software 215 that exchanges state update information with the other system and detects failures, and an application group 217 that must run for the computer as a whole to provide its intended services. Data written to the shared memory 220 is automatically written to the shared memory of the other system by the shared memory synchronization mechanism 120, so that the shared memories 220 of both systems hold exactly the same data. A ring buffer for exchanging state update information is provided in the shared memory 220. The inter-system communication controller 230 controls the transmission and reception that the computers 110 of the two systems perform with each other over the inter-system communication line 130. The control terminal communication controller 240 controls the transmission and reception that the computer 110 of each system performs over the control terminal connection line 140. The auxiliary storage controller 250 controls input to and output from the auxiliary storage device 255. FIG. 3 shows the organization of the programs and data placed in the local memory 210 and the shared memory 220.

[0021] The local memory 210 stores software such as the OS 213, the state update information exchange software 215, and the application group 217. Here, the OS 213 and each application in the application group 217 are regarded as consisting of programs 330 and 370, which represent the processing procedures of the software, and state data 335 and 375, which represent the internal state of the software that changes as the programs run. The OS program 330 includes subprograms such as a timer control unit 331 that invokes the periodic processing in the state update information exchange software 215, an inter-system communication control unit 332 that directly operates the inter-system communication controller 230 to communicate with the other system, and a failure recovery processing unit 333 that performs appropriate processing, such as system switching, when a failure of the other system or of the shared memory synchronization mechanism 120 is found. For both the OS 213 and the application group 217, the same software runs on the active system and the standby system. However, the OS 213 and application group 217 running on the active system perform the processing needed for the computer as a whole to provide its intended services and hand state update information to the state update information exchange software 215, whereas the OS 213 and application group 217 running on the standby system only receive state update information from the state update information exchange software 215 and reflect it in their own state data, and perform the minimum processing required to operate as a standby system.

[0022] The state update information exchange software 215 consists of three software components: a ring buffer control unit 351, an update information processing unit 352, and an other-system failure monitoring unit 353.

[0023] The ring buffer control unit 351 performs writes (by the active system only) and reads (by the standby system only) on the ring buffer 320 provided in the shared memory 220, by directly manipulating the components of the ring buffer 320: the buffer memory area 325, the head pointer 326, and the tail pointer 327. To determine whether the value of the head pointer 326 or the tail pointer 327 has been changed by the other system, the ring buffer control unit 351 keeps two data areas in local memory: a previous pointer value 356 and a pointer change flag 357.

[0024] The update information processing unit 352 establishes inter-software connections for exchanging state update information with the OS 213 and with each application in the application group 217. When a connection is established, the update information processing unit 352 receives an application ID from the OS or application and creates in local memory a correspondence table 355 between application IDs and connections. The update information processing unit 352 running on the active system attaches the application ID corresponding to the connection to the state update information it receives from the OS or an application through that connection, and writes the data to the ring buffer 320 by handing it to the ring buffer control unit 351. The update information processing unit 352 running on the standby system periodically reads application IDs and state update information from the ring buffer 320 using the ring buffer control unit 351, and hands the state update information to the appropriate OS or application through the connection corresponding to the application ID. In addition, the update information processing unit 352 running on the active system periodically instructs the ring buffer control unit 351 to write NOP information. When the state update information read by the update information processing unit 352 running on the standby system is NOP information, that NOP information is discarded.
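The standby-side behavior of the update information processing unit can be sketched as a lookup table from application IDs to connections. In this illustration a "connection" is modeled as a plain Python callable, `None` stands in for NOP information, and the names `UpdateDispatcher`, `connect`, and `dispatch` are hypothetical; how connections are actually established is not specified here.

```python
NOP = None  # stand-in for NOP information in this sketch


class UpdateDispatcher:
    """Maps application IDs to connections, modeled here as callables."""

    def __init__(self):
        self.table = {}  # application ID -> connection (correspondence table)

    def connect(self, app_id, deliver):
        # Called when an OS or application establishes its connection.
        self.table[app_id] = deliver

    def dispatch(self, record):
        """Standby side: hand one record read from the ring buffer to its owner."""
        if record is NOP:
            return False            # NOP information is discarded
        app_id, info = record
        self.table[app_id](info)
        return True


received = {1: [], 2: []}
dispatcher = UpdateDispatcher()
dispatcher.connect(1, received[1].append)
dispatcher.connect(2, received[2].append)

for rec in [(1, "route-add"), NOP, (2, "session-open"), (1, "route-del")]:
    dispatcher.dispatch(rec)
```

The dispatcher never inspects the state update information itself, which matches the point that its format is left to each OS or application.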

[0025] The other-system failure monitoring unit 353 periodically checks whether a failure has occurred in the other system or in the shared memory synchronization mechanism 120, using the head pointer 326 (on the active system) or the tail pointer 327 (on the standby system) in the shared memory together with the previous pointer value 356 and the pointer change flag 357 in local memory. When this monitoring reveals a failure in the other system or the shared memory synchronization mechanism 120, the other-system failure monitoring unit 353 uses inter-system communication to determine which of the two has failed, and reports the result to the failure recovery processing unit 333 of the OS 213.
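One way to picture the second stage of this monitoring is sketched below. The decision rule is an assumption, since the text only says that inter-system communication is used to tell the two cases apart: here a peer that still answers over the inter-system line is presumed alive, so the stalled pointer is attributed to the synchronization mechanism. The names `Fault`, `classify_fault`, `monitor_once`, `probe_peer`, and `notify_recovery` are hypothetical.

```python
from enum import Enum


class Fault(Enum):
    OTHER_SYSTEM = "other system"
    SHARED_MEMORY_SYNC = "shared memory synchronization mechanism"


def classify_fault(peer_answers_on_line):
    """Decide which component failed once pointer monitoring has raised a suspicion.

    Assumed rule: a peer that still answers over the inter-system line is alive,
    so the stalled pointer is attributed to the synchronization mechanism.
    """
    if peer_answers_on_line:
        return Fault.SHARED_MEMORY_SYNC
    return Fault.OTHER_SYSTEM


def monitor_once(pointer_stalled, probe_peer, notify_recovery):
    """One monitoring period: probe the peer only when the pointer has stalled."""
    if not pointer_stalled:
        return None
    fault = classify_fault(probe_peer())
    notify_recovery(fault)
    return fault


reports = []
monitor_once(False, lambda: True, reports.append)
monitor_once(True, lambda: True, reports.append)
monitor_once(True, lambda: False, reports.append)
```

In this sketch the inter-system line is exercised only after a suspicion has been raised, consistent with the statement that the line is used solely for distinguishing the two failure cases.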

[0026] FIG. 4 shows the format of the triple data 400, consisting of an application ID, state update information, and a data length, that is used for reading from and writing to the ring buffer 320 in the shared memory 220.

[0027] The length field 410 holds the total length of all fields: the length field 410, the application ID field 420, and the state update information field 430. When this value equals the length of the length field 410 itself, that is, when the application ID field 420 and the state update information field 430 are absent, the triple data is treated as NOP information. The application ID field 420 holds the application ID that identifies the OS or application that is the source and destination of the state update information 430. The state update information field 430 holds, unchanged, the state update information passed from the OS or application.
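The record format can be illustrated with a small encoder and decoder. The field widths are assumptions: this sketch gives the length field and the application ID field one word each and measures lengths in words, none of which is fixed by the description. The function names are hypothetical.

```python
LENGTH_FIELD_WORDS = 1  # assumed width of the length field
APP_ID_WORDS = 1        # assumed width of the application ID field


def encode(app_id, info_words):
    """Build one record: [total length, application ID, info words...]."""
    total = LENGTH_FIELD_WORDS + APP_ID_WORDS + len(info_words)
    return [total, app_id, *info_words]


def encode_nop():
    """NOP information: the length field alone, holding its own length."""
    return [LENGTH_FIELD_WORDS]


def decode(words):
    """Return None for NOP information, else (application ID, info words)."""
    total = words[0]
    if total == LENGTH_FIELD_WORDS:
        return None
    return words[1], list(words[2:total])


record = encode(5, [100, 200, 300])
```

Because the length field counts itself, a reader can always advance the head pointer by the first word of a record, and a NOP record costs only that single word.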

[0028] FIG. 5 shows the data structure of the ring buffer 320 used in this embodiment. The shared memory 220 is assumed to be a contiguous storage area in which address numbers are assigned word by word in order from the beginning (the closer a word is to the beginning, the smaller its address number). The buffer memory area 325 of the ring buffer 320 is allocated in the shared memory with a fixed size. Hereinafter, the address number of the first word of the buffer memory area 325 is denoted SA, and the address number of the word immediately after its last word is denoted EA. The head pointer 326 stores the address number of the first word of the region within the buffer memory area where data is actually recorded. The tail pointer 327 stores the address number of the word immediately after the end of that region (that is, the word at which the next write to the buffer memory area will start). Hereinafter, the value of the head pointer 326 is denoted HP and the value of the tail pointer 327 is denoted TP. Both HP and TP can take values from SA to EA - 1. When HP < TP, data is recorded in order from the word at address HP to the word at address TP - 1. When HP > TP, data is recorded in order from the word at address HP to the word at address EA - 1, and then from the word at address SA to the word at address TP - 1. When the buffer is full, either HP = SA and TP = EA - 1, or HP = TP + 1. When the buffer is empty, HP = TP. The initial values of HP and TP are both SA.
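The pointer relationships in this paragraph translate directly into a few helper functions. The sketch below uses the symbols SA, EA, HP, and TP as defined above; the function names and the example addresses are hypothetical.

```python
def used_words(sa, ea, hp, tp):
    """Number of words currently recorded between HP and TP."""
    if hp <= tp:
        return tp - hp                    # contiguous region (empty when HP == TP)
    return (ea - hp) + (tp - sa)          # region wraps past EA - 1 back to SA


def is_empty(sa, ea, hp, tp):
    return hp == tp


def is_full(sa, ea, hp, tp):
    # The two full conditions given in the text.
    return (hp == sa and tp == ea - 1) or hp == tp + 1


SA, EA = 100, 108   # hypothetical 8-word buffer memory area
```

Under these conditions a full buffer holds EA - SA - 1 words: one word is always left unused so that HP = TP can only mean "empty".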

[0029] The present invention can also be implemented with a ring buffer 320 whose structure differs from the above, as long as the value of the tail pointer changes on each write and the value of the head pointer changes on each read.

[0030] FIG. 6 shows the flow of the ring buffer write processing 600 that the ring buffer control unit 351 of the active system performs when writing triple data 400 to the ring buffer 320.

[0031] The ring buffer write processing 600 is implemented as a function-style procedure. It receives from the caller, as its argument, the triple data 400 to be written, and returns to the caller a value indicating whether a buffer overflow (an error indicating that the buffer memory area 325 does not have enough free space to write the argument data) occurred during the write. The ring buffer write processing 600 is called only on the active system. Hereinafter, the value of the length field 410 contained in the argument triple data 400 is denoted DS.

[0032] When called, the ring buffer write processing 600 first reads the value of the head pointer 326 in the shared memory (step 605). Hereinafter, the value read is treated as HP. It then compares HP with the previous pointer value 356 (step 610) and, if the values differ, sets the pointer change flag 357 to 1 (step 615).
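Steps 605 through 615 amount to a small piece of local bookkeeping, sketched below. The class name `PointerTracker` is hypothetical, and the sketch assumes that clearing the flag and refreshing the previous value happen elsewhere (for example in the periodic monitor), since this passage describes only the setting of the flag.

```python
class PointerTracker:
    """Local-memory bookkeeping for a pointer that the other system updates."""

    def __init__(self, initial_value):
        self.previous_value = initial_value  # previous pointer value 356
        self.changed_flag = 0                # pointer change flag 357

    def observe(self, current_value):
        # Steps 610 and 615: set the flag when the pointer differs from the
        # recorded previous value. Resetting the flag and refreshing the
        # previous value are left to the periodic monitor (assumed).
        if current_value != self.previous_value:
            self.changed_flag = 1
        return current_value


tracker = PointerTracker(initial_value=100)
tracker.observe(100)          # HP unchanged: flag stays 0
flag_before = tracker.changed_flag
hp = tracker.observe(104)     # HP moved: flag becomes 1
```

Recording the change at write time means that evidence of the other system's activity is captured even if the pointer later returns to its earlier value before the next periodic check.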

[0033] Next, it checks whether the sum of TP, the value of the tail pointer 327 in the shared memory, and DS is smaller than EA (step 620). If TP + DS is smaller than EA, it performs steps 645 through 655, which write the argument data into DS consecutive words starting at the word indicated by TP in the buffer memory area 325. If TP + DS is equal to or greater than EA, it performs steps 625 through 640, which write the argument data from the word indicated by TP to the end of the buffer memory area 325 and write the remaining data from the beginning of the buffer memory area 325.

[0034] If TP + DS ≥ EA at step 620, the process first checks that writing the argument data will not cause a buffer overflow, by testing whether SA + DS - (EA - TP) < HP ≤ TP holds (step 625). If the inequality does not hold, it returns a value indicating a buffer overflow to the caller and ends the ring buffer write process 600. If the inequality holds, it writes only the first EA - TP words of the argument data starting at the word pointed to by TP in the buffer memory area 325 (step 630), writes the remaining DS - (EA - TP) words of the argument data from the beginning of the buffer memory area 325 (step 635), changes the value of the tail pointer 327 to SA + DS - (EA - TP) (step 640), returns to the caller a value indicating that the write completed normally, and ends the ring buffer write process 600.

[0035] If TP + DS < EA at step 620, the process first checks that writing the argument data will not cause a buffer overflow, by testing whether TP < HP ≤ TP + DS holds (step 645). If the inequality holds, it returns a value indicating a buffer overflow to the caller and ends the ring buffer write process 600. If the inequality does not hold, it writes all of the argument data starting at the word pointed to by TP in the buffer memory area 325 (step 650), changes the value of the tail pointer 327 to TP + DS (step 655), returns to the caller a value indicating that the write completed normally, and ends the ring buffer write process 600.
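
The write procedure of steps 605 through 655 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patent's implementation: the names `SharedRing`, `LocalState`, and `ring_write` are hypothetical, the shared memory is modeled as an ordinary list, EA is assumed to be the address one past the last buffer word, and DS is assumed to be the number of words in the triplet.

```python
from dataclasses import dataclass, field

OK, OVERFLOW = 0, 1  # hypothetical return values


@dataclass
class SharedRing:
    """Hypothetical stand-in for the ring buffer 320 in shared memory."""
    sa: int                       # SA: first word of buffer memory area 325
    ea: int                       # EA: assumed to be one past the last word
    mem: list = field(default_factory=list)
    head: int = 0                 # head pointer 326
    tail: int = 0                 # tail pointer 327

    def __post_init__(self):
        self.mem = [None] * self.ea
        self.head = self.tail = self.sa


@dataclass
class LocalState:
    prev_ptr: int                 # previous pointer value 356
    changed: int = 0              # pointer change flag 357


def ring_write(ring, local, words):
    ds = len(words)                                # DS (assumed word count)
    hp = ring.head                                 # step 605
    if hp != local.prev_ptr:                       # step 610
        local.changed = 1                          # step 615
    tp = ring.tail
    if tp + ds < ring.ea:                          # step 620: no wrap-around
        if tp < hp <= tp + ds:                     # step 645
            return OVERFLOW
        ring.mem[tp:tp + ds] = words               # step 650
        ring.tail = tp + ds                        # step 655
        return OK
    first = ring.ea - tp                           # words that fit before EA
    new_tail = ring.sa + ds - first
    if not (new_tail < hp <= tp):                  # step 625
        return OVERFLOW
    ring.mem[tp:ring.ea] = words[:first]           # step 630
    ring.mem[ring.sa:new_tail] = words[first:]     # step 635
    ring.tail = new_tail                           # step 640
    return OK
```

Note that both overflow tests reject a write that would leave the tail pointer equal to the head pointer, since equal pointers are what the reader interprets as an empty buffer.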

[0036] FIG. 7 shows the flow of the ring buffer read process 700, which the ring buffer control unit 351 of the standby system performs when reading triplet data 400 from the ring buffer 320.

[0037] The ring buffer read process 700 is implemented as a function-style procedure that takes no argument and returns the triplet data 400 it has read to the caller. In some cases, however, it returns a special value indicating that the ring buffer is empty. The ring buffer read process 700 is called only in the standby system. When called, it first reads the value of the tail pointer 327 in the shared memory (step 705). The value read is hereinafter treated as TP. It then compares TP with the previous pointer value 356 (step 710) and, if the values differ, sets the pointer change flag 357 to 1 (step 715).

[0038] Next, it checks whether HP, the value of the head pointer 326 in the shared memory, is equal to TP (step 720). If HP and TP are equal, it returns to the caller a value indicating that the ring buffer is empty and ends the ring buffer read process 700. If HP and TP are not equal, it reads the length field 410 starting at the word pointed to by HP in the buffer memory area 325 and treats the value read as DS (step 725).

[0039] It then checks whether HP + DS is smaller than EA (step 730). If HP + DS ≥ EA, it reads EA - HP words starting at the word pointed to by HP in the buffer memory area 325 (step 735), reads the remaining DS - (EA - HP) words from the beginning of the buffer memory area 325 (step 740), changes the value of the head pointer 326 to SA + DS - (EA - HP) (step 745), returns to the caller the data read in step 735 followed by the data read in step 740, and ends the ring buffer read process 700. If HP + DS < EA at step 730, it reads DS words starting at the word pointed to by HP in the buffer memory area 325 (step 750), changes the value of the head pointer 326 to HP + DS (step 755), returns the data read in step 750 to the caller, and ends the ring buffer read process 700.
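
The read procedure of steps 705 through 755 can be sketched the same way. Again this is a hedged illustration rather than the patent's code: `SharedRing`, `LocalState`, and `ring_read` are hypothetical names, EA is assumed to be one past the last buffer word, and the length field 410 is assumed to occupy the first word of each triplet and to hold the triplet's total length in words.

```python
EMPTY = None  # hypothetical "ring buffer is empty" return value


class SharedRing:
    """Hypothetical stand-in for the ring buffer 320 in shared memory."""
    def __init__(self, sa, ea):
        self.sa, self.ea = sa, ea        # SA, EA (EA assumed exclusive)
        self.mem = [0] * ea              # buffer memory area 325
        self.head = self.tail = sa       # head pointer 326, tail pointer 327


class LocalState:
    def __init__(self, prev_ptr):
        self.prev_ptr = prev_ptr         # previous pointer value 356
        self.changed = 0                 # pointer change flag 357


def ring_read(ring, local):
    tp = ring.tail                               # step 705
    if tp != local.prev_ptr:                     # step 710
        local.changed = 1                        # step 715
    hp = ring.head
    if hp == tp:                                 # step 720
        return EMPTY
    ds = ring.mem[hp]                            # step 725 (assumed layout)
    if hp + ds >= ring.ea:                       # step 730: data wraps around
        first = ring.ea - hp
        data = ring.mem[hp:ring.ea]              # step 735
        new_head = ring.sa + ds - first
        data += ring.mem[ring.sa:new_head]       # step 740
        ring.head = new_head                     # step 745
        return data
    data = ring.mem[hp:hp + ds]                  # step 750
    ring.head = hp + ds                          # step 755
    return data
```

In this sketch only the reader ever assigns `ring.head` and only the writer would assign `ring.tail`, which mirrors the single-writer-per-pointer arrangement that the failure monitoring relies on.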

[0040] FIG. 8 shows the flow of the state update information write process 800, in which the update information processing unit 352 of the active system, upon receiving state update information from the OS 213 or the application group 217, writes to the ring buffer 320 triplet data 400 formed by adding an application ID and the data length to that state update information.

[0041] Upon receiving state update information from the OS or an application, the update information processing unit 352 uses the application ID/connection correspondence table 355 to determine the application ID from the connection on which the state update information was received, and builds triplet data 400 from that application ID, the state update information, and its length (step 805). It then calls the ring buffer write process 600 with the triplet data as the argument, thereby writing the triplet data containing the state update information to the ring buffer (step 810). Next, it determines from the return value of the ring buffer write process 600 called in step 810 whether a buffer overflow occurred (step 815). If a buffer overflow occurred, it performs appropriate overflow handling (step 820) and ends the state update information write process 800. If no buffer overflow occurred, it resets the NOP information write timer, which is used to run the NOP information write process 900 periodically, to its initial value (step 825) and ends the state update information write process 800.

[0042] The specific content of the overflow handling in step 820 depends on the behavior expected of the active system, so this embodiment does not describe it in detail; three handling methods are given here as the main options. The first is to wait inside the state update information exchange software 215 until the ring buffer 320 becomes writable, and not return control to the originating OS or application until the write has completed. This method reliably keeps the states of both systems synchronized, but the processing that the active system is supposed to perform may stall. The second is to notify the originating OS or application that the write to the ring buffer 320 failed, and have it first carry out whatever processing can proceed without notifying the standby system of the state update. This method also reliably keeps the states of both systems synchronized and lowers the likelihood that the active system's own processing stalls, but it cannot prevent stalls completely, and it complicates the OS and application programs. The third is to temporarily save the state update information whose write failed in the local memory 210, return control to the originating OS or application, have the state update information exchange software 215 periodically check whether space has become available in the ring buffer 320, and write the saved state update information to the ring buffer 320 once space is available. With this method the active system's own processing cannot stall, but if the active system fails, the state update information saved in the local memory 210 may be lost without ever being reflected in the standby system. Since each of the three methods has advantages and drawbacks, whichever is adopted, it is desirable to allocate a generously sized buffer memory area 325 for the ring buffer 320 so that buffer overflows occur as rarely as possible.
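
The third method can be sketched as a small deferral queue held in local memory. This is an assumption-laden illustration: `DeferredWriter`, `submit`, `flush`, and the `try_write` callable (returning `True` when the ring buffer accepted the data) are hypothetical names, and queuing new updates behind earlier pending ones to preserve their order is a design choice of this sketch that the text does not specify.

```python
from collections import deque


class DeferredWriter:
    """Hypothetical helper: park failed writes in local memory and retry later."""

    def __init__(self, try_write):
        self.try_write = try_write   # callable: True if the ring buffer took it
        self.pending = deque()       # lives only in the local memory 210

    def submit(self, triplet):
        # If older updates are still pending, queue behind them so that
        # the standby system sees updates in the order they were produced.
        if self.pending or not self.try_write(triplet):
            self.pending.append(triplet)

    def flush(self):
        # Intended to be called periodically; stops at the first failure.
        while self.pending and self.try_write(self.pending[0]):
            self.pending.popleft()
        return len(self.pending)     # number of updates still parked
```

Anything left in `pending` when the active system fails is exactly the state that would be lost, which is the drawback the paragraph above attributes to this method.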

[0043] FIG. 9 shows the flow of the NOP information write process 900, in which the update information processing unit 352 of the active system writes NOP information to the ring buffer 320.

[0044] The NOP information write process 900 is executed periodically by the NOP information write timer.

[0045] When called upon expiry of the NOP information write timer, the update information processing unit 352 calls the ring buffer write process 600 with NOP information as the argument, thereby writing the NOP information to the ring buffer (step 905). It then determines from the return value of the ring buffer write process 600 called in step 905 whether a buffer overflow occurred (step 910). If no buffer overflow occurred, it resets the NOP information write timer to its initial value (step 915) and ends the NOP information write process 900.

[0046] If, on the other hand, step 910 shows that a buffer overflow occurred, the unit calls the failure confirmation process 1300 by inter-system communication to find out whether the standby system has failed (step 920), and determines from the result whether a failure has occurred in the standby system (step 925). If the standby system has failed, it notifies the failure recovery processing unit 333 of the OS 213 of the standby system failure (step 930) and stops all operation of the state update information exchange software 215. If the standby system has not failed, it determines from the result of step 920 whether the shared memory synchronization mechanism 120 has failed (step 935). If the shared memory synchronization mechanism 120 has failed, it notifies the failure recovery processing unit 333 of the OS 213 of that failure (step 940) and stops all operation of the state update information exchange software 215. If neither the standby system nor the shared memory synchronization mechanism 120 has failed, the standby system's ring buffer processing is presumably just not keeping up, so the unit re-arms the NOP information write timer (step 945) and ends the NOP information write process 900. In this case, however, it is desirable to write the NOP information as soon as space becomes available in the ring buffer, so the NOP information write timer set in step 945 should be shorter than the timer's normal initial value.
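
The decision flow of steps 905 through 945 can be sketched as a function that reports the next timer interval. This is a sketch under assumptions: `Diagnosis`, `nop_write`, the three callables passed in, and the two interval constants are hypothetical, and returning `None` stands in for "stop all operation of the state update information exchange software 215".

```python
from enum import Enum


class Diagnosis(Enum):
    """Hypothetical result values of the failure confirmation process 1300."""
    NO_FAILURE = 0
    OTHER_SYSTEM = 1
    SHARED_MEMORY = 2


NOP_INTERVAL = 1.0      # assumed normal NOP write timer value, in seconds
RETRY_INTERVAL = 0.1    # assumed shorter value used after an overflow


def nop_write(write_nop, confirm_failure, notify_recovery):
    """Return the next timer interval, or None when the software must stop.

    write_nop() is assumed to return True when a buffer overflow occurred.
    """
    if not write_nop():                              # steps 905, 910
        return NOP_INTERVAL                          # step 915
    diagnosis = confirm_failure()                    # step 920
    if diagnosis is Diagnosis.OTHER_SYSTEM:          # step 925
        notify_recovery(diagnosis)                   # step 930
        return None
    if diagnosis is Diagnosis.SHARED_MEMORY:         # step 935
        notify_recovery(diagnosis)                   # step 940
        return None
    return RETRY_INTERVAL                            # step 945
```

The NOP write doubles as a liveness probe: if the standby system stops consuming, NOP entries alone eventually fill the buffer and force the overflow branch even when no real state updates are being produced.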

[0047] FIG. 10 shows the flow of the state update information read process 1000, in which the update information processing unit 352 of the standby system reads triplet data 400 from the ring buffer 320 and delivers the state update information contained in that triplet data to the OS or application corresponding to the application ID contained in the same triplet data.

[0048] In this embodiment, the state update information read process 1000 is executed periodically by a ring buffer read timer (whose period is no longer than that of the NOP information write timer). The present invention can also be implemented, however, by providing the shared memory 220 with hardware that raises an interrupt to the standby system when the tail pointer 327 is written, and executing the state update information read process 1000 when that interrupt occurs.

[0049] When called upon expiry of the ring buffer read timer, the update information processing unit 352 calls the ring buffer read process 700, thereby reading triplet data 400 from the ring buffer (step 1005). It then determines from the return value of the ring buffer read process 700 called in step 1005 whether the ring buffer is empty (step 1010). If it is empty, the unit resets the ring buffer read timer to its initial value (step 1015) and ends the state update information read process 1000.

[0050] If the return value of the ring buffer read process 700 called in step 1005 does not indicate that the buffer is empty, the unit checks whether the data read is NOP information (step 1020); if so, it returns to step 1005. If the data is not NOP information, the unit uses the application ID contained in the triplet data read, together with the application ID/connection correspondence table 355, to select the connection to be used for handing over the state update information, and over that connection delivers the state update information contained in the triplet data to the OS or application corresponding to the application ID (step 1025). It then returns to step 1005.
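
The drain loop of steps 1005 through 1025 can be sketched as follows. The sketch makes several assumptions that go beyond the text: the function name `read_state_updates` is hypothetical, a triplet is modeled as a `(length, app_id, update)` tuple, the correspondence table 355 is modeled as a dictionary from application ID to a delivery callable, and NOP information is assumed to be marked by a reserved application ID, since the encoding of NOP information is not given in this passage.

```python
def read_state_updates(read_triplet, connections, nop_app_id=0):
    """Drain the ring buffer; return how many state updates were delivered.

    read_triplet() stands in for the ring buffer read process 700 and is
    assumed to return None when the ring buffer is empty.
    """
    delivered = 0
    while True:
        triplet = read_triplet()                 # step 1005
        if triplet is None:                      # step 1010: buffer is empty
            return delivered                     # caller re-arms the read timer
        _length, app_id, update = triplet
        if app_id == nop_app_id:                 # step 1020: skip NOP entries
            continue
        connections[app_id](update)              # step 1025: hand over
        delivered += 1
```

Because the loop continues until the buffer is empty, a single timer expiry consumes every pending entry, which keeps the head pointer advancing even when the active system writes in bursts.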

[0051] FIG. 11 shows the flow of the other-system failure monitoring process 1100, in which the other-system failure monitoring unit 353 watches the value of the pointer that the other system updates in order to monitor whether the other system or the shared memory synchronization mechanism 120 has failed and, if one of them appears to have failed, uses inter-system communication to confirm which of the two it is.

[0052] The other-system failure monitoring process 1100 is executed periodically by a failure monitoring timer (whose period is no shorter than that of the NOP information write timer). It can be executed in both the active system and the standby system (and need not be executed where the other system does not have to be monitored). The monitored pointer is the head pointer 326 when the process runs in the active system and the tail pointer 327 when it runs in the standby system. Hereinafter, the value of the monitored pointer is denoted P. When the state update information exchange software 215 starts executing, the previous pointer value 356 is assumed to have been initialized to the initial value of P (= SA) and the pointer change flag 357 to 0.

[0053] When called upon expiry of the failure monitoring timer, the other-system failure monitoring unit 353 checks whether the pointer change flag 357 is 1 (step 1105). If it is 1, P has changed since the pointer change flag 357 was last reset to 0, so the unit performs steps 1115 through 1120, which re-initialize the failure monitoring timer, the previous pointer value 356, and the pointer change flag 357.

[0054] If the pointer change flag 357 is 0, there are two possibilities: either P has not changed at all since the pointer change flag 357 was last reset to 0, or the local system has not executed the ring buffer write process 600 or the ring buffer read process 700 at all in that time, in which case the flag has had no opportunity to be set. To distinguish these cases, the unit checks whether P is equal to the previous pointer value 356 (step 1110). If the values differ, it performs steps 1115 through 1120, just as when the pointer change flag 357 is 1. Note that once the local system has executed the ring buffer write process 600 or the ring buffer read process 700 even once after P was stored in the previous pointer value 356, P may have gone all the way around the ring buffer and returned to the same value as the previous pointer value 356, so comparing the previous pointer value 356 with P is not sufficient by itself to determine whether P has changed. For this reason, neither step 1105 nor step 1110 can be omitted.

[0055] When steps 1105 and 1110 show that P has changed, the unit sets the previous pointer value 356 to P and the pointer change flag 357 to 0 (step 1115), resets the failure monitoring timer to its initial value (step 1120), and ends the other-system failure monitoring process 1100.
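
The two-part progress test of steps 1105 through 1115 can be sketched as follows. The names `LocalState` and `pointer_progressed` are hypothetical, and the timer handling of step 1120 is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class LocalState:
    prev_ptr: int       # previous pointer value 356
    changed: int = 0    # pointer change flag 357


def pointer_progressed(local, p):
    """Return True if the monitored pointer P moved since the last check."""
    # Step 1105 catches a change already observed by the local read or
    # write process, including a full lap that brought P back to the
    # same value. Step 1110 catches a change when the local system has
    # not run those processes and so never had a chance to set the flag.
    if local.changed == 1 or p != local.prev_ptr:
        local.prev_ptr = p                         # step 1115
        local.changed = 0
        return True
    return False
```

For example, with `prev_ptr` equal to 5 and the flag set to 1, a call with `p = 5` still reports progress, which is the wrap-around case that a value comparison alone would miss.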

[0056] If, on the other hand, steps 1105 and 1110 show that P has not changed, the unit calls the failure confirmation process 1300 by inter-system communication to find out whether the other system has failed (step 1125), and determines from the result whether a failure has occurred in the other system (step 1130). If the other system has failed, it notifies the failure recovery processing unit 333 of the OS 213 of the other-system failure (step 1135) and stops all operation of the state update information exchange software 215. If the other system has not failed, it determines from the result of step 1125 whether the shared memory synchronization mechanism 120 has failed (step 1140). If the shared memory synchronization mechanism 120 has failed, it notifies the failure recovery processing unit 333 of the OS 213 of that failure (step 1145) and stops all operation of the state update information exchange software 215. If neither the other system nor the shared memory synchronization mechanism 120 has failed, the other system's ring buffer processing is presumably just not keeping up, so the unit re-arms the failure monitoring timer (step 1150) and ends the other-system failure monitoring process 1100. In this case, however, it is desirable to detect the absence of a failure as soon as the other system executes its ring buffer processing, so the failure monitoring timer set in step 1150 should be shorter than the timer's normal initial value.

[0057] FIG. 12 shows the format of the inter-system communication message used in the failure confirmation process 1300 by inter-system communication.

[0058] The header 1210 is a message header containing the message length and similar information. The sequence number 1220 uniquely identifies each message sent over the inter-system communication line 130; as far as possible, it takes a value different from the sequence numbers 1220 of the messages sent so far. A request message and the reply message to it, however, carry the same sequence number 1220. The message type 1230 holds a number indicating the format of the message. This embodiment provides four message formats: a head pointer request message, which asks the other system to reply with the value of the head pointer 326; a head pointer reply message, which is the reply to it; a tail pointer request message, which asks the other system to reply with the value of the tail pointer 327; and a tail pointer reply message, which is the reply to it. These are identified by the numbers 1, 2, 3, and 4, respectively. The pointer value 1240 is a field present only in reply messages; it holds the value of the head pointer 326 in a head pointer reply message and the value of the tail pointer 327 in a tail pointer reply message.

[0059] FIG. 13 shows the flow of the failure confirmation process 1300 by inter-system communication, in which the other-system failure monitoring unit 353 uses inter-system communication to confirm whether a failure has occurred in the other system, a failure has occurred in the shared memory synchronization mechanism 120, or no failure has occurred.

[0060] The failure confirmation process 1300 by inter-system communication is implemented as a function-style procedure that takes no argument and returns to the caller a value indicating whether the other system has failed, the shared memory synchronization mechanism 120 has failed, or no failure has occurred.

[0061] When called in a given system A, the failure confirmation process 1300 by inter-system communication first sends a request message to the other system, system B, by calling the inter-system communication control unit 332 in the OS 213 (step 1305). The message type 1230 of this request message is 1, denoting a head pointer request message, if system A is the active system, and 3, denoting a tail pointer request message, if system A is the standby system. The process then waits for a fixed time to receive from system B a reply message carrying the same sequence number 1220 as the request message it sent (step 1310). While it waits in step 1310, any other processing, including that of every processing unit of the state update information exchange software 215, may run concurrently.

[0062] When a normally operating system B receives the request message from system A, the inter-system communication control unit 332 in the OS 213 first performs the receive processing for the request message and passes it to the other-system failure monitoring unit 353. The other-system failure monitoring unit 353 then creates the reply message corresponding to the request message. In the message type 1230 it puts 2, denoting a head pointer reply message, if the received message is a head pointer request message, or 4, denoting a tail pointer reply message, if the received message is a tail pointer request message. In the sequence number 1220 it puts the same sequence number as in the received request message. In the pointer value 1240 it puts the value of the head pointer 326 that system B holds in its shared memory if the received message is a head pointer request message, or the value of the tail pointer 327 that system B holds in its shared memory if the received message is a tail pointer request message. It then sends the reply message to system A by calling the inter-system communication control unit 332 in the OS 213.

[0063] When the fixed-time wait of step 1310 ends, system A checks whether the wait ended because it timed out (step 1315). If the wait ended because of a timeout, the process returns to the caller a value indicating that the other system has failed and ends the failure confirmation process 1300 by inter-system communication.

[0064] If the wait ended because a reply message was received, steps 1320 through 1330 check whether the value of the head pointer 326 or the tail pointer 327 in the shared memory is being synchronized between system A and system B.

【００６５】まず、受信待ちの間に、監視対象であるポ
インタ（系Ａが運用系である場合は先頭ポインタ３２
６、待機系である場合は末尾ポインタ３２７）の値Ｐが
変わったかどうかを、ポインタ変更フラグ３５７が１で
あるかどうか（ステップ１３２０）と、ポインタ前回値
３５６とＰが等しいかどうか（ステップ１３２５）によ
って調べる。ステップ１３２０、１３２５により、Ｐが
変わったことが分かれば、他系にも共有メモリ同期機構
１２０にも障害が発生していないことを表す返値を呼び
出し元へ返して、系間通信による障害確認処理１３００
を終了する。ステップ１３２０、１３２５により、Ｐに
変化がないことが分かったら、系Ｂから受信した返答メ
ッセージに含まれるポインタ値１２４０とＰを比較し
（ステップ１３３０）、値が異なれば、共有メモリ同期
機構１２０に障害が発生していることを表す返値を呼び
出し元へ返して、系間通信による障害確認処理１３００
を終了する。値が同じならば、他系にも共有メモリ同期
機構１２０にも障害が発生していないことを表す返値を
呼び出し元へ返して、系間通信による障害確認処理１３
００を終了する。First, system A checks whether the value P of the monitored pointer (the head pointer 326 if system A is the active system, the tail pointer 327 if it is the standby system) changed during the reception wait, by testing whether the pointer change flag 357 is 1 (step 1320) and whether the previous pointer value 356 is equal to P (step 1325). If steps 1320 and 1325 show that P has changed, system A returns to the caller a value indicating that no failure has occurred in either the other system or the shared memory synchronization mechanism 120, and the failure confirmation processing 1300 by inter-system communication ends. If steps 1320 and 1325 show that P has not changed, system A compares P with the pointer value 1240 contained in the reply message received from system B (step 1330). If the values differ, system A returns to the caller a value indicating that a failure has occurred in the shared memory synchronization mechanism 120, and the failure confirmation processing 1300 by inter-system communication ends. If the values are the same, system A returns to the caller a value indicating that no failure has occurred in either the other system or the shared memory synchronization mechanism 120, and the failure confirmation processing 1300 by inter-system communication ends.
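
The decision flow of steps 1315 to 1330 can be sketched as a single function. This is a minimal illustration under assumptions, not the implementation of the invention: the names `Result` and `confirm_failure` are hypothetical, a timeout is represented by passing `None` as the reply value, and P is treated as changed when either the flag test or the previous-value test indicates a change, since this passage does not spell out how the two tests are combined.

```python
from enum import Enum
from typing import Optional


class Result(Enum):
    NO_FAILURE = "no failure in the other system or the synchronization mechanism"
    OTHER_SYSTEM_FAILURE = "failure in the other system"
    SYNC_MECHANISM_FAILURE = "failure in the shared memory synchronization mechanism"


def confirm_failure(reply_pointer: Optional[int], p: int,
                    previous_value: int, change_flag: int) -> Result:
    """Classify the situation after the reception wait of step 1310."""
    # Step 1315: None stands for a reception wait that ended by timeout.
    if reply_pointer is None:
        return Result.OTHER_SYSTEM_FAILURE
    # Steps 1320 and 1325: did P change during the wait?
    # Assumption: either test indicating a change is sufficient.
    if change_flag == 1 or previous_value != p:
        return Result.NO_FAILURE
    # Step 1330: compare P with pointer value 1240 from the reply message.
    if reply_pointer != p:
        return Result.SYNC_MECHANISM_FAILURE
    return Result.NO_FAILURE
```

For example, `confirm_failure(9, 5, 5, 0)` models the case where system B reports a pointer value of 9 while system A still sees an unchanged value of 5, which this sketch classifies as a failure of the shared memory synchronization mechanism.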

【００６６】[0066]

【発明の効果】このように、本発明の状態交換・障害検
出兼用方式を利用することによって、少量の共有メモリ
のみを用いて両系の状態同期を行なうことができると同
時に、その状態同期機構を用いて一方の系が他方の系の
ソフトウェア障害を検出することができる。加えて、本
発明では両系が状態データを直接共有せず、また両系が
行う処理内容の多くは異なるものであるので、状態デー
タ異常やソフトウェア・ハードウェアのバグによる障害
に対する耐性も高い。As described above, the combined state exchange and failure detection method of the present invention allows the states of both systems to be synchronized using only a small amount of shared memory and, at the same time, allows one system to detect a software failure in the other system by using that state synchronization mechanism. In addition, because the two systems do not directly share state data and much of the processing they perform differs, the invention is also highly resistant to failures caused by abnormal state data or by software and hardware bugs.

[Brief description of the drawings]

【図１】本発明における、二重化された計算機のシステ
ム構成図である。FIG. 1 is a system configuration diagram of a duplicated computer according to the present invention.

【図２】本発明における、二重化された計算機に内蔵さ
れた各計算機系のハードウェア構成図である。FIG. 2 is a hardware configuration diagram of each computer system incorporated in a duplicated computer in the present invention.

【図３】本発明における、各計算機系の局所メモリおよ
び共有メモリに置かれるプログラムおよびデータの構成
図である。FIG. 3 is a configuration diagram of programs and data stored in a local memory and a shared memory of each computer system according to the present invention.

【図４】リングバッファへの読み書きに用いる３つ組デ
ータの形式を示す図である。FIG. 4 is a diagram showing the format of the triplet data used for reading from and writing to the ring buffer.

【図５】リングバッファのデータ構造を示す図である。FIG. 5 is a diagram showing a data structure of a ring buffer.

【図６】リングバッファへデータを書き込む処理のフロ
ーを示す図である。FIG. 6 is a diagram showing a flow of a process of writing data to a ring buffer.

【図７】リングバッファからデータを読み出す処理のフ
ローを示す図である。FIG. 7 is a diagram showing a flow of a process of reading data from a ring buffer.

【図８】ＯＳやアプリケーションから受け取った状態更
新情報をリングバッファへ書き込む処理のフローを示す
図である。FIG. 8 is a diagram illustrating a flow of a process of writing state update information received from an OS or an application to a ring buffer.

【図９】ＮＯＰ情報をリングバッファへ書き込む処理の
フローを示す図である。FIG. 9 is a diagram showing a flow of a process of writing NOP information to a ring buffer.

【図１０】リングバッファから状態更新情報を読み出し
てＯＳまたはアプリケーションへ引き渡す処理のフロー
を示す図である。FIG. 10 is a diagram illustrating a flow of a process of reading state update information from a ring buffer and transferring the information to an OS or an application.

【図１１】共有メモリを用いて他系または共有メモリ同
期機構の障害を監視し、障害が判明したら、系間通信を
用いて障害内容を確認するフローを示す図である。FIG. 11 is a diagram illustrating a flow of monitoring a failure of another system or a shared memory synchronization mechanism using a shared memory, and confirming the failure content using inter-system communication when the failure is found.

【図１２】系間通信による障害確認処理で用いるメッセ
ージの形式を示す図である。FIG. 12 is a diagram showing a message format used in a failure confirmation process by inter-system communication.

【図１３】系間通信を用いて他系または共有メモリ同期
機構のいずれの障害なのかを確認する処理のフローを示
す図である。FIG. 13 is a diagram illustrating a flow of a process of confirming whether a failure has occurred in another system or a shared memory synchronization mechanism using inter-system communication.

[Explanation of symbols]

１００…計算機全体、１１０…各系統の計算機、１２０
…共有メモリ同期機構、１３０…系間通信回線、１４０
…制御端末接続回線、１５０…制御端末、２００…ＣＰ
Ｕ、２１０…局所メモリ、２１３…ＯＳ、２１５…状態
更新情報交換ソフトウェア、２１７…アプリケーション
群、２２０…共有メモリ、２３０…系間通信コントロー
ラ、２４０…制御端末通信コントローラ、２５０…補助
記憶装置コントローラ、２５５…補助記憶装置、３２０
…リングバッファ、３２５…バッファメモリ領域、３２
６…先頭ポインタ、３２７…末尾ポインタ、３３０…Ｏ
Ｓプログラム、３３１…タイマ制御部、３３２…系間通
信制御部、３３３…障害復旧処理部、３３５…ＯＳ状態
データ、３５１…リングバッファ制御部、３５２…更新
情報処理部、３５３…他系障害監視部、３５５…アプリ
ＩＤと接続の対応表、３５６…ポインタ前回値、３５７
…ポインタ変更フラグ、３７０…アプリケーションプロ
グラム、３７５…アプリケーション状態データ、４００
…３つ組データ、４１０…長さフィールド、４２０…ア
プリＩＤフィールド、４３０…状態更新情報フィール
ド、６００…リングバッファ書き込み処理、７００…リ
ングバッファ読み出し処理、８００…状態更新情報書き
込み処理、９００…ＮＯＰ情報書き込み処理、１０００
…状態更新情報読み出し処理、１１００…他系障害監視
処理、１２１０…ヘッダ、１２２０…シーケンス番号、
１２３０…メッセージタイプ、１２４０…ポインタ値、
１３００…系間通信による障害確認処理。100: entire computer, 110: computer of each system, 120: shared memory synchronization mechanism, 130: inter-system communication line, 140: control terminal connection line, 150: control terminal, 200: CPU, 210: local memory, 213: OS, 215: state update information exchange software, 217: application group, 220: shared memory, 230: inter-system communication controller, 240: control terminal communication controller, 250: auxiliary storage device controller, 255: auxiliary storage device, 320: ring buffer, 325: buffer memory area, 326: head pointer, 327: tail pointer, 330: OS program, 331: timer control unit, 332: inter-system communication control unit, 333: failure recovery processing unit, 335: OS state data, 351: ring buffer control unit, 352: update information processing unit, 353: other-system failure monitoring unit, 355: correspondence table between application IDs and connections, 356: previous pointer value, 357: pointer change flag, 370: application program, 375: application state data, 400: triplet data, 410: length field, 420: application ID field, 430: state update information field, 600: ring buffer write processing, 700: ring buffer read processing, 800: state update information write processing, 900: NOP information write processing, 1000: state update information read processing, 1100: other-system failure monitoring processing, 1210: header, 1220: sequence number, 1230: message type, 1240: pointer value, 1300: failure confirmation processing by inter-system communication.

Claims

[Claims]

1. A computer having therein both a local storage area that can be referenced only by the computer and a shared storage area that can also be referenced by other computers, the computer comprising: means for sequentially writing, to the shared storage area of the computer, information on changes to data stored in the local storage area of the computer; means for periodically rewriting data stored in a specific storage area provided in the shared storage area of the computer in accordance with the writing of the information; and means for determining, by periodically monitoring a specific storage area provided in the shared storage area of the computer, that a failure has occurred in another computer sharing the shared storage area with the computer, in a device for sharing the storage contents of the shared storage area between the computer and the other computer, or in both of them.

2. A computer having therein both a local storage area that can be referenced only by the computer and a shared storage area that can also be referenced by other computers, the computer comprising: means for sequentially reading data stored in the shared storage area of the computer, interpreting the data as information on changes to data stored in the local storage area of the computer, and reflecting the changes in the local storage area of the computer; means for periodically rewriting data stored in a specific storage area provided in the shared storage area of the computer in accordance with the reading of the information; and means for determining, by periodically monitoring a specific storage area provided in the shared storage area of the computer, that a failure has occurred in another computer sharing the shared storage area with the computer, in a device for sharing the storage contents of the shared storage area between the computer and the other computer, or in both of them.

3. The computer according to claim 1, further comprising: means for transmitting data from the computer to another computer using a device other than the shared storage area; means for receiving, at the computer, data from another computer using a device other than the shared storage area; means for sending, in response to data received from another computer by the data receiving means, data stored in a specific storage area provided in the shared storage area of the computer to the other computer using the data transmitting means; means for determining that a failure has occurred in another computer sharing the shared storage area with the computer when, after data has been sent to the other computer using the data transmitting means, the data receiving means receives no data from the other computer for a certain period of time; and means for determining, using the data received from the other computer by the data receiving means, whether a failure has occurred in a device for sharing the storage contents of the shared storage area between the computer and the other computer.

4. The computer according to claim 2, further comprising: means for transmitting data from the computer to another computer using a device other than the shared storage area; means for receiving, at the computer, data from another computer using a device other than the shared storage area; means for sending, in response to data received from another computer by the data receiving means, data stored in a specific storage area provided in the shared storage area of the computer to the other computer using the data transmitting means; means for determining that a failure has occurred in another computer sharing the shared storage area with the computer when, after data has been sent to the other computer using the data transmitting means, the data receiving means receives no data from the other computer for a certain period of time; and means for determining, using the data received from the other computer by the data receiving means, whether a failure has occurred in a device for sharing the storage contents of the shared storage area between the computer and the other computer.

5. A computer system comprising therein one computer according to claim 1 or 3 and one computer according to claim 2, and means for sharing the storage contents of the shared storage area between the two computers.

6. A computer system comprising therein one computer according to claim 3 and one computer according to claim 4, means for sharing the storage contents of the shared storage area between the two computers, and means for transmitting and receiving data between the two computers using a device other than the shared storage area.