JP2009140424A - Fault tolerant computer system, re-synchronization operating processing method, and program

Info

Publication number: JP2009140424A
Application number: JP2007318695A
Authority: JP
Inventors: Shusuke Yamamoto; 秀典山本; Shigetoshi Samejima; 茂稔鮫嶋; Masanori Yoshida; 雅徳吉田; Yoshiaki Adachi; 芳昭足達
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2007-12-10
Filing date: 2007-12-10
Publication date: 2009-06-25
Anticipated expiration: 2027-12-10
Also published as: JP5153310B2

Abstract

PROBLEM TO BE SOLVED: In a fault-tolerant computer system that includes a plurality of nodes connected to one another via a network, each independently performing the same processing in parallel, to allow a stopped node to be re-incorporated into the system with its processing timing matched to that of the operating node, without interrupting the processing of the operating node.

SOLUTION: The drive management units 0311 and 0312 of the operating node notify the embedded node (the node being incorporated) of a processing start each time processing of a user program is started. Once data matching processing between the operating node and the embedded node is complete, the drive management units 0311 and 0312 of the embedded node start processing of the user program by referring to the processing start notification received from the operating node.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a fault-tolerant computer system composed of a plurality of independent nodes interconnected via a network, in which every node executes the same processing in parallel. More particularly, it relates to resynchronization activation processing of a node. While guaranteeing that the services executed and provided by the system are never interrupted, a node that was stopped during system operation (for a software update, hardware replacement, failure recovery, maintenance, or the like) is restarted. With the other operating nodes continuing their processing without stopping, the restarted node's data contents, program execution state, inputs and outputs, and so on are matched with those of the operating nodes, after which the node transitions to the executing state and is incorporated into the system again.

As related art, a slave incorporation method for a duplex computer system has been proposed (see Patent Document 1). In a duplex computer system in which two host computers simultaneously process the same data sent from a plurality of subsystems, the synchronization of processing data at the start of duplex operation, that is, the copying of the master system's processing data to the slave system (hereinafter "slave incorporation processing"), is performed using two disk devices and an inter-system communication path.
Patent Document 1: Japanese Patent Laid-Open No. 11-73278

In conventional slave incorporation processing, resynchronization activation of a node is performed by temporarily suspending the processing of the operating node, copying the data, and, after the copy completes, resuming the processing of all nodes, including the embedded node (the node being incorporated), at the same time. This matches the data contents, program execution state, inputs and outputs, and so on between the nodes. However, if resynchronization activation is attempted without stopping the operating node, a problem arises. Even if the data of the operating node and the embedded node match at some point in time, the nodes will read and write data at different times once each of them executes the program at its own, unsynchronized timing, and the data contents of the nodes will diverge. Moreover, when the data contents differ, processing that refers to the data may produce different results even though the processing itself is identical, so the outputs of the nodes may disagree.

The present invention has been made in view of the above problem. Its main object is to provide a node resynchronization activation method in which, when a node that was stopped for a software update, hardware replacement, failure recovery, maintenance, or the like is incorporated into the system again, the processing timing of that node is matched with that of the operating nodes without interrupting the processing of the normally operating nodes. The necessary processing is thereby executed without omission, the high reliability of the fault-tolerant computer system is maintained, and uninterrupted operation of the services executed or provided by the system is guaranteed.

To solve the above problem, a fault-tolerant computer system according to the present invention includes a plurality of nodes connected to one another via a network, and each of the nodes independently executes the same processing in parallel. Each node includes a drive management unit and a data synchronization processing unit. The drive management unit performs resynchronization activation processing, in which a node restarted from a stopped state (hereinafter "embedded node", that is, the node being incorporated) is incorporated into the system again with its processing timing matched to that of a node operating in the system (hereinafter "operating node"), while the operating node continues processing without stopping. The data synchronization processing unit performs data matching processing, which matches the node state, including data contents, between the operating node and the embedded node. When the node is an operating node, the drive management unit notifies the embedded node of a processing start each time it starts executing the processing of a user program. When the node is an embedded node, the drive management unit starts the processing of the user program by referring to the processing start notification received from the operating node, provided that the data synchronization processing unit has completed the data matching processing between the operating node and the embedded node.
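The division of roles described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `DriveManager`, `StartNotice`, `data_matched`, and `on_start_notice` are hypothetical, and direct method calls stand in for messages sent over the network.

```python
from dataclasses import dataclass, field


@dataclass
class StartNotice:
    """Hypothetical processing-start notification payload."""
    sequence: int


@dataclass
class DriveManager:
    """Illustrative stand-in for the drive management unit of one node."""
    role: str                      # "operating" or "embedded"
    data_matched: bool = False     # set once data matching has completed
    peers: list = field(default_factory=list)
    started: list = field(default_factory=list)

    def start_user_program(self, sequence: int) -> None:
        """Operating node: start the user program and notify embedded peers."""
        self.started.append(sequence)
        for peer in self.peers:
            peer.on_start_notice(StartNotice(sequence))

    def on_start_notice(self, notice: StartNotice) -> None:
        """Embedded node: start only if data matching has already completed."""
        if self.role == "embedded" and self.data_matched:
            self.started.append(notice.sequence)


embedded = DriveManager(role="embedded")
operating = DriveManager(role="operating", peers=[embedded])

operating.start_user_program(1)   # embedded node ignores this: data not yet matched
embedded.data_matched = True      # e.g. after a data matching completion notification
operating.start_user_program(2)   # embedded node now starts in step
```

In this sketch the operating node never waits for the embedded node, which mirrors the requirement that the operating node continue without stopping.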

Preferably, when the node is an embedded node, the drive management unit determines that data matching between the operating node and the embedded node is complete by receiving a data matching completion notification transmitted from the operating node.

The drive management unit preferably includes an input drive management unit that performs the resynchronization activation processing for user programs whose processing is triggered by an input. When the node is an operating node, the input drive management unit notifies the embedded node of the processing start and transfers the input to it each time the node receives an input and starts the processing of the user program. When the node is an embedded node, the input drive management unit executes the processing of the user program using either the input transferred from the operating node or the input received directly by the embedded node.

Preferably, when the node is an embedded node, the input drive management unit does not execute the user program and instead waits until the next processing start timing if the data matching processing by the data synchronization processing unit between the operating node and the embedded node is not complete at the time the embedded node receives the processing start notification and the transferred input from the operating node, or at the time it directly receives the input corresponding to the transferred input.

Also preferably, when the node is an embedded node, the input drive management unit determines whether the embedded node has directly received the input corresponding to the input received from the operating node. If it determines that the input has not been received directly, it executes the processing of the user program using the input transferred from the operating node.

More preferably, when the node is an embedded node, the input drive management unit determines whether the embedded node has directly received the input corresponding to the input received from the operating node. If it determines that the input has been received directly, it requests the operating node to stop transferring inputs and, from then on, executes the processing of the user program using the inputs received directly by the embedded node.
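The choice between a transferred input and a directly received input can be sketched as follows. This is an illustration under assumptions: the class `EmbeddedInputSelector`, the use of an input identifier to pair the two copies of an input, and the `transfer_stop_requested` flag are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class EmbeddedInputSelector:
    """Hypothetical helper that picks which copy of an input to process."""
    direct: dict = field(default_factory=dict)       # input id -> payload received directly
    transferred: dict = field(default_factory=dict)  # input id -> payload forwarded by the operating node
    transfer_stop_requested: bool = False

    def select(self, input_id: str) -> Tuple[str, str]:
        """Return (source, payload) for the input that triggers the next run."""
        if input_id in self.direct:
            # Direct reception works, so ask the operating node to stop forwarding.
            self.transfer_stop_requested = True
            return "direct", self.direct[input_id]
        # Not received directly: fall back to the copy forwarded by the operating node.
        return "transferred", self.transferred[input_id]


selector = EmbeddedInputSelector()
selector.transferred["m1"] = "request-A"          # only the forwarded copy exists
first = selector.select("m1")
stop_after_first = selector.transfer_stop_requested

selector.transferred["m2"] = "request-B"
selector.direct["m2"] = "request-B"               # the same input also arrived directly
second = selector.select("m2")
```

Falling back to the forwarded copy covers inputs that were sent before the embedded node could receive them itself, while switching to direct reception removes the forwarding load from the operating node.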

In addition, the fault-tolerant computer system preferably includes a gateway that is connected to the network and is also connected to an external system via a separate network. The gateway forwards an input received from the external system to the plurality of nodes, receives the results that the nodes produce by processing the input in parallel at substantially the same time, compares those results, and returns the output obtained from the comparison to the external system as the response.

The drive management unit also preferably includes a periodic drive management unit that performs the resynchronization activation processing for user programs that execute periodically. When the node is an operating node, the periodic drive management unit transmits a processing start notification to the embedded node each time a timer event occurs and the processing of the user program starts. When the node is an embedded node, the periodic drive management unit refers to the processing start notification received from the operating node and calculates, at the embedded node, the processing start timing of the user program in the next cycle following the current time. If the data matching processing between the operating node and the embedded node is complete when the calculated start timing of the next cycle is reached, the embedded node starts the processing of the user program.

Preferably, when the node is an embedded node, the periodic drive management unit measures the difference in processing start timing between the operating node and the embedded node after the user program has started processing in the resynchronization activation processing. If the difference is larger than a predetermined value, the unit corrects the processing timing of the embedded node and thereby keeps the processing start timings synchronized.
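One way to picture this correction is the short Python sketch below. It assumes, beyond what the text states, that both start times are expressed in milliseconds on a common time base and that the correction is applied in full to the next cycle; the function name `next_start_ms` and its parameters are hypothetical.

```python
def next_start_ms(local_start_ms: int, operating_start_ms: int,
                  period_ms: int, tolerance_ms: int) -> int:
    """Return the embedded node's next start time, realigned only when the skew is too large."""
    skew = local_start_ms - operating_start_ms
    # Correct only when the measured difference exceeds the predetermined value.
    correction = skew if abs(skew) > tolerance_ms else 0
    return local_start_ms + period_ms - correction


small_skew = next_start_ms(10_004, 10_000, 1_000, 10)   # 4 ms skew is tolerated
large_skew = next_start_ms(10_050, 10_000, 1_000, 10)   # 50 ms skew is removed
```

Leaving small differences uncorrected avoids constant adjustment caused by ordinary measurement jitter.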

Also preferably, when the node is an embedded node and the data matching processing by the data synchronization processing unit between the operating node and the embedded node is not complete at the time the start timing of the next cycle, calculated by referring to the processing start notification received from the operating node, is reached, the periodic drive management unit does not execute the user program and waits until the processing start timing of the following cycle.

More preferably, when the node is an embedded node, the periodic drive management unit refers to the processing start notification received from the operating node and uses the period information contained in the notification together with the transmission time of the notification to calculate the processing start timing of the user program in the next cycle at the embedded node.
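The calculation can be sketched in Python as follows. The formula is an assumption made for illustration: it estimates the operating node's start time as the reception time minus the transmission time, then steps forward in whole periods to the first start time after the current time. The function name and parameters are hypothetical, and all times are integers in milliseconds on the embedded node's clock.

```python
def next_cycle_start_ms(receive_time_ms: int, transmission_ms: int,
                        period_ms: int, now_ms: int) -> int:
    """Estimate the first cycle start strictly after now_ms."""
    # When the operating node started, as seen on the embedded node's clock.
    operating_start_ms = receive_time_ms - transmission_ms
    elapsed_ms = now_ms - operating_start_ms
    # Number of whole periods to add so that the result lies after now_ms.
    cycles = elapsed_ms // period_ms + 1
    return operating_start_ms + cycles * period_ms


# Notification received at t=1003 ms after 3 ms in transit, period 100 ms.
upcoming = next_cycle_start_ms(1003, 3, 100, now_ms=1050)
```

Subtracting the transmission time keeps the embedded node from starting each cycle late by the network delay.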

A resynchronization activation processing method for a fault-tolerant computer system according to the present invention is used in a fault-tolerant computer system that includes a plurality of nodes connected to one another via a network, each of which independently executes the same processing in parallel. The method performs resynchronization activation processing in which an embedded node is incorporated into the system again with its processing timing matched to that of an operating node, while the operating node continues processing without stopping. The method includes a step in which the operating node notifies the embedded node of a processing start each time it starts executing the processing of a user program, and a step in which the embedded node starts the processing of the user program by referring to the processing start notification received from the operating node, provided that the data matching processing that matches the node state, including data contents, between the operating node and the embedded node is complete.

A program according to the present invention causes each node computer of the fault-tolerant computer system to execute the processing steps of the resynchronization activation processing method described above. The program can be installed or loaded on a computer from various recording media, such as an optical disk (for example, a CD-ROM), a magnetic disk, or a semiconductor memory, or by downloading it via a communication network or the like.

In this specification, a "unit" does not mean only a physical means; it also covers the case in which the function of the unit is realized by software. The function of one unit may be realized by two or more physical means, and the functions of two or more units may be realized by one physical means.

According to the present invention, when a node that was stopped for a software update, hardware replacement, failure recovery, maintenance, or the like is incorporated into the system again, the processing timing of that node can be matched with that of the operating nodes without interrupting the processing of the normally operating nodes. The necessary processing can thus be executed without omission, the high reliability of the fault-tolerant computer system can be maintained, and uninterrupted operation of the services executed or provided by the system can be guaranteed.

Embodiments of the present invention are described in detail below with reference to the drawings. As a rule, identical elements are given identical reference numerals and redundant description is omitted.

FIG. 1 is a diagram showing an overview of a fault-tolerant computer system according to an embodiment of the present invention. The fault-tolerant computer system is composed of a plurality of independent nodes interconnected via a network, and every node executes the same processing in parallel.

As shown in the figure, the fault-tolerant computer system 0101 includes, as its main components, two or more independent nodes 0111 connected to one another via a LAN 0112, and a gateway server 0113 that is connected to a wide area network 0102 and relays communication with an external system 0103.

The fault-tolerant computer system 0101 provides a service by accepting a request from the external system 0103, which can communicate with it via the wide area network 0102, processing the request, and returning the processing result to the external system 0103 as a response. Here, the system receives an input message 0131 as the request from the external system 0103 and transmits to the external system 0103 an output message 0141 containing the processing result for that request.

Inside the fault-tolerant computer system 0101, the gateway server 0113, which relays the input message 0131 received as a request from the external system 0103 via the wide area network 0102, forwards the received input message 0131 as an input message 0132 to all nodes 0111 in the system via the LAN 0112. The gateway server 0113 broadcasts the input message 0132 so that all nodes 0111 receive it at almost the same time and can begin processing it. Each node 0111 that receives the input message 0132 processes it and transmits an output message 0142 containing the processing result to the gateway server 0113 via the LAN 0112. On receiving the output messages 0142 from the nodes 0111, the gateway server 0113 creates the output message 0141 as the response to the requesting external system 0103 and transmits it to the external system 0103. In doing so, the gateway server 0113 compares and checks the data contents of the one or more output messages 0142 received from the nodes 0111, judges their correctness, and transmits the most probable message data, determined by majority vote or the like, to the requesting external system 0103 as the output message 0141.
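The comparison step at the gateway can be sketched with a simple majority vote in Python. The specification only says "majority vote or the like", so the strict-majority rule, the handling of ties (returning `None`), and the function name `vote` are assumptions made for this illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def vote(outputs: Sequence[bytes]) -> Optional[bytes]:
    """Return the payload produced by a strict majority of nodes, or None if there is none."""
    if not outputs:
        return None
    # most_common(1) yields the payload that occurs most often and its count.
    payload, count = Counter(outputs).most_common(1)[0]
    return payload if count > len(outputs) / 2 else None


agreed = vote([b"result-1", b"result-1", b"corrupt"])   # two of three nodes agree
undecided = vote([b"a", b"b"])                          # no majority among two nodes
```

With a single reporting node the function returns that node's output, which matches the statement that the gateway may receive only one output message.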

The hardware of each node 0111 includes a processing device 0121, a storage device 0122, and a communication device 0123. The storage device 0122 stores the data needed to execute the services that the fault-tolerant computer system 0101 provides to the external system 0103, the user programs that process requests from the external system 0103 for those services, the software programs that synchronize the nodes 0111 during operation and during re-incorporation, the software programs that handle communication between the node 0111 and the gateway server 0113 via the LAN 0112, and so on. These are processed by the processing device 0121. The communication device 0123 performs the communication processing for receiving the input message 0132 from the gateway server 0113 and transmitting the output message 0142 to the gateway server 0113, as well as the communication processing required to synchronize the nodes 0111 during operation and during re-incorporation.

Because the fault-tolerant computer system 0101 contains two or more nodes 0111 that process requests from the external system 0103 simultaneously and in parallel, the remaining operating nodes continue the processing even if a failure occurs in one or more nodes. The system as a whole can therefore keep providing the service through normal operation, which increases its fault tolerance and reliability.

FIG. 2 shows, for an embodiment of the present invention, an outline of the resynchronization activation of a node 0111 that is part of the fault-tolerant computer system 0101 and processes external requests, together with a sequence diagram of the related processing flow. The example assumes that the user program executed on each node is one whose processing is triggered by an input.

The example in FIG. 2 shows, among the nodes 0111 interconnected via the LAN 0112 in the fault-tolerant computer system 0101, an operating node 0201 and an embedded node 0202 that undergoes resynchronization activation from the stopped state.

In the operating node 0201, a user program 0211 performs its processing while referring to data 0212 (shared memory, files, and the like). When the embedded node undergoes resynchronization activation, the data 0212 is copied and the execution state of the user program 0211 is reported, while the processing of the user program 0211 and other programs on the operating node 0201 continues to run normally without stopping, regardless of the state of the embedded node 0202. When the resynchronization activation of the embedded node is complete, the data 0222 of the embedded node 0202 matches the data 0212 of the operating node 0201, and the user program 0221 on the embedded node is in the same execution state as the user program 0211 on the operating node 0201.

The processing sequence related to the resynchronization activation of the embedded node 0202 is as follows. The user program 0211 on the operating node 0201 processes inputs (0251, 0252, 0253) and returns outputs (0261, 0262, 0263). When the embedded node 0202 restarts (0241) after having been stopped for a software update, hardware replacement, maintenance, recovery work after an abnormality, or the like, the operating node 0201 writes the data 0212 to the data 0222 (0231) in order to match the data of the embedded node 0202. When the embedded node 0202 detects that data matching is complete (0242), it matches its execution state with that of the user program 0211 on the operating node 0201 and resumes the processing of the user program 0221 (0243). From then on, like the user program 0211 on the operating node 0201, it processes the input (0253) and returns outputs (0272, 0273). If the processing on the embedded node is executed correctly, the outputs 0272 and 0273 should match the outputs 0262 and 0263 of the operating node 0201, respectively.

FIG. 3 is a diagram showing the module configuration of a node 0111 that is part of the fault-tolerant computer system 0101 and processes external requests, in an embodiment of the present invention.

The node 0111 contains a user program 0302 that processes requests from the external system 0103 and returns responses; data 0303, such as shared memory and files, that the user program reads and writes during processing; and middleware 0301. The middleware 0301 communicates with other nodes 0111, the gateway server 0113, and so on via a communication medium 0304 (corresponding to the LAN 0112 in FIG. 1), and performs processing such as data matching for node resynchronization activation and matching of the processing timing of user programs.

The middleware 0301 includes the following main components. An input drive management unit 0311 refers to and updates an input-driven user program management table 0321 and an input management table 0323 during resynchronization activation of the embedded node, and starts, or monitors the execution state of, user programs whose processing is triggered by an input. A periodic drive management unit 0312 refers to and updates a periodically driven user program management table 0322, and starts, or monitors the execution state of, user programs that execute periodically. A data synchronization processing unit 0313 performs processing for data matching, such as writing the data of the operating node 0201 to the embedded node 0202. A received data management unit 0314 stores input messages transmitted from the external system 0103, or input messages transferred from the operating node 0201, which arrive through a data communication unit 0315, in an input reception buffer 0331 or a transfer reception buffer 0332, respectively, and updates the input management table 0323 according to the stored message. The data communication unit 0315 communicates with other nodes 0111, the gateway server 0113, and so on via the communication medium 0304.

When data matching between the operating node 0201 and the embedded node 0202 is performed by writing data during resynchronization activation of the embedded node, the data synchronization processing unit 0313 on the operating node 0201 extracts the data to be written from the data 0303 in the operating node 0201 and transmits it to the embedded node 0202 using the data communication unit 0315. The data synchronization processing unit 0313 on the embedded node 0202 receives the write data transmitted from the operating node 0201 using the data communication unit 0315 and writes it to the data 0303 in the embedded node 0202, overwriting existing contents where necessary.
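The extract, transmit, and overwrite steps can be sketched in Python as follows. The sketch assumes that the node data can be modeled as a dictionary and that a deep copy stands in for transmission over the network; `extract_snapshot` and `apply_snapshot` are hypothetical names, not functions defined in the specification.

```python
import copy


def extract_snapshot(data: dict) -> dict:
    """Operating node side: take an independent copy of the data to be written."""
    return copy.deepcopy(data)


def apply_snapshot(target: dict, snapshot: dict) -> None:
    """Embedded node side: write the received data, overwriting existing entries."""
    # Keys that are absent from the snapshot are left untouched in this sketch.
    target.update(copy.deepcopy(snapshot))


operating_data = {"counter": 5, "table": [1, 2]}
embedded_data = {"counter": 0}                      # stale state after restart

apply_snapshot(embedded_data, extract_snapshot(operating_data))
```

Copying the data before it is sent means later writes by the operating node's user program do not alter a snapshot that is already in flight, although they would still have to be propagated separately to keep the two nodes matched.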

The input reception buffer 0331 holds the input message 0132, which is transmitted from the gateway server 0113 and received by the data communication unit 0315, until the user program 0302 uses that input message 0132 as its processing target. The transfer reception buffer 0332 holds the input message that is transferred from the operating node 0201 and received by the data communication unit 0315 during resynchronization activation of the embedded node 0202, until the user program 0302 uses that input message as its processing target. Outlines of the input-driven user program management table 0321, the periodically driven user program management table 0322, and the input management table 0323 are shown in FIG. 7.

FIG. 4 is a diagram outlining a processing timing matching method for resynchronization activation between computers, performed on the fault-tolerant computer system 0101 according to an embodiment of the present invention.

FIG. 4 outlines the flow of processing that, during resynchronization activation of the embedded node, brings the processing timing of the user program running on the embedded node 0202 into agreement with that of the user program running on the active node 0201.

In the active node 0201, the user program 0211 performs processing (0411, 0412, 0413) in response to processing-start triggers (0401, 0402, 0403) such as input messages or timer events, and reads from and writes to the data 0212 in the course of that processing. Each time a processing-start trigger (0401, 0402, 0403) occurs for the user program 0211 and processing (0411, 0412, 0413) is performed, the active node 0201 sends a processing start notification (0421, 0422, 0423) to the embedded node 0202 at the moment the respective processing starts.

The embedded node 0202 determines that data matching between the nodes, performed by copying data from the active node 0201 to the embedded node 0202, has completed when it receives the data matching completion notification 0431 sent from the active node 0201. Meanwhile, the embedded node 0202 receives the processing start notifications sent from the active node 0201 and, by referring to them, calculates in advance the next execution timing of the user program 0221 on the embedded node 0202. Details of this scheme are described with reference to FIGS. 5 and 6. However, processing of the user program 0221 actually starts at the next execution timing only if data matching has already completed when that timing is reached. In the example shown in FIG. 4, the next execution timing 0404, calculated by referring to the processing start notification 0421, falls after the data matching completion notification 0431 has been received and completion of data matching has been determined, so processing 0414 of the user program 0221 starts at the next execution timing 0404.

With this arrangement, even after the embedded node 0202 has started processing the user program 0221 in time with the active node 0201, it continues to receive processing start notifications such as 0423 from the active node 0201 and to compare against them. It thereby measures the deviation between the start timing 0403 of processing 0413 in the active node 0201 and the start timing 0405 of processing 0415 in the embedded node 0202, and corrects the deviation when it is large. This prevents the processing timing of the user program 0211 in the active node 0201 and that of the user program 0221 in the embedded node 0202 from drifting apart again.

In this way, after the embedded node 0202 is restarted, its processing timing is matched to that of the active node 0201 (0441), and once the processing timing has been matched between the nodes, that synchronization is maintained (0442).

Next, the processing timing matching method for resynchronization activation between computers (nodes 0111) is outlined in FIG. 5 for user programs that perform processing triggered by an input, and in FIG. 6 for user programs that perform processing periodically.

FIG. 5 is a diagram outlining the processing timing matching method for resynchronization activation between computers, performed on the fault-tolerant computer system 0101 according to an embodiment of the present invention, as applied to a user program that performs processing triggered by an input.

In the active node 0201, each processing step of the user program (0521, 0522, 0523, 0524) is triggered by an input message 0511 (serial numbers #10, #11, #12, #13; corresponding to the input message 0132 in FIG. 1) from the external system 0103, delivered via the reception buffer 0501 (corresponding to the input reception buffer 0331 in FIG. 3). At the moment each processing step (0521, 0522, 0523, 0524) starts, the active node 0201 sends the embedded node 0202 a processing start notification and transfers the input message 0511 that triggered the processing (0541, 0542, 0543).

Each time the embedded node 0202 receives a processing start notification and a transferred input message from the active node 0201 (0551, 0552, 0553), it refers to the reception buffer 0502 (corresponding to the input reception buffer 0331 in FIG. 3), which stores the messages the node has received directly, and determines whether an input message with the same serial number as the message transferred from the active node 0201 is stored there. Provided that data matching has completed by this point (in FIG. 5, the data matching completion notification 0562 is issued once the data write processing 0561 has completed data matching), the embedded node proceeds as follows. If no input message with a matching serial number is stored (input message transfers 0541 and 0542 in FIG. 5), the user program processes the input message transferred from the active node 0201 (processing 0531, 0532). If an input message with a matching serial number is stored (input message transfer 0543 in FIG. 5), the user program processes the input message 0512 that the embedded node 0202 received directly and that is stored in the reception buffer 0502 (processing 0533). At this time, the embedded node 0202 also sends a transfer stop request 0544 to the active node 0201 to stop the transfer of input messages from the active node 0201. After the transfer has stopped, the user program processes the input messages that the embedded node 0202 received directly and that are stored in the reception buffer 0502 (processing 0534).

If data matching has not yet completed when the embedded node 0202 receives a processing start notification and a transferred input message from the active node 0201, the user program is not run, and the embedded node waits until it receives the next processing start notification and transferred input message from the active node 0201.

FIG. 6 is a diagram outlining the processing timing matching method for resynchronization activation between computers, performed on the fault-tolerant computer system 0101 according to an embodiment of the present invention, as applied to a user program that performs processing periodically.

In the active node 0201, the user program processing for each cycle (0611, 0612, 0613) is performed in response to the periodically occurring timer events (0601, 0602, 0603). At the moment each processing step (0611, 0612, 0613) starts, the active node 0201 sends a processing start notification (0631, 0632, 0633) to the embedded node 0202.

When the embedded node 0202 receives the processing start notification (#11) 0631 from the active node 0201, it refers to information in the notification, such as the cycle time, and calculates the time from receipt of the notification until the next cycle (#12) starts (0661). During this time, data writing from the active node 0201 to the embedded node 0202, for data matching between the two nodes, continues in parallel (0641).

If the data matching completion notification 0651 has been received from the active node 0201 by the time the calculated interval up to the start of the next cycle (#12) has elapsed, the embedded node starts processing the user program (processing (#12) 0622). If the data matching completion notification 0651 has not yet been received from the active node 0201, the user program is not run; instead, the time until the following cycle (#13) starts is calculated again, and the embedded node waits until that time has elapsed.

After the embedded node 0202 has resumed processing the user program, each time it receives a processing start notification from the active node 0201 (notification (#13) 0633 in FIG. 6), it compares the notification against the start timing of the corresponding processing on the embedded node 0202 (processing (#13) 0623 in FIG. 6). Correction is performed each time a timing deviation is detected.

FIG. 7 is a diagram showing the structure of the management tables managed in a node 0111 that constitutes the fault-tolerant computer system 0101 and executes processing such as handling external requests, according to an embodiment of the present invention. FIG. 7(A) shows the input-driven user program management table 0321, which manages the execution state of user programs that perform processing triggered by an input. FIG. 7(B) shows the input management table 0323, which manages the external input messages received by the nodes. FIG. 7(C) shows the period-driven user program management table 0322, which manages the execution state of user programs that perform processing periodically.

As shown in FIG. 7(A), the input-driven user program management table 0321 has the following main fields: processing target 0711, status 0712, message source 0713, and update time 0714.

The processing target 0711 stores information about the latest input message to be processed by a user program that performs processing triggered by an input. It consists of a serial number 0741, which stores the serial number assigned to the message, and a reception time 0742, which stores the time the message was received. The status 0712 stores the execution state of the user program run on the latest input message identified by the processing target 0711. Status values stored here include “waiting”, “processing”, and “processing finished”. The message source 0713 stores which node directly received the latest input message identified by the processing target 0711. The value stored here is either “active node” or “embedded node”. The update time 0714 stores the most recent update time of each row of the table. Each field of the input-driven user program management table 0321 is updated every time the embedded node receives an input message and every time processing is executed on that message.

As shown in FIG. 7(B), the input management table 0323 has the following main fields: node 0721, serial number 0722, and reception time 0723.

The node 0721 stores which node directly receives the input message. Two values are stored here: “active node” and “embedded node”. The serial number 0722 stores the serial number assigned to the input message received by the active node or the embedded node. The reception time 0723 stores the time at which the active node or the embedded node received the input message. Each field of the input management table 0323 is updated every time the embedded node receives an input message transferred from the active node and every time the embedded node receives an input message directly.

As shown in FIG. 7(C), the period-driven user program management table 0322 has the following main fields: user program name 0731, cycle counter 0732, cycle interval 0733, waiting time 0734, and status 0735.

The user program name 0731 stores the name of a user program that performs processing periodically. The cycle counter 0732 stores a sequence number for the user program named in the user program name 0731, incremented each time a timer event occurs. The cycle interval 0733 stores the time interval at which timer events that trigger processing occur for that user program. The waiting time 0734 stores the time that user program waits from the end of its processing until the start of the next cycle. The status 0735 stores the execution state of that user program. Status values stored here include “waiting”, “processing”, and “processing finished”. Each field of the period-driven user program management table 0322 is updated every time a timer event occurs on the embedded node and every time processing for that timer event is executed or finishes.
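The three tables of FIG. 7 can be pictured as simple record types. The following Python sketch is an illustration only: the class names, field names, and the use of floating-point seconds for times are assumptions made here, and the patent does not prescribe any particular data representation. The comments map each field to its reference numeral.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    WAITING = "waiting"
    PROCESSING = "processing"
    FINISHED = "processing finished"


class Source(Enum):
    ACTIVE_NODE = "active node"
    EMBEDDED_NODE = "embedded node"


@dataclass
class InputDrivenEntry:       # one row of table 0321, FIG. 7(A)
    serial_number: int        # 0741 (part of processing target 0711)
    reception_time: float     # 0742 (part of processing target 0711)
    status: Status            # 0712
    message_source: Source    # 0713
    update_time: float        # 0714


@dataclass
class InputEntry:             # one row of table 0323, FIG. 7(B)
    node: Source              # 0721
    serial_number: int        # 0722
    reception_time: float     # 0723


@dataclass
class PeriodicEntry:          # one row of table 0322, FIG. 7(C)
    program_name: str         # 0731
    cycle_counter: int        # 0732
    cycle_interval: float     # 0733
    waiting_time: float       # 0734
    status: Status            # 0735


def on_timer_event(entry: PeriodicEntry) -> None:
    """Hypothetical helper: advance the cycle counter when a timer event fires."""
    entry.cycle_counter += 1
    entry.status = Status.PROCESSING
```

For example, calling `on_timer_event` on a row whose cycle counter is 11 leaves it at 12 with status “processing”, which mirrors the rule that table 0322 is updated on every timer event.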

FIG. 8 is a flowchart showing the flow of processing in the active node when processing timing is matched between the active node and the embedded node during resynchronization activation of a node that had been stopped for software update, hardware replacement, failure recovery, maintenance, or the like, in the fault-tolerant computer system 0101 according to an embodiment of the present invention.

FIG. 8(A) is a flowchart showing the flow of processing in the active node 0201 when processing timing is matched between the active node 0201 and the embedded node 0202 for a user program that performs processing triggered by an input. FIG. 8(B) is a flowchart showing the flow of processing in the active node 0201 when processing timing is matched between the active node 0201 and the embedded node 0202 for a user program that performs processing periodically.

As shown in FIG. 8(A), when processing is triggered by an input, the active node first detects that the embedded node 0202 has restarted (0811). This detection is performed, for example, by the active node 0201 receiving a notification of the restart from a master node that directly detects the restart of the embedded node 0202, or by the active node 0201 accepting an inquiry from the embedded node 0202. Next, the active node receives an input message from the external system (0812), and sends the embedded node 0202 a processing start notification for that input message together with the input message data (0813). It then passes the input message to the user program on its own node (the active node 0201) (0814), and the user program processes the input message (0815). After that, the active node determines whether it has received a transfer stop request from the embedded node 0202 (0816). If it has not (0816; NO), steps 0812 through 0815 are repeated. If it has (0816; YES), the flow ends.
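The loop of steps 0812 through 0816 can be sketched as follows. This is a minimal illustration under stated assumptions: restart detection (0811) is taken as already done, the input queue, the callbacks `send_to_embedded`, `user_program`, and `stop_requested`, and the dictionary message shape are all hypothetical names introduced here, and the loop simply ends when the queue is empty instead of blocking for new input as a real node would.

```python
from collections import deque


def run_active_node(inbox, stop_requested, send_to_embedded, user_program):
    """Sketch of steps 0812-0816 of FIG. 8(A) on the active node."""
    while inbox:
        serial, payload = inbox.popleft()                        # 0812: receive input
        send_to_embedded({"serial": serial, "data": payload})    # 0813: notify and transfer
        user_program(serial, payload)                            # 0814, 0815: hand over and run
        if stop_requested():                                     # 0816: stop request received?
            break                                                # YES: end of flow
```

Note that the notification is sent before the local user program runs, matching the order of steps 0813 and 0814 in the flowchart.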

As shown in FIG. 8(B), when processing is performed periodically, the active node 0201 first detects the occurrence of a timer event (0821). Here, a timer event is an event that occurs when a specified period has elapsed or a specified time is reached, with the period or time monitored using, for example, a timer of the OS installed on the node. Next, the active node sends a cycle processing start notification to the embedded node 0202 (0822). The user program processing for this cycle is then executed on its own node (the active node 0201) (0823). Thereafter, steps 0821 through 0823 are repeated.

FIG. 9 is a flowchart showing the flow of processing in the embedded node 0202 when processing timing is matched between the active node 0201 and the embedded node 0202 for a user program that performs processing triggered by an input, during resynchronization activation of a node that had been stopped for software update, hardware replacement, failure recovery, maintenance, or the like, in the fault-tolerant computer system 0101 according to an embodiment of the present invention.

First, the embedded node receives a processing start notification and a transferred input message from the active node 0201 (0901). Next, referring to the transferred message received in step 0901, it updates the active-node entry of the input management table 0323 managed in the embedded node 0202 (0902). It then determines whether a data matching completion notification has already been received from the active node 0201 (0903). If not (0903; NO), the processing ends. If the data matching completion notification has been received (0903; YES), the embedded node refers to the active-node entry and the entry for its own node (the embedded node) in the input management table 0323 (0904). If, as a result, the serial number of the message transferred from the active node 0201 is smaller than the serial number of the oldest message that the embedded node 0202 has received directly and accumulated in the reception buffer 0502 (0905; YES), the transferred message received in step 0901 is passed to the user program of the own node (the embedded node 0202) (0906), and the user program is executed on that message (0907). Thereafter, steps 0901 through 0907 are repeated. If, on the other hand, the serial number of the message transferred from the active node 0201 equals the serial number of one of the messages that the embedded node 0202 has received directly and accumulated in the reception buffer 0502 (0905; NO), the directly received message with that serial number is passed from the reception buffer 0502 to the user program of the own node (the embedded node 0202) (0908). The user program is executed on that message (0909), a message transfer stop request 0544 is sent to the active node 0201 (0910), and the flow ends.
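The decision made in steps 0903 through 0910 can be sketched as a single function. This is an illustrative sketch, not the patented implementation: the function name, the dictionary keyed by serial number standing in for the reception buffer 0502, and the returned tuple are assumptions made here, and the table update of step 0902 is omitted. The text only describes the cases where the transferred serial number is older than everything buffered or equal to a buffered one; the sketch treats any serial number that is not buffered as the former case.

```python
def handle_transfer(serial, payload, direct_buffer, data_matched):
    """Sketch of steps 0903-0910 of FIG. 9 for one transferred message.

    direct_buffer maps serial numbers to messages the embedded node
    received directly (standing in for reception buffer 0502).
    Returns (message_to_process, send_stop_request).
    """
    if not data_matched:                        # 0903; NO: do nothing this time
        return None, False
    if serial in direct_buffer:                 # 0905; NO: same serial already buffered
        own_copy = direct_buffer.pop(serial)    # 0908: use the directly received copy
        return own_copy, True                   # 0909, 0910: run it, then request stop
    return payload, False                       # 0905; YES: 0906, 0907 use the transfer
```

With serial numbers #12 and #13 buffered locally, transfers #10 and #11 are processed from the transferred copies, while transfer #12 switches the embedded node over to its own copy and triggers the stop request, as in FIG. 5.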

FIG. 10 is a flowchart showing the flow of processing in the embedded node 0202 when processing timing is matched between the active node 0201 and the embedded node 0202 for a user program that performs processing periodically, during resynchronization activation of a node that had been stopped for software update, hardware replacement, failure recovery, maintenance, or the like, in the fault-tolerant computer system according to an embodiment of the present invention.

First, the embedded node receives a processing start notification from the active node 0201 (1001). Next, referring to the cycle information contained in the notification received in step 1001, it calculates the time from receipt of the notification until the start of the next cycle (1002). This time is calculated by taking the cycle interval, which is one item of the cycle information contained in the processing start notification received from the active node 0201 in step 1001, and subtracting the communication time between the active node 0201 and the embedded node 0202. The embedded node then updates the cycle counter, cycle interval, and waiting time fields for the relevant user program in the period-driven user program management table 0322 (1003) and, based on the result of step 1002, measures the time remaining until the start of the next cycle (1004). If a data update occurs in the active node 0201, the difference data resulting from that update is written to the embedded node 0202 (1005). If step 1004 shows that the time until the start of the next cycle has not yet elapsed (1006; NO), steps 1004 and 1005 are repeated. If it has elapsed (1006; YES), the embedded node determines whether a data matching completion notification has already been received from the active node 0201 (1007). If not (1007; NO), steps 1001 through 1006 are repeated. If the data matching completion notification has been received (1007; YES), the embedded node starts processing of the user program on its own node (the embedded node 0202) (1008), and updates the cycle counter, cycle interval, waiting time, and status fields in the period-driven user program management table 0322 for the user program started in step 1008 (1009). It then receives a processing start notification from the active node 0201 (1010). This is the notification sent by the active node 0201 immediately after the one received in step 1001, and it should correspond to the cycle of user program processing started in step 1008. The embedded node compares the cycle counter value of the user program cycle started in step 1008 with the cycle counter value contained in the notification received in step 1010 (1011). If they match (1011; YES), the processing ends. If they do not match (1011; NO), the user program started in step 1008 is forcibly terminated, the data and other state in the embedded node 0202 are restored to what they were before the user program started in step 1008 (1012), and steps 1002 through 1011 are then repeated.
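The arithmetic and the two checks in this flow are small enough to write out. The sketch below is an illustration with hypothetical function names and with times expressed as integer milliseconds, which is an assumption made here for readability; how the communication time between the nodes is obtained is not stated in this passage and is simply taken as an input.

```python
def time_to_next_cycle(cycle_interval_ms, comm_time_ms):
    """Step 1002: time from receipt of the notification to the next cycle start."""
    return cycle_interval_ms - comm_time_ms


def may_start_at_cycle(data_matched):
    """Step 1007: the user program starts only if data matching has completed."""
    return data_matched


def cycle_confirmed(local_counter, notified_counter):
    """Step 1011: False means the started cycle is rolled back (step 1012)."""
    return local_counter == notified_counter
```

For instance, with a 100 ms cycle interval and a 4 ms communication time, the embedded node would wait 96 ms after receiving the notification before treating the next cycle as started.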

FIG. 11 is a flowchart showing the flow of processing in the embedded node 0202 for maintaining the processing timing matched between the active node 0201 and the embedded node 0202 for a user program that performs processing periodically, after resynchronization activation of a node that had been stopped for software update, hardware replacement, failure recovery, maintenance, or the like, in the fault-tolerant computer system 0101 according to an embodiment of the present invention.

First, the embedded node 0202 receives a processing start notification from the active node 0201 (1101). Next, it refers to the period counter extracted from the notification received in step 1101 (1102). It then determines whether the processing of the cycle corresponding to that period counter has not yet been started in the embedded node 0202 (1103). If the processing has already been started (1103; NO), the flow returns to step 1101. If the processing has not yet been started (1103; YES), the embedded node waits until the processing of that cycle starts (1104). When the processing of the cycle starts (1104; YES), the delay time until the start of the cycle is calculated (1105). The delay time is the sum of two terms: the time elapsed from the reception of the processing start notification in step 1101 to the start of the processing of the cycle in step 1104, and the communication time between the active node 0201 and the embedded node 0202.

After the processing of the cycle ends, the waiting time from the end of that processing to the start of the next cycle is corrected on the basis of the delay time calculated in step 1105 (1106). The correction subtracts the delay time calculated in step 1105 from the original waiting time, where the original waiting time is the period interval of the user program minus the actual processing time in the cycle.
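The delay calculation of step 1105 and the waiting-time correction of step 1106 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the use of integer milliseconds, and the clamping of the corrected waiting time at zero are assumptions introduced here.

```python
def delay_until_cycle_start(notice_received_ms, cycle_started_ms, comm_time_ms):
    """Step 1105: elapsed time from receiving the processing start
    notification to the local start of that cycle, plus the communication
    time between the active node and the embedded node."""
    return (cycle_started_ms - notice_received_ms) + comm_time_ms


def corrected_wait(period_interval_ms, processing_time_ms, delay_ms):
    """Step 1106: subtract the delay from the original waiting time,
    which is the period interval minus the actual processing time."""
    original_wait_ms = period_interval_ms - processing_time_ms
    # Assumption: the embedded node never waits a negative amount of time.
    return max(0, original_wait_ms - delay_ms)


# Hypothetical figures: the notification arrived at t=10000 ms, the local
# cycle started at t=10030 ms, and the communication time is 10 ms.
delay_ms = delay_until_cycle_start(10000, 10030, 10)
wait_ms = corrected_wait(1000, 200, delay_ms)
print(delay_ms, wait_ms)  # 40 760
```

With a 1000 ms period interval and 200 ms of processing, the original waiting time of 800 ms is shortened by the 40 ms delay, so the next cycle on the embedded node starts closer to the active node's timing.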

FIG. 12 is a diagram showing the formats of the messages exchanged between the active node 0201 and the embedded node, among the nodes constituting the fault-tolerant computer system according to an embodiment of the present invention, in order to match processing timing when the embedded node is resynchronized and brought into operation.

FIG. 12(A) shows a message format 1201 for a message sent from the active node 0201 to the embedded node 0202. For a user program that is triggered by an input, the message carries both the processing start notification and the input message data that triggers the start of the user program's processing.

The message format 1201 comprises, as its main components, header information 1211, identification information 1212, a serial number 1213, a reception time 1214, a status 1215, and input data 1216.

The header information 1211 stores information about the message protocol and the like. The identification information 1212 stores identification information indicating that the message is a processing start notification from the active node 0201 for a user program that is triggered by an input. The serial number 1213 stores the serial number assigned to the input message that the active node 0201 received directly and forwards to the embedded node 0202. The reception time 1214 stores the time at which the active node 0201 directly received that input message. The status 1215 stores the execution state of the processing of that input message in the active node 0201. Because this notification is sent at the moment the active node 0201 starts the user program's processing, the status 1215 normally holds a value indicating "running". The input data 1216 stores the body of the input message that the active node 0201 received directly and forwards to the embedded node 0202.
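As an illustration only, the fields of message format 1201 could be modeled as the record below. The class name, the helper `make_start_notice`, the field types, and the identifier value `"INPUT_START"` are hypothetical; the description names the fields but does not specify their encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputDrivenStartNotice:
    """Message format 1201 (field types are assumptions)."""
    header: bytes          # 1211: message protocol information
    identification: str    # 1212: marks an input-driven processing start notification
    serial_number: int     # 1213: serial number of the forwarded input message
    reception_time: float  # 1214: when the active node received the input
    status: str            # 1215: execution state on the active node
    input_data: bytes      # 1216: body of the forwarded input message


def make_start_notice(serial_number, reception_time, input_data):
    """Build the notice an active node would send when it starts processing."""
    return InputDrivenStartNotice(
        header=b"",                    # placeholder; contents are not specified
        identification="INPUT_START",  # hypothetical identifier value
        serial_number=serial_number,
        reception_time=reception_time,
        status="running",              # sent at processing start, so normally "running"
        input_data=input_data,
    )


notice = make_start_notice(42, 1700000000.0, b"request-body")
print(notice.serial_number, notice.status)  # 42 running
```

Carrying the input body inside the notification lets the embedded node process the same input as the active node even when it did not receive that input directly.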

FIG. 12(B) shows a message format 1202 for a processing start notification that is sent from the active node 0201 to the embedded node 0202 for a user program that executes periodically.

The message format 1202 comprises, as its main components, header information 1221, identification information 1222, a UP (user program) name 1223, a period counter 1224, and a period interval 1225.

The header information 1221 stores information about the message protocol and the like. The identification information 1222 stores identification information indicating that the message is a processing start notification from the active node 0201 for a user program that executes periodically. The UP name 1223 stores the name of the periodically executing user program concerned. The period counter 1224 stores a sequential number that is incremented each time a timer event occurs for the user program named in the UP name 1223. The period interval 1225 stores the time interval at which the timer events that trigger the processing of the user program named in the UP name 1223 occur.
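The sketch below models message format 1202 and two ways an embedded node might use it. All names and field types are hypothetical. In particular, estimating the next timer event as "send time plus period interval" is an assumption made for illustration; the description lists the period interval as a field but does not give this formula, and the send time is passed in separately because it is not one of the listed fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicStartNotice:
    """Message format 1202 (field types are assumptions)."""
    header: bytes            # 1221: message protocol information
    identification: str      # 1222: marks a periodic processing start notification
    up_name: str             # 1223: name of the periodic user program
    period_counter: int      # 1224: incremented on every timer event
    period_interval_ms: int  # 1225: interval between timer events


def already_started(notice, last_started_counter):
    """True if the embedded node has already started the cycle named in the notice."""
    return (last_started_counter is not None
            and last_started_counter >= notice.period_counter)


def next_timer_event_ms(notice_sent_ms, notice):
    """Assumed estimate: one period interval after the active node sent the notice."""
    return notice_sent_ms + notice.period_interval_ms


notice = PeriodicStartNotice(b"", "PERIODIC_START", "up_a", 7, 1000)
print(already_started(notice, 6))          # False
print(already_started(notice, 7))          # True
print(next_timer_event_ms(5000, notice))   # 6000
```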

FIG. 12(C) shows a message format 1203 used both for the data matching completion notification sent from the active node 0201 to the embedded node 0202 and for the transfer stop request notification sent from the embedded node 0202 to the active node 0201.

The message format 1203 comprises header information 1231 and identification information 1232 as its main components.

The header information 1231 stores information about the message protocol and the like. The identification information 1232 stores identification information indicating that the message is either a data matching completion notification from the active node 0201 or a transfer stop request notification.

The embodiment of the present invention has been described specifically above, but the invention is not limited to it and can be modified in various ways without departing from its gist. The above embodiment is therefore merely illustrative in every respect and is not to be interpreted restrictively. For example, the processing steps described above may be reordered arbitrarily or executed in parallel as long as the processing contents do not contradict one another.

FIG. 1 is a diagram showing an overview of a fault-tolerant computer system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an overview of the resynchronization activation of a node that constitutes the fault-tolerant computer system and executes processing for external requests and the like, together with a sequence diagram showing the flow of the related processing, in an embodiment of the present invention.
FIG. 3 is a diagram showing the module configuration of a node that constitutes the fault-tolerant computer system and executes processing for external requests and the like, in an embodiment of the present invention.
FIG. 4 is a diagram showing an overview of the processing timing matching method for resynchronization activation between computers, applied to the fault-tolerant computer system according to an embodiment of the present invention.
FIG. 5 is a diagram showing an overview of the processing timing matching method when it is applied to a user program that is triggered by an input.
FIG. 6 is a diagram showing an overview of the processing timing matching method when it is applied to a user program that executes periodically.
FIG. 7 is a diagram showing the configuration of the management tables managed in the node 0111 that constitutes the fault-tolerant computer system 0101 and executes processing for external requests and the like, in an embodiment of the present invention.
FIG. 8 is a flowchart showing the flow of processing in the active node when processing timing is matched between the active node and the embedded node during the resynchronization activation of a node that was stopped for a software update, hardware replacement, failure recovery, maintenance, or the like.
FIG. 9 is a flowchart showing the flow of processing in the embedded node when processing timing is matched between the active node and the embedded node during the resynchronization activation, for a user program that is triggered by an input.
FIG. 10 is a flowchart showing the flow of processing in the embedded node when processing timing is matched between the active node and the embedded node during the resynchronization activation, for a user program that executes periodically.
FIG. 11 is a flowchart showing the flow of processing in the embedded node for maintaining, after the resynchronization activation, the processing timing matched between the active node and the embedded node, for a user program that executes periodically.
FIG. 12 is a diagram showing the formats of the messages exchanged between the active node and the embedded node in order to match processing timing during the resynchronization activation of the embedded node.

Explanation of symbols

0101 Fault-tolerant computer system
0102 Wide area network
0103 External system
0111 Node
(0201 Active node)
(0202 Embedded node)
0112, 0304 Network (LAN, communication medium)
0113 Gateway server
0121 Processing device
0122 Storage device
0123 Communication device
0211, 0221, 0302 User program
0212, 0222, 0303 Data
0301 Middleware
0311 Input drive management unit
0312 Periodic drive management unit
0313 Data synchronization processing unit
0314 Received data management unit
0315 Data communication unit
0321 Input-driven user program management table
0322 Periodic-driven user program management table
0323 Input management table
0331 Input reception buffer
0332 Transfer reception buffer
0421, 0422, 0423 Processing start notification
(0541, 0542, 0543 Transfer of input message)
(0631, 0632, 0633 Processing start notification for periodic processing)
0431, 0562, 0651 Data matching completion notification
0544 Transfer stop request notification

Claims

A fault-tolerant computer system comprising a plurality of nodes connected to one another via a network, wherein each of the plurality of nodes independently executes the same processing in parallel,
each of the nodes comprising:
a drive management unit that performs resynchronization activation processing in which, while a node operating in the system (hereinafter referred to as an "active node") continues processing without stopping, a node restarted from a stopped state (hereinafter referred to as an "embedded node") is reincorporated into the system with its processing timing matched to that of the active node; and
a data synchronization processing unit that performs data matching processing for matching the state of the node, including data contents, between the active node and the embedded node,
wherein the drive management unit,
when its own node is the active node, notifies the embedded node of a processing start each time execution of the processing of a user program is started, and,
when its own node is the embedded node, refers to the processing start notification that the embedded node has received from the active node and starts the processing of the user program once the data matching processing between the active node and the embedded node by the data synchronization processing unit has been completed.

The fault-tolerant computer system according to claim 1, wherein the drive management unit,
when its own node is the embedded node, determines that the data matching processing between the active node and the embedded node has been completed by receiving a data matching completion notification transmitted from the active node.

The fault-tolerant computer system according to claim 1 or 2, wherein the drive management unit includes an input drive management unit that performs the resynchronization activation processing for a user program that is triggered by an input, and
the input drive management unit,
when its own node is the active node, notifies the embedded node of a processing start and transfers the input to the embedded node each time the node receives an input and starts the processing of the user program, and,
when its own node is the embedded node, processes the user program using either the input transferred from the active node or the input directly received by the embedded node.

The fault-tolerant computer system according to claim 3, wherein the input drive management unit,
when its own node is the embedded node, does not process the user program and waits until the next processing start timing if the data matching processing between the active node and the embedded node by the data synchronization processing unit has not been completed at the time the embedded node receives the processing start notification and the transferred input from the active node, or at the time it directly receives the input corresponding to the input transferred from the active node.

The fault-tolerant computer system according to claim 3 or 4, wherein the input drive management unit,
when its own node is the embedded node, determines whether the input corresponding to the input transferred from the active node has been directly received by the embedded node and, if it has not been directly received, processes the user program using the input transferred from the active node.

The fault-tolerant computer system according to any one of claims 3 to 5, wherein the input drive management unit,
when its own node is the embedded node, determines whether the input corresponding to the input transferred from the active node has been directly received by the embedded node and, if it has been directly received, requests the active node to stop transferring inputs and processes the user program using the input directly received by the embedded node.

The fault-tolerant computer system according to claim 1, further comprising a gateway that is connected to the network and is connected to an external system via a network different from the network,
wherein the gateway transfers an input received from the external system to the plurality of nodes, receives the processing results that the plurality of nodes produce by executing in parallel at substantially the same time, compares the results, and returns the output obtained from the comparison to the external system as a response.

The fault-tolerant computer system according to claim 1, wherein the drive management unit includes a periodic drive management unit that performs the resynchronization activation processing for a user program that executes periodically, and
the periodic drive management unit,
when its own node is the active node, notifies the embedded node of a processing start each time a timer event occurs and the processing of the user program is started, and,
when its own node is the embedded node, refers to the processing start notification received from the active node, calculates the processing start timing of the user program in the embedded node for the next cycle after the current time, and starts the processing of the user program in the embedded node when that processing start timing is reached and the data matching processing between the active node and the embedded node has been completed.

The fault-tolerant computer system according to claim 8, wherein the periodic drive management unit,
when its own node is the embedded node, measures the difference in processing start timing between the active node and the embedded node after the processing of the user program has been started by the resynchronization activation processing and, if the difference is greater than a predetermined value, maintains the matched processing start timing by correcting the processing timing in the embedded node.

The fault-tolerant computer system according to claim 8 or 9, wherein the periodic drive management unit,
when its own node is the embedded node, does not process the user program and waits until the processing start timing of the following cycle if the data matching processing between the active node and the embedded node by the data synchronization processing unit has not been completed when the processing start timing of the next cycle, calculated by referring to the processing start notification received from the active node, is reached.

The fault-tolerant computer system according to claim 8, wherein the periodic drive management unit,
when its own node is the embedded node, refers to the processing start notification received from the active node and calculates the processing start timing of the next cycle of the user program in the embedded node from the period information included in the notification and the transmission time of the notification.

A resynchronization activation processing method for a fault-tolerant computer system that comprises a plurality of nodes connected to one another via a network, each of the plurality of nodes independently executing the same processing in parallel, the method performing resynchronization activation processing in which, while a node operating in the system (hereinafter referred to as an "active node") continues processing without stopping, a node restarted from a stopped state (hereinafter referred to as an "embedded node") is reincorporated into the system with its processing timing matched to that of the active node, the method comprising:
a step in which the active node notifies the embedded node of a processing start each time it starts executing the processing of a user program; and
a step in which the embedded node, once the data matching processing that matches the state of the node, including data contents, between the active node and the embedded node has been completed, refers to the processing start notification received from the active node and starts the processing of the user program.

A program for causing each node of the fault-tolerant computer system to execute the resynchronization activation processing method for a fault-tolerant computer system according to claim 12.

A fault-tolerant computer system comprising a plurality of nodes connected to one another via a network, wherein each of the plurality of nodes executes the same processing,
each of the nodes comprising:
a drive management unit that incorporates a node to be incorporated into the system (hereinafter referred to as an "embedded node") into the system with its processing timing matched to that of a node operating in the system (hereinafter referred to as an "active node"); and
a data synchronization processing unit that matches the state of the node, including data contents, between the active node and the embedded node,
wherein the drive management unit,
when its own node is the active node, notifies the embedded node of a processing start each time the processing of its own node is started, and,
when its own node is the embedded node, refers to the processing start notification that the embedded node has received from the active node and starts the processing of its own node once the data synchronization processing unit has completed the processing of matching the state of the node between the active node and the embedded node.