JP2000215074A

JP2000215074A - System operation method and failure automatic recovery method

Info

Publication number: JP2000215074A
Application number: JP11016874A
Authority: JP
Inventors: Kyosuke Nakao; 恭介中尾; Kazuhiko Mejiro; 和彦目代; Yuji Goto; 祐治後藤; Katsuyuki Fujiyoshi; 勝幸藤吉; Yukio Kono; 幸雄光野; Daisuke Namoto; 大輔名本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1999-01-26
Filing date: 1999-01-26
Publication date: 2000-08-04

Abstract

(57) [Summary]

[Problem] To provide a system operation method and an automatic failure recovery method for communication systems and information processing systems that detect failures reliably, make automatic recovery from failures easy, impose a small economic burden for installation, and reduce installation space.

[Solution] The system has a process control unit that controls the operation of all processes provided in the system, an original process that contains all of the operation programs for system operation and all of the operation management programs that manage system operation, and a clone process that contains only the minimum necessary subset of the original process's operation management programs. The original process and the clone process communicate with each other periodically.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a system operation method and an automatic failure recovery method for communication systems and information processing systems, and more particularly to a system operation method and a failure recovery method that detect failures reliably, make automatic recovery from failures easy, impose a small economic burden for installation, and reduce installation space.

[0002] Today it is fair to say that no communication system or information processing system is free of online data. The most important requirement for a system that handles such online data is that it never goes down, that is, that it secures high reliability. Such a system becomes practical, however, only when it is also backed by reliable failure detection, easy automatic recovery, and a reduced economic burden and installation space for achieving that reliability.

[0003] Means of eliminating system downtime have been researched and developed from various angles, but they are not yet sufficient. What is still awaited is a system operation method and an automatic failure recovery method that detect failures reliably, make automatic recovery easy, impose a small economic burden for installation, and reduce installation space.

[0004]

[Prior Art] FIG. 16 shows the configuration of a conventional operation method for a duplex system, as described in, for example, Japanese Unexamined Patent Publication No. H5-122104, "Duplex system switching method for a communication system", and Japanese Unexamined Patent Publication No. H8-316957, "Duplex network management system".

[0005] In FIG. 16, 51 is the hardware of a first communication device, 52 is the application program for the hardware 51 of the first communication device (labeled simply "application" in FIG. 16), 53 is the hardware of a second communication device, 54 is the application program for the hardware 53 of the second communication device, and 55 is a monitoring switching device.

[0006] In the configuration of FIG. 16, when, for example, the hardware 51 of the first communication device is used as the active system (also called the working, operation, or act system), the hardware 53 of the second communication device serves as the standby system (also called the spare or stand-by system). The information that the standby hardware processes is not used; it comes into use only when the hardware 51 of the first communication device fails.

[0007] To add some detail: normally, the same incoming communication line is connected to both the hardware 51 of the first communication device and the hardware 53 of the second communication device. Both receive the same information and perform the same communication processing in cooperation with the application program 52 of the first communication device and the application program 54 of the second communication device, respectively.

[0008] However, when the hardware 51 of the first communication device is the active system and the hardware 53 of the second communication device is the standby system, the processing output of the hardware 51 is supplied to the outgoing communication line, while the processing output of the hardware 53 is not. In other words, both systems are physically operating, but only one system's output is used.

[0009] While the hardware and application of the active communication device and those of the standby communication device are operated in this way, the monitoring switching device 55 continuously monitors the behavior of the hardware 51 and application program 52 of the first communication device and the hardware 53 and application program 54 of the second communication device. In FIG. 16, the monitoring switching device 55 is assumed to monitor via the application program 52 of the first communication device and the application program 54 of the second communication device.

[0010] When it detects a failure in either the hardware 51 or the application program 52 of the first communication device, the monitoring switching device 55 switches the outgoing communication line from the hardware 51 of the first communication device to the hardware 53 of the second communication device.
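The conventional duplex arrangement described above can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the class names `Unit` and `MonitoringSwitch`, the `healthy` flag, and the upper-casing "processing" are hypothetical stand-ins and do not appear in the cited publications.

```python
class Unit:
    """Hypothetical stand-in for one communication device (hardware plus application)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def process(self, data):
        if not self.healthy:
            raise RuntimeError(f"{self.name} has failed")
        return f"{self.name}:{data.upper()}"


class MonitoringSwitch:
    """Hypothetical stand-in for the monitoring switching device 55."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def handle(self, data):
        # Both units receive the same input and both process it.
        outputs = {}
        for unit in (self.active, self.standby):
            try:
                outputs[unit.name] = unit.process(data)
            except RuntimeError:
                outputs[unit.name] = None
        # If the active unit failed, switch the outgoing line to the standby unit.
        if outputs[self.active.name] is None:
            self.active, self.standby = self.standby, self.active
        # Only the output of the (possibly new) active unit reaches the outgoing line.
        return outputs[self.active.name]
```

The sketch makes the cost visible: two complete units run all the time, although only one output is ever used.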

[0011] Here, both the hardware 53 and the application program 54 of the second communication device are already physically in operation, switching is normally performed electronically, and care is taken over matters such as phase synchronization between the active and standby systems. The time from failure detection to completion of switching and restoration of the system can therefore be considered very short.

[0012] FIG. 17 shows the configuration of a conventional failure recovery method, as described in, for example, Japanese Unexamined Patent Publication No. H3-144831, "System recovery method". It assumes a failure recovery method for a system in a personal computer that has multiple processing functions.

[0013] In FIG. 17, 61 is a process control unit that includes failure detection means 61-1 and failure recovery means 61-2. 62 to 64 are processes that operate in cooperation under the control of the process control unit 61: 62 is labeled process A, 63 process B, and 64 process C.

[0014] In the configuration of FIG. 17, the failure detection means 61-1 constantly monitors the states of process A 62, process B 63, and process C 64, determines which part is the cause of a detected failure, and notifies the failure recovery means 61-2 of the result. On receiving the notification, the failure recovery means 61-2 performs recovery processing that corresponds to the failed part and the nature of the failure. For example, when the failed part is identified as process C 64, the nature of the failure is closely tied to the inherent function of process C 64, so recovery processing specific to process C 64 is performed.
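The detect-then-dispatch flow of FIG. 17 might be sketched as follows. This is a simplified illustration, not the method of the cited publication: the state strings, the handler functions, and the `RECOVERY_HANDLERS` table are all hypothetical.

```python
def recover_process_a(detail):
    return f"process A: restart after {detail}"


def recover_process_b(detail):
    return f"process B: restart after {detail}"


def recover_process_c(detail):
    return f"process C: restart after {detail}"


# Hypothetical table mapping each failed part to its process-specific recovery.
RECOVERY_HANDLERS = {
    "A": recover_process_a,
    "B": recover_process_b,
    "C": recover_process_c,
}


def detect_failure(states):
    """Role of detection means 61-1: return (part, detail) for the first
    process whose state is not "ok", or None if all are healthy."""
    for name, state in states.items():
        if state != "ok":
            return name, state
    return None


def recover(states):
    """Role of recovery means 61-2: run the recovery that matches the failed part."""
    failure = detect_failure(states)
    if failure is None:
        return None
    name, detail = failure
    return RECOVERY_HANDLERS[name](detail)
```

Note that both functions live in one place, which mirrors the single point of failure discussed later for this configuration.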

[0015] Recovery processing can therefore normally be automated, and the recovery of the failed process proceeds in parallel with the normal processing of the processes that continue to operate correctly.

[0016] In addition, the configuration of FIG. 17 requires no duplication, so growth in system scale can be avoided.

[0017]

[Problems to Be Solved by the Invention] However, the configuration of FIG. 16 must hold the hardware and application program of the same communication device in duplicate, so it is difficult to avoid growth in system scale. Communication systems and information processing systems are mostly used in the economic activity of society, so the higher cost and larger installation space that come with larger scale translate directly into a heavier investment burden on companies, which is a serious problem.

[0018] Because operation switches to the healthy standby system after a failure, system operation itself is not affected. Normally, however, physically replacing the failed part or terminating a runaway application program requires manual work by maintenance personnel, which is a problem.

[0019] In the configuration of FIG. 17, on the other hand, the biggest problem is that the failed process's processing is stopped while that process is being recovered. In addition, because recovery of the failed process runs in parallel with the normal processing of the other, healthy processes, the processing capacity available to those healthy processes may drop.

[0020] Furthermore, if a failure occurs in the monitoring switching device 55 in the configuration of FIG. 16, or in the failure detection means 61-1 or the failure recovery means 61-2 themselves in the configuration of FIG. 17, the failure detection function or the failure recovery function itself stops working, and the system falls into a state that is fatal to its operation.

[0021] Moreover, the configuration of FIG. 17 recovers from a failure by forcibly terminating the failed process and restarting it. The process then starts up from its initial state, so the processing data held up to the moment just before the failure is lost, which is a major problem.

[0022] As described above, conventional duplex system operation methods and system recovery methods have various problems.

[0023] In view of these problems, an object of the present invention is to provide an operation method for a duplex system that detects failures reliably, makes automatic recovery from failures easy, imposes a small economic burden for installation, and reduces installation space.

[0024]

[Means for Solving the Problems] The principle of the present invention is as follows. Each process started at system startup (these become the originals) autonomously starts a clone that is given only the minimum necessary resources. Through periodic communication, the original and the clone keep track of each other's state and share the necessary data. When the original fails, recovery is carried out with some help from the process control unit; when the clone fails, the original carries out recovery autonomously.

[0025] Under this principle, every process has both an original and a clone, but this does not mean providing two independent devices, and the clone is given only the minimum necessary resources, so growth in system scale can be avoided.

[0026] Also, because the original and the clone can monitor each other's state through periodic communication, a system that detects failures reliably can be built.

[0027] Furthermore, because the original and the clone hold the operation management data in common, the operation management data is not lost even if the original fails and is forcibly terminated. And because the original and the clone are loaded in the same program memory, program memory can be reallocated between them before the original is forcibly terminated, so the processing data held by the failed original can be handed over to the new original into which the clone transforms.

[0028] Therefore, even when a failure is recovered by turning the failed original process into a clone and turning the process that was the clone into the original, neither the operation management data nor the processing data is lost.

[0029] When a failure of the clone is detected, the original that detected it holds all of the data and can pass the operation management data to the clone when the clone is regenerated, so no problem arises.

[0030]

[Embodiments of the Invention] FIG. 1 outlines the system configuration of the present invention and illustrates how the system is started. It shows the main parts of the system, assuming an operation method for a system built inside a personal computer.

[0031] In FIG. 1, 1 is a hard disk, which stores a process control unit load module (written without the middle dot "・" in FIG. 1, but the same thing; similar abbreviations may appear in later figures), a process A load module 12, a process B load module 13, and a dynamic link library 14.

[0032] 2 is the original of the process control unit. It contains an operation program 21, which includes a process start/control unit 21-1 and a dynamic link library reading unit 21-2 (labeled "reading unit" in FIG. 1, but the same thing), and an operation management program 22.

[0033] 2a is the clone of the process control unit original 2, started by the process control unit original 2 (also written simply as "process control unit" in the text; the two are the same).

[0034] 3 is the original of process A. It contains an operation program 31, which includes a dynamic link library reading unit 31-1, and an operation management program 32.

[0035] 3a is the clone of the process A original 3, started by the process A original 3.

[0036] Similarly, 4 is the original of process B. It contains an operation program 41, which includes a dynamic link library reading unit 41-1, and an operation management program 42.

[0037] 4a is the clone of the process B original 4, started by the process B original 4.

[0038] Here, the operation program of each process is specific to that process, whereas the operation management program is common to all processes.

[0039] The operation program, the operation management program, and the clone of each process are described in detail later.

[0040] The process control unit original 2, process A, and process B can provide their application functions only once they have been loaded into program memory. They are started as follows.

[0041] The process control unit 2 is started according to the Windows program start procedure, either manually by the user or automatically through registration in Startup.

[0042] Each process is then started in turn by a start request from the process start/control unit 21-1 of the process control unit started as above.

[0043] The dynamic link library 14 is loaded automatically in response to a request from the dynamic link library reading unit 21-2 of the started process control unit 2, or from the dynamic link library reading unit of each started process.

[0044] The start procedure is described below, following the circled numbers shown in FIG. 1.

[0045] (1) The user double-clicks the icon, or specifies the program name of the process control unit directly on the command line, and the Windows operating system (OS) identifies the program and issues the start instruction.

[0046] (2) The load module of the process control unit stored on the hard disk 1 is loaded into program memory.

[0047] (3) The dynamic link library reading unit 21-2 of the process control unit 2 loaded in step (2) requests loading of the operation management program stored in the dynamic link library 14.

[0048] (4) The contents of the dynamic link library stored on the hard disk 1 are additionally loaded into the process control unit 2.

[0049] (5) The process control unit 2 requests the start of each process according to the start procedure in the start table (not shown) held by its process start/control unit 21-1.

[0050] (6) The load module of each process stored on the hard disk 1 is loaded into program memory.

[0051] (7) The dynamic link library reading unit of each process loaded into program memory in step (6) requests loading of the operation management program common to all processes.

[0052] (8) The operation management program common to all processes, stored on the hard disk 1, is additionally loaded into each process in program memory.
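The loading order in the start procedure above can be sketched as a small simulation. This is an illustration only, assuming a dictionary in place of the hard disk and program memory; the names `DISK`, `STARTUP_TABLE`, `load`, and `start_system` are hypothetical, and no real Windows loader call is used.

```python
# Hypothetical stand-in for the contents of hard disk 1.
DISK = {
    "process_control": "process control unit load module",
    "process_A": "process A load module",
    "process_B": "process B load module",
    "dll": "operation management program",
}

# Hypothetical stand-in for the start table held by the process start/control unit.
STARTUP_TABLE = ["process_A", "process_B"]


def load(memory, name, log):
    """Load one module, then additionally load the shared operation management program."""
    memory[name] = {"operation": DISK[name]}
    log.append(f"load {name}")
    # The module's own DLL reading unit requests the common management program.
    memory[name]["management"] = DISK["dll"]
    log.append(f"load dll into {name}")


def start_system():
    memory, log = {}, []
    log.append("os starts process_control")        # user action via the OS
    load(memory, "process_control", log)           # control unit, then its DLL
    for name in STARTUP_TABLE:                     # control unit starts each process in turn
        log.append(f"request start of {name}")
        load(memory, name, log)                    # process, then its DLL
    return memory, log
```

After `start_system()` returns, every entry in `memory` holds its own operation program plus the same operation management program, which matches the statement that the management program is common to all processes.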

[0053] The operation management program is described in detail later.

[0054] The process control unit 2 and each process, loaded into program memory and made operable in this way (these become the respective originals), each generate their own clone with the minimum necessary resources, through the functional programs contained in the operation management program.

[0055] Each original and its clone then communicate periodically, keeping track of each other's state and sharing the operation management data.

[0056] If, during this periodic communication, the clone determines that the original has failed, the clone notifies the process control unit 2. Through an originalization operation by the process control unit 2, the clone transforms into the original, and then, as at startup, the process that is now the original generates a clone.

[0057] If the original detects a failure of the clone during periodic communication, the original forcibly terminates the clone and regenerates a new one.
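The two recovery paths just described (clone promoted when the original fails, clone regenerated when the clone fails) can be sketched as one round of periodic communication. This is a minimal sketch under stated assumptions: the `Process` class, its `alive` flag, and `periodic_exchange` are hypothetical, and the notification to the process control unit is reduced to a comment.

```python
class Process:
    """Hypothetical model of one original or clone."""

    def __init__(self, name, role, data=None):
        self.name = name
        self.role = role                 # "original" or "clone"
        self.data = dict(data or {})     # operation management data
        self.alive = True

    def spawn_clone(self):
        # A new clone starts with a copy of the current operation management data.
        return Process(self.name, "clone", self.data)


def periodic_exchange(original, clone):
    """Run one round and return the (original, clone) pair after any recovery."""
    if original.alive and clone.alive:
        # Normal round: the clone receives the shared operation management data.
        clone.data = dict(original.data)
        return original, clone
    if clone.alive:
        # Original failed: the clone would notify the process control unit,
        # which originalizes it; the new original then generates a clone.
        clone.role = "original"
        return clone, clone.spawn_clone()
    if original.alive:
        # Clone failed: the original discards it and regenerates a clone itself.
        return original, original.spawn_clone()
    # Simultaneous failure of both is not covered by the passage above.
    raise RuntimeError("both original and clone have failed")
```

Because the clone already holds the shared data from the last successful round, the promoted process continues with that data rather than starting from an initial state.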

[0058] Clone startup, the periodic communication between original and clone, the mutual state monitoring it provides, and failure recovery are each described later in turn.

[0059] FIG. 2 shows the internal configuration of the system according to the present invention.

[0060] In FIG. 2, 2 is the process control unit original. It contains an operation program 21, which implements the functions the process control unit original 2 inherently requires, and an operation management program 22, which implements the functions specific to the present invention.

[0061] The operation program 21 includes a process start/control unit 21-1 and a dynamic link library reading unit 21-2 (abbreviated as "DLL reading unit" in FIG. 2).

[0062] The operation management program 22 includes a started-process registration table 22-1, in which started processes are registered, and a periodic communication/failure detection unit 22-2, together with a started-process detection unit, a process type determination unit, and a clone generation unit, which are not shown.

[0063] 2a is the process control unit clone. Although not shown, it is preferably loaded with at least the same programs as the process control unit original 2, and preferably holds the same data as well. The reason is given after the rest of the description is complete.

[0064] 3 is the process A original, which contains an operation program 31 and an operation management program 32.

[0065] The operation program 31 contains the programs that process A needs to implement its inherent functions, including the dynamic link library reading unit shown in FIG. 1.

[0066] The operation management program 32 includes: a started-process detection unit 32-1, which detects whether the started process itself is registered in the started-process registration table 22-1; a process type determination unit 32-2, which uses that result to determine whether the started process is an original; a clone generation unit 32-3, with which a process that is an original generates its clone; and a periodic communication/failure detection unit 32-4, which communicates periodically with the clone to share data and to assess each other's state.

[0067] 3a is the process A clone, which contains only a periodic communication/failure detection unit 32-4a.

[0068] Similarly, 4 is the process B original and 4a is the process B clone. What is loaded into them is exactly the same as for the process A original 3 and the process A clone 3a, respectively, so the description is omitted.

[0069] FIG. 3 shows the communication resources in the system; it is, so to speak, a configuration diagram of the system centered on its communication resources.

[0070] In FIG. 3, 2 is the process control unit, 3 is the process A original, 3a is the process A clone, 4 is the process B original, and 4a is the process B clone; the system configuration is the same as in FIG. 2. Note that FIG. 3 omits the clone for the process control unit 2 only.

[0071] 5 is the inter-process communication resource, provided between the process control unit 2 and the original of each process. It can be provided as a dedicated resource for each process or as a single resource common to all. Dedicated resources need no bus arbitration but require many buses; a common resource requires bus arbitration but has the advantage of needing fewer buses.
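The trade-off between dedicated and common communication resources can be illustrated with in-memory queues. This is only an analogy under stated assumptions: the classes `DedicatedChannels` and `SharedChannel` are hypothetical, queues stand in for buses, and a lock stands in for bus arbitration.

```python
import queue
import threading


class DedicatedChannels:
    """One queue per process: no arbitration, but as many queues as processes."""

    def __init__(self, process_names):
        self.queues = {name: queue.Queue() for name in process_names}

    def send(self, name, message):
        self.queues[name].put(message)

    def channel_count(self):
        return len(self.queues)


class SharedChannel:
    """A single queue for all processes, with a lock standing in for arbitration."""

    def __init__(self):
        self.queue = queue.Queue()
        # queue.Queue is already thread-safe; the explicit lock is kept only
        # to make the arbitration step visible in this sketch.
        self.arbiter = threading.Lock()

    def send(self, name, message):
        with self.arbiter:
            # Messages must carry the sender's name because the channel is shared.
            self.queue.put((name, message))

    def channel_count(self):
        return 1
```

With three processes, the dedicated variant holds three queues and the shared variant holds one, at the price of serializing senders and tagging each message.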

[0072] The original of each process uses the inter-process communication resource 5 to communicate with the process control unit 2 and with the originals of other processes.

[0073] 6 is the common clone communication resource, shared by the clones of all processes. Normally a process's clone does not communicate outside its own process; this resource is used, as described later, when a clone detects a failure of its original and communicates with the process control unit 2. The resource a clone is given for communication outside its process can therefore be shared by all clones.

[0074] 7 is the original-clone communication resource, used for communication between the original and the clone of each process. The original and clone of each process carry out their periodic communication through the original-clone communication resource 7.

[0075] As described above, the process control unit and each process have an original and a clone, but their internal configurations and communication resources are not fully duplicated, so growth in system scale can be avoided.

[0076] The start procedure of the system of FIG. 2, centered on the originals, was described in detail with reference to FIG. 1 and is omitted here.

[0077] The following describes clone startup, periodic communication between original and clone, failure detection, and failure recovery. The present invention applies equally to single-process and multi-process systems, so to keep the drawings simple, the operation of the system at each stage is described below for a single process, process A only. In these descriptions, only the internal components of the process control unit and process A that are most relevant to the operation in question are shown.

【００７８】図４は、起動されたプロセスがオリジナル
であるか否かを判定するプロセス種別の判定を説明する
図である。FIG. 4 is a diagram for explaining the process type determination for determining whether the started process is the original process.

【００７９】図４において、２はプロセス制御部、２１
−１は該プロセス制御部２のプロセス起動／制御部、２
２−１は該プロセス制御部２が備える起動プロセス登録
テーブルである。又、３はプロセスＡ、３２−１は該プ
ロセスＡ３の起動プロセス検出部、３２−２は該プロセ
スＡ３のプロセス種別判定部である。In FIG. 4, reference numeral 2 denotes the process control unit, 21-1 the process start/control unit of the process control unit 2, and 22-1 the activation process registration table provided in the process control unit 2. Reference numeral 3 denotes process A, 32-1 the activation process detection unit of process A3, and 32-2 the process type determination unit of process A3.

【００８０】上記の如く、該プロセス制御部２の該プロ
セス起動／制御部２１−１の指示によって該プロセスＡ
３がプログラム・メモリ上に展開され、初期化されて起
動される。これが、図４中に記載されている起動であ
る。As described above, process A3 is loaded into program memory, initialized, and started on the instruction of the process start/control unit 21-1 of the process control unit 2. This is the "start" shown in FIG. 4.

【００８１】起動された該プロセスＡ３の該起動プロセ
ス検出部３２−１は、図３に示したプロセス間通信リソ
ース４を介して該プロセス制御部２の起動プロセス登録
テーブル２２−１にアクセスして、プロセスＡ３自体の
ＩＤが登録されているか否かを調査する。これが、図４
中に記載されているプロセス種別調査である。The activation process detection unit 32-1 of the started process A3 accesses the activation process registration table 22-1 of the process control unit 2 via the inter-process communication resource 4 shown in FIG. 3 and checks whether the ID of process A3 itself is registered. This is the "process type check" shown in FIG. 4.

【００８２】該起動プロセス登録テーブル２２−１は、
図５の起動プロセス登録テーブルの構成例に示すよう
に、プロセス名と当該プロセスのＩＤを１ブロックとし
て、複数のプロセスに対応して複数のブロックが登録で
きるようになっており、このうちプロセス名は予め登録
されているが、プロセスＩＤは起動されたプロセスから
のアクセスで初めて登録される。As shown in the configuration example in FIG. 5, the activation process registration table 22-1 holds one block per process, each consisting of a process name and that process's ID, so that multiple blocks can be registered for multiple processes. The process names are registered in advance, whereas a process ID is registered only when the started process first accesses the table.

【００８３】従って、図４のプロセスＡ３が初めて起動
されたプロセスであるならば、起動プロセス登録テーブ
ルにプロセスＩＤが未登録なので、プロセス種別調査と
それに続くプロセス種別選定部３２−２におけるプロセ
ス種別判定によってオリジナルであることが判明する。
そして、該起動プロセス登録テーブル２２−１に自身の
プロセスＩＤを登録して、このルーチンを終了する。こ
れが、図４中に記載されたＩＤ登録である。Therefore, if process A3 in FIG. 4 is being started for the first time, no process ID is yet registered in the activation process registration table, so the process type check and the subsequent process type determination by the process type determination unit 32-2 establish that it is the original. It then registers its own process ID in the activation process registration table 22-1 and ends this routine. This is the "ID registration" shown in FIG. 4.

【００８４】そして、プロセス種別調査、プロセス種別
判定及びプロセスＩＤ登録を経て、図４のプロセスＡ３
はプロセスＡのオリジナルであることが確定する。Thus, after the process type check, the process type determination, and the process ID registration, process A3 in FIG. 4 is confirmed to be the original of process A.
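
The check-then-register sequence of FIG. 4 can be illustrated with a minimal sketch. The class and function names, the dictionary layout, and the "clone" result for an already registered name are assumptions made for illustration only; the specification states only that an unregistered ID identifies the original.

```python
class ActivationProcessTable:
    """Hypothetical stand-in for the activation process registration table 22-1."""

    def __init__(self, process_names):
        # Process names are registered in advance; IDs stay empty until first access.
        self._ids = {name: None for name in process_names}

    def lookup(self, name):
        return self._ids[name]

    def register(self, name, pid):
        self._ids[name] = pid


def determine_process_type(table, name, pid):
    """Process type check followed by ID registration.

    Returns "original" when no ID is registered yet for this process name.
    Returning "clone" otherwise is an assumption of this sketch.
    """
    if table.lookup(name) is None:   # process type check
        table.register(name, pid)    # ID registration
        return "original"
    return "clone"
```

In this sketch the first process started under a given name registers its ID and is treated as the original; a later start under the same name leaves the table unchanged.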

【００８５】尚、後で述べるようにオリジナルからクロ
ーンを起動するので、該プロセス起動／制御部２１−１
がプロセスＡ３の起動をかける際にプロセス起動／制御
部２１−１固有のＩＤを該プロセスＡ３に渡すようにす
れば、プロセス種別調査の必要性は低くなる。ただ、該
プロセス制御部２が起動済のプロセスを認識しておく必
要性は高いので、この場合でも起動されたプロセスが自
身のプロセスＩＤを登録することは重要である。Because the clone is started from the original, as described later, the process type check becomes less necessary if the process start/control unit 21-1 passes its own unique ID to process A3 when starting it. Even so, the process control unit 2 still needs to know which processes have been started, so it remains important for a started process to register its own process ID.

【００８６】図６は、クローンの起動を説明する図であ
る。FIG. 6 is a diagram for explaining the activation of the clone.

【００８７】図６において、３はプロセスＡオリジナル
で、プロセス種別判定部３２−２及びクローン生成部３
２−３を備えている。３ａはプロセスＡクローンで、定
期通信／障害検出部３２−４ａを備えている。In FIG. 6, reference numeral 3 denotes the process A original, which includes the process type determination unit 32-2 and the clone generation unit 32-3. Reference numeral 3a denotes the process A clone, which includes the periodic communication/failure detection unit 32-4a.

【００８８】該プロセスＡオリジナル３が起動されて、
該プロセス種別判定部３２−２によって自身がオリジナ
ルであることが判定されると、その判定結果が該クロー
ン生成部３２−３に渡される。これを契機に該クローン
生成部３２−３は該プロセスＡオリジナル３の定期通信
／障害検出部を同一プログラム・メモリ上に展開してプ
ロセスＡクローンとして起動する。When the process A original 3 is started and the process type determination unit 32-2 determines that it is the original, the result is passed to the clone generation unit 32-3. Triggered by this, the clone generation unit 32-3 loads the periodic communication/failure detection unit of the process A original 3 into the same program memory and starts it as the process A clone.

【００８９】即ち、該プロセスＡオリジナル３はプロセ
ス制御部を介さず、自律的に自身のクローンを起動す
る。That is, the process A original 3 autonomously starts its own clone without going through the process control unit.

【００９０】そして、該プロセスＡオリジナル３と該プ
ロセスＡクローン３ａは双方の定期通信／障害検出部を
介して定期通信を行なうことによって、互いの状態の把
握を行なうと共に、プログラム実行上のシーケンス番号
などの運用管理データを共有する。以降、これらについ
て説明する。The process A original 3 and the process A clone 3a perform periodic communication through their respective periodic communication/failure detection units, thereby keeping track of each other's state and sharing operation management data such as the sequence number in program execution. These are described below.

【００９１】尚、定期通信のモードは、クローンからリ
クエストを送信するというクローン主導型と、オリジナ
ルが通信の主導権を握るオリジナル主導型とのいずれで
も可能であるが、本明細書では前者のモードで定期通信
するものとして説明する。Periodic communication can be either clone-initiated, in which the clone sends the request, or original-initiated, in which the original takes the lead; this specification assumes the former mode.

【００９２】図７は、オリジナル・クローン間のデータ
の共有を説明する図である。FIG. 7 is a diagram for explaining data sharing between the original and the clone.

【００９３】図７において、２はプロセス制御部、３は
プロセスＡオリジナル、３ａはプロセスＡクローンであ
る。尚、図７においては該プロセス制御部２の内部構成
は図示せず、該プロセスＡオリジナル３については定期
通信／障害検出部３２−４とオリジナル・データ３２−
５のみを図示し、該プロセスＡクローン３ａについては
定期通信／障害検出部３２−４ａとクローン・データ３
２−５ａのみを図示している。In FIG. 7, 2 is the process control unit, 3 is the process A original, and 3a is the process A clone. The internal configuration of the process control unit 2 is not shown in FIG. 7; only the periodic communication/failure detection unit 32-4 and the original data 32-5 are shown for the process A original 3, and only the periodic communication/failure detection unit 32-4a and the clone data 32-5a are shown for the process A clone 3a.

【００９４】該プロセスＡオリジナル３は該定期通信／
障害検出部３２−４を使って、該プロセスＡクローン３
ａは該定期通信／障害検出部３２−４ａを使って互いに
定期通信をしており、定期通信の際に該プロセスＡオリ
ジナル３からオリジナル・データ３２−５が該プロセＡ
クローン３ａに送信され、該クローン・データ３２−５
ａとなる。尚、プロセスＡオリジナル３は処理シーケン
ス番号などの運用管理データと処理データとを持ってお
り、全てのデータをプロセスＡクローン３ａに渡すこと
は可能であるが、運用管理データのみを渡すだけでよ
い。The process A original 3, using the periodic communication/failure detection unit 32-4, and the process A clone 3a, using the periodic communication/failure detection unit 32-4a, communicate with each other periodically. During periodic communication, the original data 32-5 is transmitted from the process A original 3 to the process A clone 3a, where it becomes the clone data 32-5a. The process A original 3 holds both operation management data, such as the processing sequence number, and processing data; although all of this data could be passed to the process A clone 3a, passing only the operation management data is sufficient.

【００９５】図８は、定期通信／障害検出の基本動作を
説明する図（その１）で、オリジナルとクローンが共に
正常で定期通信を通じてデータを共有しているケースの
動作を説明するものである。以降、図８に記載した符号
に沿って説明する。FIG. 8 is a diagram (part 1) for explaining the basic operation of periodic communication / failure detection, and explains the operation in the case where the original and the clone are both normal and share data through the periodic communication. . Hereinafter, description will be given along the reference numerals shown in FIG.

【００９６】Ｓ４１．定期通信開始時に、クローンはタ
イムアウト・タイマ（図８では字数の節約のために“タ
イムアウトタイマ”というように“・”を省略して記載
しているが、全く同じものと理解されたい。又、他のテ
クニカル・タームでも同様な記載方法をとることがあ
る。）をクリアする。S41. At the start of periodic communication, the clone clears the timeout timer. (In FIG. 8 the "・" is omitted to save space, as in "タイムアウトタイマ", but exactly the same thing is meant; other technical terms may be written the same way.)

【００９７】Ｓ４２．オリジナルに対して定期通信のリ
クエスト（図８では字数節約のために“ＲＥＱ”と標記
している。同様な標記は他でも用いる。）を送信する。S42. A request for regular communication is transmitted to the original ("REQ" is written in FIG. 8 to save the number of characters. The same notation is used in other cases).

【００９８】定期通信リクエストのフォーマット例は、
図９の定期通信データの構成例の（イ）に示されている
が、例えば、最初の３バイトがリクエストであることを
示す識別子になっており、１バイトの予備バイトが付加
されている。An example of the periodic communication request format is shown in (a) of the periodic communication data configuration example in FIG. 9: for example, the first three bytes are an identifier indicating a request, followed by one spare byte.

【００９９】Ｓ４３．クローンからのリクエストを受け
たオリジナルは、必要なデータを編集して定期通信アン
サーを形成してクローンに向けて送信する。S43. The original which received the request from the clone edits necessary data, forms a periodic communication answer, and transmits it to the clone.

【０１００】定期通信アンサーのフォーマット例は、図
９の定期通信データの構成例の（ロ）に示されている
が、例えば最初の３バイトがアンサー（図９ではＡＮＳ
と標記している。同様な標記法は他でも用いる。）であ
ることを示す識別子になっており、次いでデータの展開
が必要か否かを示すデータ展開要求が搭載される。その
後に、送信するデータの種別（例えば、起動プロセス登
録テーブルのプロセスＩＤや処理シーケンスの番号）、
送信するデータの総サイズを示すデータ・サイズが搭載
されており、最後に送信するデータそのものが搭載さ
れ、データ種別から送信データまでで１ブロックが構成
される。そして、一般的には、定期通信アンサーの中に
複数のブロックが搭載されて送信される。An example of the periodic communication answer format is shown in (b) of the periodic communication data configuration example in FIG. 9. For example, the first three bytes are an identifier indicating an answer (written "ANS" in FIG. 9; similar notation is used elsewhere), followed by a data expansion request indicating whether the data needs to be expanded. Next come the type of data being sent (for example, a process ID from the activation process registration table or a processing sequence number) and a data size giving the total size of the data being sent, and finally the data itself; the data type through the data form one block. In general, a periodic communication answer carries multiple blocks.
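
The request and answer layouts described for FIG. 9 can be sketched as byte strings. Only the three-byte identifier and the single spare byte of the request come from the text; the widths chosen here for the data expansion request (1 byte), the data type (1 byte), and the data size (2 bytes, big-endian), as well as the function names, are assumptions made for illustration.

```python
import struct

# Request: 3-byte identifier plus 1 spare byte, as in FIG. 9 (a).
REQUEST = b"REQ" + b"\x00"


def build_answer(expand, blocks):
    """Build an answer: "ANS", expansion flag, then (type, size, data) blocks.

    Field widths are assumptions of this sketch, not taken from the patent.
    """
    out = b"ANS" + struct.pack("B", 1 if expand else 0)
    for data_type, payload in blocks:
        out += struct.pack(">BH", data_type, len(payload)) + payload
    return out


def parse_answer(msg):
    """Inverse of build_answer; returns (expand, [(data_type, payload), ...])."""
    if msg[:3] != b"ANS":
        raise ValueError("not a periodic communication answer")
    expand = bool(msg[3])
    blocks, pos = [], 4
    while pos < len(msg):
        data_type, size = struct.unpack_from(">BH", msg, pos)
        pos += 3                                   # skip type and size fields
        blocks.append((data_type, msg[pos:pos + size]))
        pos += size
    return expand, blocks
```

Carrying the size in each block lets the receiver step through several blocks in one answer without a separate block count, which matches the statement that an answer generally carries multiple blocks.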

【０１０１】Ｓ４４．クローンは、オリジナルからのデ
ータを受信してメモリ上に展開する。S44. The clone receives data from the original and expands it on memory.

【０１０２】そして、図示を省略しているが、次の定期
通信の時刻まで所定時間待機し、定期通信の時刻になっ
たら再び上記ステップと同じステップ、即ち、Ｓ４５．定期通信開始時に、クローンはタイムアウト・
タイマをクリアする。Then, although not shown in the drawing, the clone waits a predetermined time until the next periodic communication time, and when that time arrives it repeats the same steps as above, namely: S45. At the start of periodic communication, the clone clears the timeout timer.

【０１０３】Ｓ４６．オリジナルに対して定期通信のリ
クエストを送信する。S46. Send a request for regular communication to the original.

【０１０４】Ｓ４７．クローンからのリクエストを受け
たオリジナルは、必要なデータを編集してクローンに向
けて送信する。S47. The original that received the request from the clone edits necessary data and sends it to the clone.

【０１０５】Ｓ４８．クローンは、オリジナルからのデ
ータを受信してメモリ上に展開する。を繰り返す。S48. The clone receives the data from the original and expands it in memory. These steps are then repeated.

【０１０６】このようにして、オリジナルとクローンは
同一データを共有することができる。Thus, the original and the clone can share the same data.

【０１０７】図１０は、オリジナルの障害検出とプロセ
ス強制終了を説明する図である。FIG. 10 is a diagram for explaining original failure detection and process forced termination.

【０１０８】図１０において、２はプロセス制御部、３
はプロセスＡオリジナル、３ａはプロセスＡクローンで
ある。In FIG. 10, reference numeral 2 denotes the process control unit, 3 the process A original, and 3a the process A clone.

【０１０９】図１０においては、該プロセス制御部２に
ついてはプロセス起動／制御部２１−１と起動プロセス
登録テーブル２２−１のみが記載されており、プロセス
Ａオリジナル３については定期通信／障害検出部３２−
４のみが記載されており、プロセスＡクローン３ａにつ
いても定期通信／障害検出部３２−４ａのみが記載され
ている。In FIG. 10, only the process start/control unit 21-1 and the activation process registration table 22-1 are shown for the process control unit 2, only the periodic communication/failure detection unit 32-4 is shown for the process A original 3, and only the periodic communication/failure detection unit 32-4a is shown for the process A clone 3a.

【０１１０】そして、図１０は該プロセスＡオリジナル
３と該プロセスＡクローン３ａは定期通信をしている
が、該プロセスＡクローン３ａが定期通信リクエストを
出しているにもかかわらず該プロセスＡオリジナル３か
らアンサーが帰ってこない場合を想定して図示してい
る。FIG. 10 illustrates the case in which the process A original 3 and the process A clone 3a are in periodic communication, but no answer comes back from the process A original 3 even though the process A clone 3a has issued a periodic communication request.

【０１１１】該プロセスＡクローン３ａが所定回数連続
して該プロセスＡオリジナル３からアンサーが帰ってこ
ないことを検出する（これが、図１０中に記載されてい
る障害検出である。）と、該定期通信／障害検出部３２
−４ａは該プロセスＡオリジナル３が障害であると判定
し、図３に示した共通クローン通信リソースを介して該
プロセス制御部２にその旨通知する（これが、図１０中
に記載した障害通知である。）。通知を受けた該プロセ
ス制御部２は、最終的に該プロセスＡオリジナル３を強
制終了させる（これが、図１０中に記載した強制終了で
ある。）。When the process A clone 3a detects that no answer has come back from the process A original 3 a predetermined number of times in succession (this is the "failure detection" shown in FIG. 10), the periodic communication/failure detection unit 32-4a determines that the process A original 3 has failed and notifies the process control unit 2 via the common clone communication resource shown in FIG. 3 (this is the "failure notification" shown in FIG. 10). On receiving the notification, the process control unit 2 ultimately forces the process A original 3 to terminate (this is the "forced termination" shown in FIG. 10).

【０１１２】尚、強制終了させる時には、運用プログラ
ムと運用管理プログラムについてのみ終了させ、該プロ
セスＡオリジナル３が保有していたデータは消去しな
い。When the forced termination is performed, only the operation program and the operation management program are terminated, and the data held by the process A original 3 is not deleted.

【０１１３】図１１は、定期通信／障害検出の基本動作
を説明する図（その２）で、クローンがオリジナルの障
害を検出するケースの動作を説明するものである。以
降、図１１の符号に沿って上記動作を説明する。FIG. 11 is a diagram (part 2) for explaining the basic operation of the periodic communication / failure detection, and explains the operation in the case where the clone detects the original failure. Hereinafter, the above operation will be described along the reference numerals in FIG.

【０１１４】Ｓ５１．クローンは定期通信に先立ってタ
イムアウト・タイマをクリアする。S51. The clone clears the timeout timer prior to regular communication.

【０１１５】Ｓ５２．オリジナルに対して定期通信リク
エストを送信する。S52. Send a periodic communication request to the original.

【０１１６】この場合、オリジナルが障害であることを
想定しているので、オリジナルからは上記定期通信リク
エストに対するアンサーが帰ってこない。この間、クロ
ーンはタイムアウト・タイマを作動させている。In this case, since it is assumed that the original is a failure, the answer to the periodic communication request does not return from the original. During this time, the clone runs a timeout timer.

【０１１７】Ｓ５３．クローンはタイムアウト・タイマ
が所定時間の経過を検出したのを受けてリトライ・カウ
ンタを歩進させる。該リトライ・カウンタは、定期通信
リクエストに対してオリジナルからアンサーを帰ってこ
なかった回数をカウントするカウンタで、所定回数に達
することによってオリジナルが障害であることを判定す
るためのものである。S53. The clone increments the retry counter in response to the detection of the elapse of the predetermined time by the timeout timer. The retry counter is a counter that counts the number of times that the answer has not returned from the original in response to the periodic communication request, and determines that the original has a failure by reaching a predetermined number.

【０１１８】そして、図示はしていないが、次の定期通
信の時刻まで所定時間待機する。Then, although not shown, it waits for a predetermined time until the time of the next periodic communication.

【０１１９】Ｓ５４．クローンは、再びタイムアウト・
タイマをクリアして、Ｓ５５．オリジナルに対して定期通信リクエストを送信
する。S54. The clone again clears the timeout timer, and S55. sends a periodic communication request to the original.

【０１２０】この時にもオリジナルからはアンサーが帰
ってこない。At this time, the answer does not return from the original.

【０１２１】Ｓ５６．従って、クローンは再びタイムア
ウトを検出し、リトライ・カウンタを歩進させる。S56. Therefore, the clone detects the timeout again and increments the retry counter.

【０１２２】Ｓ５７．このような動作を繰り返した結
果，クローンはリトライ・カウンタが所定回数に達した
のを検出してオリジナルが障害であることを検出する。S57. As a result of repeating such operations, the clone detects that the retry counter has reached a predetermined number of times, and detects that the original is a failure.

【０１２３】Ｓ５８．そして障害処理のルーチンに入
る。S58. Then, the process enters a failure processing routine.

【０１２４】このルーチンで、まず、プロセスＡクロー
ンがプロセス制御部に対してプロセスＡオリジナルのプ
ロセスＩＤを通知し、最終的にプロセス制御部は起動プ
ロセス登録テーブルから障害となったプロセスＡオリジ
ナルのプロセスＩＤを消去し、該プロセスＡオリジナル
を強制的に終了させる。In this routine, first, the process A clone notifies the process control unit of the process A original process ID, and the process control unit finally finds the failed process A original process ID from the startup process registration table. The ID is erased, and the process A original is forcibly terminated.

【０１２５】図１２は、クローンのオリジナル化と新ク
ローンの生成を説明する図で、上記ステップＳ５８に対
応するものである。FIG. 12 is a diagram for explaining promotion of the clone to original and generation of a new clone, and corresponds to step S58 above.

【０１２６】図１２において、２はプロセス制御部、３
ｂはプロセスＡクローンがオリジナル化されたプロセス
Ａ新オリジナル、３ｃは該プロセスＡ新オリジナル３ｂ
によって再生されたプロセスＡ再生クローンである。In FIG. 12, reference numeral 2 denotes the process control unit, 3b the process A new original obtained by promoting the process A clone to original, and 3c the process A reproduction clone regenerated by the process A new original 3b.

【０１２７】尚、図１２においては、該プロセス制御部
２についてはプロセス起動／制御部２１−１、起動プロ
セス登録テーブル２２−１のみを記載し、該プロセスＡ
新オリジナル３ｂについては起動プロセス検出部３２−
１、プロセス種別判定部３２−２、クローン生成部３２
−３のみを記載し、プロセスＡ再生クローン３ｃについ
ては定期通信／障害検出部３２−４ａのみを記載してい
る。In FIG. 12, only the process start/control unit 21-1 and the activation process registration table 22-1 are shown for the process control unit 2; only the activation process detection unit 32-1, the process type determination unit 32-2, and the clone generation unit 32-3 are shown for the process A new original 3b; and only the periodic communication/failure detection unit 32-4a is shown for the process A reproduction clone 3c.

【０１２８】図１０に示したようにプロセスＡクローン
３ａからプロセスＡオリジナル３が障害であることの通
知を受けたプロセス制御部２は、図４と図６において説
明したプロセスの起動と同様な手順でプロセスＡクロー
ンをオリジナル化してプロセスＡ新オリジナル３ｂを起
動する。従って、プロセスＡ新オリジナル３ｂには運用
プログラムと全ての機能を含む運用管理プログラムがロ
ード、展開される。As shown in FIG. 10, the process control unit 2, on being notified by the process A clone 3a that the process A original 3 has failed, promotes the process A clone to original and starts the process A new original 3b, using the same procedure as the process startup described with reference to FIGS. 4 and 6. Accordingly, the operation program and the operation management program with all of its functions are loaded and expanded into the process A new original 3b.

【０１２９】そして、図１０の説明で記載したように、
障害となったプロセスＡオリジナルが保有していたデー
タはメモリ領域に保存されているので、このデータを格
納しているデータ領域をプロセスＡ新オリジナル３ｂの
運用プログラムと運用管理プログラムと接続すれば、障
害になったプロセスＡオリジナル３から新たに起動され
たプロセスＡ新オリジナル３ｂにデータを引き継ぐこと
ができる。As noted in the description of FIG. 10, the data held by the failed process A original is preserved in its memory area, so connecting the data area that stores this data to the operation program and operation management program of the process A new original 3b allows the data to be carried over from the failed process A original 3 to the newly started process A new original 3b.

【０１３０】次いで、図４に示したのと同様に、該プロ
セスＡ新オリジナル３ｂは、起動プロセス検出部３２−
１によって該起動プロセス登録テーブル２２−１にアク
セスして自身のプロセスＩＤが該起動プロセス登録テー
ブル２２−１に登録されているか否かの調査を行なう。Next, as in FIG. 4, the process A new original 3b uses the activation process detection unit 32-1 to access the activation process registration table 22-1 and check whether its own process ID is registered there.

【０１３１】この場合、該プロセスＡ新オリジナル３ｂ
は起動されたばかりであるので、自身のプロセスＩＤは
未登録である。従って、プロセス種別判定部３２−２に
よって自身がプロセスＡのオリジナルであると判定し、
該起動プロセス登録テーブル２２−１に自身のプロセス
ＩＤを登録する。In this case, the process A new original 3b has only just been started, so its own process ID is not yet registered. The process type determination unit 32-2 therefore determines that it is the original of process A, and it registers its own process ID in the activation process registration table 22-1.

【０１３２】次いで、該プロセス種別判定部３２−２の
判定結果に従って、該クローン生成部３２−３が新たな
クローンを起動して該プロセスＡ再生クローン３ｃとす
る。Next, according to the judgment result of the process type judgment unit 32-2, the clone generation unit 32-3 starts a new clone and sets it as the process A reproduction clone 3c.
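
The recovery sequence of FIGS. 10 and 12 (erase the failed ID, terminate the programs but keep the data, start a new original that registers itself and regenerates a clone) can be modeled with a small toy. Everything here, including the class name, the integer process IDs, and folding the original's self-registration and clone generation into the control object, is an assumption made to keep the sketch short.

```python
class ProcessControlModel:
    """Toy model of the recovery flow; not the patent's actual structure."""

    def __init__(self, name):
        self.name = name
        self.table = {name: None}   # activation process registration table
        self.data_area = {}         # data kept across forced termination
        self.clone_pid = None
        self._next_pid = 1

    def _new_pid(self):
        pid = self._next_pid
        self._next_pid += 1
        return pid

    def start_original(self):
        pid = self._new_pid()
        if self.table[self.name] is None:      # process type check
            self.table[self.name] = pid        # ID registration
            self.clone_pid = self._new_pid()   # original starts its own clone
        return pid

    def on_failure_notice(self, failed_pid):
        """Called when the clone reports that the original has failed."""
        if self.table[self.name] == failed_pid:
            self.table[self.name] = None       # erase the failed original's ID
        # Programs are terminated here, but data_area is deliberately kept.
        return self.start_original()           # new original plus new clone
```

After a failure notice, the table holds the new original's ID, a fresh clone exists, and the contents of `data_area` are still available to the new original.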

【０１３３】図１３は、クローン暴走時の障害検出とク
ローンの再生を説明する図である。FIG. 13 is a diagram for explaining fault detection and clone reproduction during runaway of a clone.

【０１３４】図１３において、２はプロセス制御部、３
はプロセスＡオリジナル、３ａはプロセスＡクローン、
３ｄはプロセスＡ再生クローンである。In FIG. 13, reference numeral 2 denotes the process control unit, 3 the process A original, 3a the process A clone, and 3d the process A reproduction clone.

【０１３５】尚、該プロセス制御部２の内部構成は図示
を省略し、該プロセスＡオリジナル３については定期通
信／障害検出部３２−４とクローン生成部３２−３のみ
を記載し、プロセスＡクローン３ａとプロセスＡ再生ク
ローン３ｄについては定期通信／障害検出部３２−４ａ
のみを記載している。The internal configuration of the process control unit 2 is not shown; only the periodic communication/failure detection unit 32-4 and the clone generation unit 32-3 are shown for the process A original 3, and only the periodic communication/failure detection unit 32-4a is shown for the process A clone 3a and the process A reproduction clone 3d.

【０１３６】該プロセスＡオリジナル３と該プロセスＡ
クローン３ａは互いの定期通信／障害検出部を介して定
期通信を行なっているが、プロセスＡクローン３ａから
の定期通信リクエストを連続して受信できなかった該プ
ロセスＡオリジナル３は該プロセスＡクローン３ａが障
害であると判定する（これが、図１３中に記載した障害
検出である。）。The process A original 3 and the process A clone 3a perform periodic communication through their respective periodic communication/failure detection units. When the process A original 3 fails to receive periodic communication requests from the process A clone 3a several times in succession, it determines that the process A clone 3a has failed (this is the "failure detection" shown in FIG. 13).

【０１３７】この場合には、該プロセスＡオリジナル３
は該プロセスＡクローン３ａを強制的に終了させ（これ
が、図１３中に記載した強制終了である。）、該クロー
ン生成部３２−３によって定期通信／障害検出部３２−
４ａをロード、展開して再度プロセスＡのクローンを起
動し、これをプロセスＡ再生クローン３ｄとする（これ
が、図１３中に記載した再生である。）。In this case, the process A original 3 forces the process A clone 3a to terminate (this is the "forced termination" shown in FIG. 13), then has the clone generation unit 32-3 load and expand the periodic communication/failure detection unit 32-4a to start a clone of process A again; this becomes the process A reproduction clone 3d (this is the "reproduction" shown in FIG. 13).

【０１３８】このように、オリジナルがクローンの障害
を検出した時には、プロセス制御部２を介することな
く、自律的にオリジナルがクローンを再生、起動する。As described above, when the original detects a failure of the clone, the original autonomously reproduces and starts the clone without passing through the process control unit 2.

【０１３９】図１４は、定期通信／障害検出の基本動作
を説明する図（その３）で、オリジナルがクローンの障
害を検出するケースの動作を示すものである。以降、図
１４の符号に沿って上記動作を説明する。FIG. 14 is a diagram (part 3) for explaining the basic operation of periodic communication / failure detection, and shows the operation in the case where the original detects a clone failure. Hereinafter, the above operation will be described along the reference numerals in FIG.

【０１４０】Ｓ６１．オリジナルは前回の定期通信が終
了した後、タイムアウト・タイマをクリアして、Ｓ６２．クローンが定期通信のリクエストをしてくるの
を待機している。S61. After the previous periodic communication ends, the original clears the timeout timer, and S62. waits for the clone to send a periodic communication request.

【０１４１】今のケースでは、クローンが障害になって
いることを想定しているので、クローンは定期通信リク
エストを送信してこない。In the present case, since it is assumed that the clone has failed, the clone does not send a periodic communication request.

【０１４２】Ｓ６３．従って、オリジナルのタイムアウ
ト・タイマが所定時間の経過を検出するので、オリジナ
ルはリトライ・カウンタを歩進させる。S63. Accordingly, the original increments the retry counter because the original timeout timer detects the passage of a predetermined time.

【０１４３】この後、オリジナルがクローンに対して再
送要求を出す方式と、再送要求せずに定期通信リクエス
トを待つ方式とがあるが、いずれにしても、上記ステッ
プと同じステップ、即ち、Ｓ６４．タイムアウト・タイマをクリアして、Ｓ６５．クローンが定期通信のリクエストをしてくるの
を待機している。After this, the original may either send a retransmission request to the clone or simply wait for the next periodic communication request without requesting retransmission. In either case, the same steps as above are repeated, namely: S64. the original clears the timeout timer, and S65. waits for the clone to send a periodic communication request.

【０１４４】Ｓ６６．そして、オリジナルのタイムアウ
ト・タイマが所定時間の経過を検出するので、オリジナ
ルはリトライ・カウンタを歩進させる。を繰り返す。S66. The original's timeout timer then detects that the predetermined time has elapsed, so the original increments the retry counter. These steps are repeated.

【０１４５】Ｓ６７．このようにしてリトライ・カウン
タが所定回数に達したことを検出すると、オリジナルは
クローンが障害であると判定して、Ｓ６８．障害処理のルーチンに入る。S67. On detecting in this way that the retry counter has reached the predetermined count, the original determines that the clone has failed, and S68. enters the failure handling routine.

【０１４６】図１５は、定期通信／障害検出のフローチ
ャートで、上記全ての動作を統合して図示したものであ
る。殆どの内容が既に説明されたものではあるが、全て
を統合した動作の説明は重要であるから、重複を顧みず
敢えて説明をする。FIG. 15 is a flowchart of periodic communication/failure detection that integrates all of the operations above. Most of its content has already been described, but because a description of the integrated operation is important, it is given here despite the repetition.

【０１４７】尚、図１５はクローン主導型を想定し、
又、クローンからの定期通信リクエストが所定時間こな
かった場合にオリジナルは再送要求せずに次の所定時間
を待つという方式を想定して図示している。FIG. 15 assumes the clone-initiated mode, and also assumes the scheme in which, when no periodic communication request arrives from the clone within the predetermined time, the original waits for the next predetermined time without requesting retransmission.

【０１４８】Ｓ１．クローンは前回の定期通信の後、タ
イムアウト・タイマをクリアし、Ｓ２．オリジナルに対して定期通信リクエストを送信し
て、Ｓ３．オリジナルからのアンサーを待機している。S1. After the previous periodic communication, the clone clears the timeout timer, S2. sends a periodic communication request to the original, and S3. waits for an answer from the original.

【０１４９】Ｓ４．オリジナルからのアンサーが受信さ
れたか否かを判定する。S4. Determine whether an answer from the original has been received.

【０１５０】Ｓ５．ステップＳ４でオリジナルからのア
ンサーが受信されないと判定された場合（Ｎｏ）には、
タイムアウト・タイマが所定の時間τに達しているか否
かを判定する。S5. If it is determined in step S4 that no answer from the original has been received (No), it is determined whether the timeout timer has reached the predetermined time τ.

【０１５１】所定の時間τに達していないと判定された
場合（Ｎｏ）には、ステップＳ３に戻って待機を続け
る。If it is determined that the predetermined time τ has not been reached (No), the process returns to step S3 to continue waiting.

【０１５２】Ｓ６．ステップＳ５においてタイムアウト
・タイマが所定の時間τに達していると判定された場合
（Ｙｅｓ）には、リトライ・カウンタを歩進する。S6. If it is determined in step S5 that the timeout timer has reached the predetermined time τ (Yes), the retry counter is incremented.

【０１５３】Ｓ７．リトライ・カウンタのカウント値が
所定回数に達しているか否かを判定する。所定回数に達
していないと判定された場合（Ｎｏ）には、ステップＳ
３に戻る。S7. It is determined whether the count of the retry counter has reached the predetermined number. If it has not (No), the process returns to step S3.

【０１５４】Ｓ８．ステップＳ７でリトライ・カウンタ
のカウント値が所定回数に達したと判定された場合（Ｙ
ｅｓ）には、障害処理のルーチンに入る。S8. If it is determined in step S7 that the count of the retry counter has reached the predetermined number (Yes), the failure handling routine is entered.

【０１５５】即ち、オリジナルとクローンの定期通信を
することによって、クローンがオリジナルの障害を発見
することができる。That is, by performing regular communication between the original and the clone, the clone can find the original failure.

【０１５６】一方、オリジナルからアンサーが帰ってき
た場合には、ステップＳ４でアンサーの受信があった
（Ｙｅｓ）ことを検出できるので、ステップＳ９に移行
する。即ち、Ｓ９．データの展開要求があるか否かを判定し、Ｓ１０．ステップＳ９でデータの展開要求があると判定
された場合（Ｙｅｓ）には、データをメモリに展開す
る。On the other hand, if the answer has returned from the original, it can be detected in step S4 that the answer has been received (Yes), so the flow proceeds to step S9. That is, S9. It is determined whether there is a data expansion request, and S10. If it is determined in step S9 that there is a data development request (Yes), the data is developed in the memory.

【０１５７】これによって、オリジナルのデータをクロ
ーンが共有することができる。As a result, the original data can be shared by the clones.

【０１５８】Ｓ１１．ステップＳ９でデータの展開要求
がないと判定された場合（Ｎｏ）と、ステップＳ１０の
処理を終了した場合には、定期通信間隔Ｔだけ待機す
る。S11. If it is determined in step S9 that there is no data expansion request (No), and if the processing in step S10 has been completed, the process waits for the regular communication interval T.

【０１５９】Ｓ１２．そして、リトライ・カウンタをク
リアしてステップＳ１に戻る。S12. Then, the retry counter is cleared and the process returns to step S1.
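
The clone-side steps S1 to S10 can be sketched as a single function. The callback names, the injected clock `now`, and the dictionary used as the clone's data store are all hypothetical. One deliberate difference is labeled in the code: on each timeout the sketch restarts the timer and resends the request, following the sequence of FIG. 11, whereas the flowchart text returns to step S3.

```python
def clone_cycle(send_request, poll_answer, now, tau, max_retries, store):
    """Run one periodic-communication cycle on the clone side.

    Returns "ok" when an answer arrives, or "original_failed" when the
    retry counter reaches max_retries.
    """
    retries = 0
    started = now()                    # S1: clear the timeout timer
    send_request()                     # S2: send the request to the original
    while True:
        answer = poll_answer()         # S3/S4: wait, then check for an answer
        if answer is not None:
            expand, data = answer
            if expand:                 # S9: is data expansion requested?
                store.update(data)     # S10: expand the data into memory
            return "ok"                # caller waits T (S11) and repeats
        if now() - started < tau:      # S5: timeout not yet reached
            continue
        retries += 1                   # S6: increment the retry counter
        if retries >= max_retries:     # S7: predetermined count reached?
            return "original_failed"   # S8: enter failure handling
        started = now()                # assumption: restart the timer and
        send_request()                 # resend, as in the FIG. 11 sequence
```

Because the retry counter is local to one call, it starts from zero on every cycle, which corresponds to clearing it in step S12 before returning to step S1.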

【０１６０】一方、オリジナルは次のように動作する。On the other hand, the original operates as follows.

【０１６１】Ｓ１５．前回の定期通信の後、タイムアウ
ト・タイマをクリアして、Ｓ１６．クローンからの定期通信リクエストを待機して
いる。S15. After the previous regular communication, the timeout timer is cleared, and S16. Waiting for regular communication request from clone.

【０１６２】Ｓ１７．クローンからの定期通信リクエス
トを受信した否か判定する。S17. It is determined whether a regular communication request from the clone has been received.

【０１６３】Ｓ１８．ステップＳ１７において、クロー
ンからの定期通信リクエストを受信していないと判定さ
れた場合（Ｎｏ）には、タイムアウト・タイマが所定の
時間τの経過を検出したか否かを判定する。所定時間τ
の経過を検出していない場合（Ｎｏ）には、ステップＳ
１６に戻る。S18. If it is determined in step S17 that no periodic communication request has been received from the clone (No), it is determined whether the timeout timer has detected that the predetermined time τ has elapsed. If it has not (No), the process returns to step S16.

【０１６４】Ｓ１９．ステップＳ１８において、所定の
時間τが経過したと判定された場合（Ｙｅｓ）には、リ
トライ・カウンタを歩進する。S19. If it is determined in step S18 that the predetermined time τ has elapsed (Yes), the retry counter is incremented.

【０１６５】Ｓ２０．該リトライ・カウンタのカウント
値が所定回数に達したか否かを判定し、所定回数に達し
ていないと判定された場合（Ｎｏ）にはステップＳ１６
に戻る。S20. It is determined whether the count of the retry counter has reached the predetermined number; if it has not (No), the process returns to step S16.

【０１６６】Ｓ２１．一方、ステップＳ２０で所定回数
に達したと判定された場合（Ｙｅｓ）には、障害処理の
ルーチンに入る。S21. On the other hand, if it is determined in step S20 that the number of times has reached the predetermined number (Yes), a failure processing routine is entered.

【０１６７】即ち、オリジナルとクローンが定期通信を
することによってオリジナルがクローンの障害を発見す
ることができる。That is, the original can detect a failure of the clone by performing regular communication between the original and the clone.

【０１６８】さて、クローンから定期通信リクエストが
受信されると、ステップＳ１７では定期通信リクエスト
の受信ありにＹｅｓ）と判定されるので、ステップＳ２
２に移行する。When a periodic communication request is received from the clone, step S17 determines that a request has been received (Yes), so the process moves to step S22.

【０１６９】Ｓ２２．定期通信リクエストに対してアン
サーする必要性があるか否か判定する。S22. It is determined whether it is necessary to answer the periodic communication request.

【０１７０】Ｓ２３．ステップＳ２２においてアンサー
する必要性があると判定された場合（Ｙｅｓ）には、ア
ンサーを編集してクローンに対して送信する。S23. If it is determined in step S22 that the answer is necessary (Yes), the answer is edited and transmitted to the clone.

【０１７１】Ｓ２４．ステップＳ２２でアンサーの必要
性がないと判定された場合（Ｎｏ）と、ステップＳ２３
の処理が終了した場合には定期通信間隔Ｔだけ待機す
る。S24. If it is determined in step S22 that no answer is needed (No), or once the processing of step S23 has finished, the original waits for the periodic communication interval T.

【０１７２】Ｓ２５．そして、リトライ・カウンタをク
リアして、ステップＳ１５に戻る。S25. Then, the retry counter is cleared, and the process returns to step S15.

【０１７３】ここでは、クローン主導で定期通信場合を
説明したが、オリジナル主導で定期通信することが可能
であることは容易に想到しうることである。Here, the case of the regular communication led by the clone has been described, but it is easily conceivable that the regular communication can be led by the original.

【０１７４】又、オリジナルがクローンからの定期通信
リクエストがこないと判定した後、再送要求をする方式
も上記の方式を若干変更して実現できることも容易に理
解できる。Further, it can be easily understood that the method of making a retransmission request after determining that the original does not receive a regular communication request from the clone can be realized by slightly changing the above method.

【０１７５】さて、図２の説明において、プロセス制御
部のクローンにはオリジナルと同じプログラムをロード
するのが好ましいと記載し、一方、各プロセスのクロー
ンには定期通信／障害検出部をロードすればよいと記載
した。そして、各プロセスのクローンには定期通信／障
害検出部をロードすればよいことはその後の説明で明ら
かになっている。In the description of FIG. 2, it was stated that the clone of the process control unit should preferably be loaded with the same program as the original, whereas the clone of each process need only be loaded with the periodic communication/failure detection unit. The subsequent description has made clear that loading the periodic communication/failure detection unit suffices for the clone of each process.

【０１７６】そこで、上記の理由を説明する。Thus, the above-mentioned reason will be described.

[0177] When a clone detects that the original of an ordinary process has failed, the clone that detected the failure notifies the process control unit, as already described, and the process control unit can then promote the clone to original and forcibly terminate the failed original.
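The recovery path for an ordinary process can be sketched as below. This is an illustration under stated assumptions, not the implementation of the specification: the `ProcessControlUnit` class, its `roles` table, and the string process identifiers are hypothetical names introduced for the example.

```python
class ProcessControlUnit:
    """Toy model of the process control unit; all names are hypothetical."""

    def __init__(self) -> None:
        # process name -> {"original": id, "clone": id}
        self.roles = {}
        # identifiers of processes that were forcibly terminated
        self.terminated = []

    def register(self, name: str, original_id: str, clone_id: str) -> None:
        self.roles[name] = {"original": original_id, "clone": clone_id}

    def on_original_failure(self, name: str, reporting_clone_id: str) -> str:
        """Handle a clone's report that its original has failed."""
        entry = self.roles[name]
        if entry["clone"] != reporting_clone_id:
            raise ValueError("report did not come from the registered clone")
        failed = entry["original"]
        # Promote the reporting clone to original.
        entry["original"] = reporting_clone_id
        entry["clone"] = ""  # no clone until a new one is generated
        # Forcibly terminate the failed original.
        self.terminated.append(failed)
        return failed
```

The point of the design is that the clone only detects and reports; the promotion and the forced termination are carried out by the process control unit, which is why the clone of an ordinary process needs nothing beyond the periodic communication/failure detection unit.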

[0178] Now consider the case in which the original of the process control unit is found to have failed. If only the periodic communication/failure detection unit is loaded in the clone of the process control unit, the clone can indicate that the original of the process control unit has failed. Using this failure indication as a trigger, the startup process described with reference to FIG. 1 can be executed again to recover the failed process control unit.

[0179] However, when the process control unit is started by the startup process described with reference to FIG. 1, every process is also automatically restarted, and the operation management data and processing data held by each process are erased. To prevent this, the operation management data and processing data held by each process must first be downloaded (saved) before the startup process described with reference to FIG. 1 is executed.

[0180] In contrast, if the clone of the process control unit shares the same program as the original, the clone has exactly the same functions as the original. Therefore, just as the original forcibly terminates the clone in FIG. 13, the clone of the process control unit can forcibly terminate the failed original of the process control unit and erase it from program memory.

[0181] Moreover, because it already holds all of the programs and data even while it is a clone, it can itself become the original.

[0182] Then, just as the original regenerates the clone in FIG. 13, the process control unit that has newly become the original can regenerate a new clone.
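The three-step takeover described in paragraphs [0180] to [0182] can be sketched as follows. This is a hedged illustration: the `ControlUnitInstance` dataclass, its fields, and the naming of the regenerated clone are assumptions made for the example, and an in-memory flag stands in for actual forced termination of a process.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ControlUnitInstance:
    """Hypothetical in-memory model of one process control unit instance."""
    name: str
    role: str  # "original" or "clone"
    programs: List[str] = field(default_factory=list)
    alive: bool = True


def take_over(clone: ControlUnitInstance,
              failed: ControlUnitInstance) -> ControlUnitInstance:
    """Clone-side recovery; returns the newly regenerated clone."""
    if clone.role != "clone" or failed.role != "original":
        raise ValueError("take_over expects a clone and its original")
    # [0180] Forcibly terminate the failed original.
    failed.alive = False
    # [0181] The clone already holds every program, so it becomes the original.
    clone.role = "original"
    # [0182] The new original regenerates a clone with the same programs.
    return ControlUnitInstance(
        name=clone.name + "-regenerated",
        role="clone",
        programs=list(clone.programs),
    )
```

Because the clone holds a full copy of the programs, no external restart is needed and the data held by the other processes is left untouched, in contrast to rerunning the startup process of FIG. 1.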

[0183] Accordingly, if the clone of the process control unit is given the same programs and data as the original, the system can recover autonomously even when the original of the process control unit fails.

[0184] It is in this sense that the description of FIG. 2 stated that, for the process control unit, the same program as the original should preferably also be loaded into the clone.

[0185] Admittedly, giving the original and the clone of the process control unit the same programs and data inevitably increases the system scale. In practice, however, a system usually has many processes loaded, the clones of those many processes are given only minimal resources, and only the process control unit is fully duplicated, so the effect is minor.

[0186]

[Effects of the Invention] As described in detail above, the present invention realizes a system operation method that avoids bloating of the system scale. It further realizes a system operation method that, in operating the system, can reliably detect failures of its constituent original processes and clone processes, and a system operation method in which data can be shared between the original process and the clone process.

[0187] Accordingly, the reliability of a communication system or an information processing system can be improved while avoiding an increase in economic burden and installation space.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating an overview of the system configuration of the present invention and the startup of the system.

FIG. 2 shows the internal configuration of the system according to the present invention.

FIG. 3 shows the communication resources in the system.

FIG. 4 is a diagram illustrating determination of the process type.

FIG. 5 shows a configuration example of the startup process registration table.

FIG. 6 is a diagram illustrating startup of a clone.

FIG. 7 is a diagram illustrating data sharing between the original and the clone.

FIG. 8 is a diagram illustrating the basic operation of periodic communication/failure detection (part 1).

FIG. 9 shows a configuration example of the periodic communication data.

FIG. 10 is a diagram illustrating detection of a failure of the original and forced termination of the process.

FIG. 11 is a diagram illustrating the basic operation of periodic communication/failure detection (part 2).

FIG. 12 is a diagram illustrating promotion of the clone to original and generation of a new clone.

FIG. 13 is a diagram illustrating failure detection when a clone runs away and regeneration of the clone.

FIG. 14 is a diagram illustrating the basic operation of periodic communication/failure detection (part 3).

FIG. 15 is a flowchart of periodic communication/failure detection.

FIG. 16 shows the configuration of a conventional operation method for a duplexed system.

FIG. 17 shows the configuration of a conventional failure recovery method.

[Explanation of symbols]

1 Hard disk
2 Process control unit original
2a Process control unit clone
3 Process A original, process A
3a Process A clone
3b Process A new original
3c Process A regenerated clone
3d Process A regenerated clone
4 Process B original
4a Process B clone
5 Inter-process communication resource
6 Common clone communication resource
7 Original-clone communication resource
11 Process control unit load module
12 Process A load module
13 Process B load module
14 Dynamic link library
21 Operation program
22 Operation management program
31 Operation program
32 Operation management program
41 Operation program
42 Operation management program
21-1 Process startup/control unit
21-2 Dynamic link library reading unit
22-1 Startup process registration table
22-2 Periodic communication/failure detection unit
31-1 Dynamic link library reading unit
32-1 Startup process detection unit
32-2 Process type determination unit
32-3 Clone generation unit
32-4 Periodic communication/failure detection unit
32-4a Periodic communication/failure detection unit
41-1 Dynamic link library reading unit
51 Hardware of first communication device
52 Application program of first communication device
53 Hardware of second communication device
54 Application program of second communication device
55 Monitoring and switching device
61 Process control unit
62 Process A
63 Process B
64 Process C
61-1 Failure detection means
61-2 Failure recovery means

Continuation of the front page

(72) Inventor: Yuji Goto, c/o Fujitsu Kyushu Communication Systems Co., Ltd., 2-2-1 Momochihama, Sawara-ku, Fukuoka-shi, Fukuoka
(72) Inventor: Katsuyuki Fujiyoshi, c/o Fujitsu Kyushu Communication Systems Co., Ltd., 2-2-1 Momochihama, Sawara-ku, Fukuoka-shi, Fukuoka
(72) Inventor: Yukio Kono, c/o Fujitsu Kyushu Communication Systems Co., Ltd., 2-2-1 Momochihama, Sawara-ku, Fukuoka-shi, Fukuoka
(72) Inventor: Daisuke Namoto, c/o Fujitsu Kyushu Communication Systems Co., Ltd., 2-2-1 Momochihama, Sawara-ku, Fukuoka-shi, Fukuoka

F-terms (reference):
5B034 BB02 CC03
5B042 JJ04 JJ08
5B045 JJ02 JJ12 JJ42 JJ45 JJ48
5B089 GA01 GB02 HA01 JA40 JB17 KA12 KB06 KC30 LB14 MC02 MD02 MD03 ME15

Claims

[Claims]

1. A system operation method comprising: a process control unit that controls the operation of all processes provided in a system; an original process that includes all of an operation program for system operation and an operation management program for managing system operation; and a clone process that includes only the minimum necessary operation management program among the operation management programs of the original process, wherein periodic communication is performed between the original process and the clone process.

2. The system operation method according to claim 1, wherein, for the process control unit, all programs and all data are shared by an original and a clone.

3. The system operation method according to claim 1, wherein said original process and said clone process share data by said periodic communication.

4. The system operation method according to claim 1, wherein, by means of the periodic communication, the original process detects a failure of the clone process and the clone process detects a failure of the original process.

5. An automatic failure recovery method in the system operation method according to claim 4, wherein, when the clone process detects a failure of the original process, the clone process notifies the process control unit, and the process control unit performs recovery from outside the original process in which the failure occurred.

6. An automatic failure recovery method in the system operation method according to claim 4, wherein, when the original process detects a failure of the clone process, the original process autonomously regenerates the clone process.