JPH06202897A

JPH06202897A - Method and device for resume in network system

Info

Publication number: JPH06202897A
Application number: JP4361198A
Authority: JP
Inventors: Yoichi Toguchi; 洋一戸口; Keisuke Noda; 敬祐野田
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 1992-12-29
Filing date: 1992-12-29
Publication date: 1994-07-22

Abstract

PURPOSE: To improve the reliability of computer systems connected to a network system. CONSTITUTION: The resume data in the main memory of an active computer system are continuously stored in a resume data storage section 22 of a resume device 20. When a fault detection section 21 detects a fault in the active computer system, a resume processing section 24 reads the resume data of the faulty system out of the resume data storage section 22 and sends them to a standby computer system. The standby computer system continues the processing of the faulty active computer system based on the transmitted resume data.

Description

Detailed Description of the Invention

[0001]

TECHNICAL FIELD: The present invention relates to a resume device and a resume method for a network system.

[0002]

[Prior Art and Its Problems] Conventionally, the following approaches are known for improving the reliability of computer systems connected to a network system.

[0003] In one approach, the computer system performs the same processing redundantly in multiple instances and takes a majority vote of the results. However, this method requires the computer system to have a special configuration capable of multiplexed processing, which increases cost.

[0004] In another approach, a standby computer system is kept waiting in case a failure occurs in the active computer system. However, the standby computer system is merely prepared in advance; it cannot take over the processing of the failed active computer system dynamically and within a short time. In addition, the processing it can take over is limited to processing that is self-contained each time, such as transaction processing.

[0005] There are also computer systems with a resume function that is used when the power of the active computer system is cut off. However, such a computer system handles only power loss and cannot cope with any other kind of failure.

[0006]

DISCLOSURE OF THE INVENTION: The first invention aims to improve the reliability of computer systems connected to a network system.

[0007] The resume device in a network system according to the first invention is a device for a network system that includes a plurality of active computer systems and at least one standby computer system. It comprises: storage means for storing resume data that is needed to take over the processing performed by a predetermined one of the active computer systems and that is continuously updated based on data supplied from that predetermined computer system; failure detection means for detecting a failure of the predetermined computer system and, upon detecting a failure, outputting a failure occurrence signal; and means for transferring, in response to the failure occurrence signal, the resume data stored in the storage means to the standby computer system.

[0008] The resume method in a network system according to the first invention is a method for a network system that includes a plurality of active computer systems and at least one standby computer system. Resume data that is needed to take over the processing performed by a predetermined one of the active computer systems, and that is continuously updated based on data supplied from that predetermined computer system, is stored in storage means; a failure of the predetermined computer system is detected and, upon detection, a failure occurrence signal is output; and in response to the failure occurrence signal, the resume data stored in the storage means is transferred to the standby computer system.

[0009] According to the first invention, resume data is kept in the storage means in preparation for a failure of the predetermined active computer system. When a failure occurs, the resume data is transferred to the standby computer system, which can then continue the processing of the predetermined active computer system.

[0010] Therefore, because the resume device is simply connected to an existing computer system, no special computer system is required, which reduces cost. In addition, the processing of the active computer system can be taken over dynamically and in a short time, and the processing that can be taken over is not restricted to processing that is self-contained each time, such as transaction processing.

[0011] Like the first invention, the second invention aims to improve the reliability of computer systems connected to a network system.

[0012] In the resume method in a network system according to the second invention, for a plurality of active computer systems, at least one standby computer system and a file server that stores the context files of the active computer systems are provided so as to be able to communicate with each active computer system. The context file needed to continue the processing being performed by each active computer system is always kept stored in the file server. The file server checks whether a failure has occurred in any of the active computer systems. When a failure is detected in any of them, the context file of the failed computer system is transmitted to the standby computer system, and the context file of each active computer system that has not failed is transmitted back to that computer system. The standby computer system takes over the processing of the failed computer system based on the received context file, and each computer system that has not failed resumes processing from the point in time of its received context file.

[0013] According to the second invention, context files are stored in the file server in preparation for a failure of any active computer system. When a failure occurs in one of the active computer systems, the context file of the failed system is transferred to the standby computer system, which thereby takes over the processing of the failed system.

[0014] Therefore, no special computer system is required: reliability can be improved with existing computer systems, and a fault-tolerant network can be built. Moreover, because the context files of the active computer systems are stored in the file server, each active computer system can be restarted any number of times from the processing state at which its context file was stored, which makes the arrangement useful for analyzing the network system.

[0015]

[Description of Embodiments]

First Embodiment: FIG. 1 shows the configuration of a network system according to an embodiment of the first invention (first embodiment).

[0016] The network system consists of a plurality of computer systems and a resume device 20, all connected to a transmission line 10. In this embodiment, four computer systems 11 to 14 are provided.

[0017] Of the computer systems 11 to 14, computer systems 11 to 13 are active systems and computer system 14 is a standby system. The resume device 20 is connected to the active computer system 11, which plays a particularly important role in the network system and requires high reliability. The resume device 20 is not limited to computer system 11; it may also be connected to the other active computer systems 12 and 13. A single resume device 20 may also be shared by a plurality of active computer systems.

[0018] The active computer systems 11 to 13 each perform normal processing that is either predetermined for them or instructed via the transmission line 10, and they communicate with one another over the transmission line 10 as needed. A highly reliable communication protocol, for example TCP/IP, is used for communication among the computer systems 11 to 14.

[0019] The standby computer system 14 serves as a substitute for the active computer system 11, to which the resume device 20 is connected, if a failure occurs in that system. The standby computer system 14 therefore remains in the standby state until a failure occurs in computer system 11. When a failure occurs in computer system 11, the standby computer system 14 takes over the processing that computer system 11 had been performing up to that point.

[0020] FIG. 2 is a block diagram showing the electrical configuration of the resume device 20.

[0021] The resume device 20 comprises a failure detection unit 21, a resume data storage unit 22, a resume data communication unit 23, and a resume processing unit 24.

[0022] The failure detection unit 21 detects a failure of the active computer system 11 based on alive messages sent from computer system 11. An alive message is a signal indicating the operating state of computer system 11. When the failure detection unit 21 detects a failure of computer system 11, it supplies a failure detection signal to the resume processing unit 24.
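The alive-message check performed by the failure detection unit 21 can be sketched as a timeout rule. This is an assumption: the source does not say how a failure is judged from the alive messages, so the sketch treats computer system 11 as failed once no alive message has arrived for more than `timeout` seconds. The class name `AliveMonitor` and the explicit `now` timestamps are hypothetical, chosen to keep the example deterministic.

```python
class AliveMonitor:
    """Hypothetical model of failure detection unit 21 (timeout rule assumed)."""

    def __init__(self, timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout
        self.last_alive = now   # time the most recent alive message arrived
        self.failed = False

    def on_alive(self, now: float) -> None:
        # Each alive message from computer system 11 restarts the timeout.
        self.last_alive = now

    def check(self, now: float) -> bool:
        """Return True once, at the moment a failure is first detected.

        A True result stands for the failure detection signal that unit 21
        gives to the resume processing unit 24.
        """
        if not self.failed and now - self.last_alive > self.timeout:
            self.failed = True
            return True
        return False
```

Reporting the failure only once keeps the resume processing unit from starting a second transfer for the same fault.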

[0023] The storage unit 22 stores the resume data of the active computer system 11. In this embodiment, the resume data means all the data stored in the main memory of the active computer system 11, whose processing the standby computer system 14 is to take over. This allows the standby computer system 14, when a failure occurs in computer system 11, to carry on the processing that computer system 11 was performing exactly where it left off.

[0024] The resume data must be updated whenever the data in the main memory of the active computer system 11 change. When computer system 11 starts up, all the data in its main memory are transferred to and stored in the resume data storage unit 22. Thereafter, whenever data in the main memory change, only the changed data are transferred from computer system 11 to the storage unit 22, so that the data in the storage unit 22 are always kept identical to the data in the main memory. For example, immediately after accessing a file within its own system (for example, external memory), computer system 11 transfers the main-memory data changed by that file access to the resume data storage unit 22. Likewise, immediately after communicating with computer system 12 or 13, computer system 11 transfers the main-memory data changed by that communication to the storage unit 22. Alternatively, the state of the main memory may be monitored at fixed intervals and the changed data sent to the storage unit 22 at those intervals. Data-transfer commands may be embedded at suitable points in the program, and compressing the data to be transferred shortens the transfer time.
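The update scheme described above (a full copy at start-up, then only the changed data, compressed before transfer) can be sketched as follows. The model is built on assumptions: main memory is a fixed-size byte array, changes are found by comparing fixed-size pages against the previous snapshot (as in the periodic-monitoring variant), and `zlib` stands in for the unspecified compression method. The names `changed_pages`, `ResumeStore`, and `PAGE` are hypothetical.

```python
import zlib
from typing import Dict

PAGE = 4  # hypothetical page size in bytes; tiny so the example stays readable


def changed_pages(old: bytes, new: bytes, page: int = PAGE) -> Dict[int, bytes]:
    """Compare two equal-length memory images page by page.

    Returns {page_index: zlib-compressed page} for every page that changed,
    i.e. only the changed data, compressed before transfer.
    """
    diffs: Dict[int, bytes] = {}
    for off in range(0, len(new), page):
        chunk = new[off:off + page]
        if old[off:off + page] != chunk:
            diffs[off // page] = zlib.compress(chunk)
    return diffs


class ResumeStore:
    """Hypothetical model of resume data storage unit 22."""

    def __init__(self, initial_image: bytes) -> None:
        # At start-up the whole main-memory image is transferred and stored.
        self.image = bytearray(initial_image)

    def apply(self, diffs: Dict[int, bytes], page: int = PAGE) -> None:
        # Afterwards only changed pages arrive; patch them into the copy.
        for index, blob in diffs.items():
            chunk = zlib.decompress(blob)
            off = index * page
            self.image[off:off + len(chunk)] = chunk
```

In the source the sender is computer system 11 itself, triggered right after a file access, right after communication with system 12 or 13, or on a timer; the sketch leaves the trigger out and shows only what is sent and how the stored copy is kept identical to main memory.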

[0025] In response to the failure detection signal that it receives when the failure detection unit 21 detects a failure of the active computer system 11, the resume processing unit 24 controls the resume data communication unit 23 so that the resume data stored in the storage unit 22 are transferred to the standby computer system 14, which is in the standby state.

[0026] When a failure occurs in the active computer system 11, the resume data communication unit 23, on a command from the resume processing unit 24, transfers the resume data stored in the storage unit 22 over the transmission line 10 to the computer system in the standby state.
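The hand-off from the resume processing unit 24 to the resume data communication unit 23 can be sketched as below. The `send` callable stands in for unit 23 plus the transmission line 10; the standby address string and the 4096-byte chunking are assumptions, and `ResumeProcessor` is a hypothetical name.

```python
from typing import Callable

# Stands in for resume data communication unit 23 plus transmission line 10.
Send = Callable[[str, bytes], None]


class ResumeProcessor:
    """Hypothetical model of resume processing unit 24."""

    def __init__(self, storage: bytes, standby: str, send: Send,
                 chunk: int = 4096) -> None:
        self.storage = storage  # snapshot of resume data storage unit 22
        self.standby = standby  # network address of standby system 14 (assumed)
        self.send = send
        self.chunk = chunk

    def on_failure_signal(self) -> int:
        """Transfer all stored resume data to the standby system.

        Returns the number of chunks handed to the communication unit.
        """
        sent = 0
        for off in range(0, len(self.storage), self.chunk):
            self.send(self.standby, self.storage[off:off + self.chunk])
            sent += 1
        return sent
```

Keeping the transport behind a callable mirrors the split between units 24 and 23: the processing unit decides when and what to send, the communication unit decides how.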

[0027] Upon receiving the resume data of the failed computer system 11 from the resume device 20, the computer system 14 that was in standby takes over, based on the received resume data, the processing that the active computer system 11 was performing. Computer system 14 switches its own network address to the address that the active computer system 11 held until the failure occurred. After switching the address, computer system 14 notifies the active computer systems 12 and 13 that the address has been switched. Computer system 14 can then carry out, in place of computer system 11, the communication with computer systems 12 and 13 that computer system 11 had been performing.
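The address switch-over just described can be modeled as follows. The `Node` class, the dotted address strings, and the in-memory `notices` list are hypothetical stand-ins. On a real IP network the takeover would involve reconfiguring the interface and announcing the change (for example with a gratuitous ARP), which the source does not describe.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical computer system on transmission line 10."""
    name: str
    address: str
    peers: List["Node"] = field(default_factory=list)
    notices: List[Tuple[str, str]] = field(default_factory=list)

    def take_over(self, failed_address: str) -> None:
        # Adopt the network address the failed system held until the failure...
        self.address = failed_address
        # ...then tell every peer that the address has been switched.
        for peer in self.peers:
            peer.notices.append((self.name, failed_address))
```

After `take_over`, peers that keep addressing messages to the old address of system 11 reach system 14 instead, which is what lets system 14 carry on the conversations system 11 had open.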

[0028] Second Embodiment: FIG. 3 is a block diagram showing the configuration of a network system according to an embodiment of the second invention (second embodiment).

[0029] The network system consists of a plurality of active computer systems 31 to 33, a standby computer system 34, and a file server 35, interconnected by a transmission line 30. The standby computer system 34 is normally in the standby state; when any of the active computer systems 31 to 33 fails, it takes over processing in place of the failed system.

[0030] The file server 35 stores the files (data in main memory and, as needed, in external memory) of the active computer systems 31 to 33. The file data of systems 31 to 33 are updated as those systems carry out their processing. The file server 35 keeps not only the latest version of each system's file data but also earlier versions, going back several updates.
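Keeping the latest context file plus several earlier generations, as the file server 35 does, maps naturally onto a bounded queue per system. In this sketch the number of generations, the class name `ContextHistory`, and the use of raw `bytes` for a context file are assumptions. `collections.deque` with `maxlen` discards the oldest generation automatically when a new one is saved, which plays roughly the role of the file history processing of step 43.

```python
from collections import defaultdict, deque


class ContextHistory:
    """Hypothetical store on file server 35: latest context file plus older ones."""

    def __init__(self, generations: int = 3) -> None:
        # One bounded queue per system; the oldest entry falls off when full.
        self._files = defaultdict(lambda: deque(maxlen=generations))

    def save(self, system: int, context: bytes) -> None:
        self._files[system].append(context)

    def latest(self, system: int) -> bytes:
        return self._files[system][-1]

    def generation(self, system: int, back: int) -> bytes:
        """back=0 is the latest file, back=1 the one before it, and so on."""
        return self._files[system][-1 - back]
```

Asking for a generation older than the retained window raises `IndexError`, so a caller replaying history for analysis can tell when it has run out of saved states.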

[0031] The file server 35 sends a freeze signal to the active computer systems 31 to 33 periodically or as needed. On receiving the freeze signal, each of the active computer systems 31 to 33 transfers its file data to the file server 35 as a context file. The file server 35 saves the context files transferred from the active computer systems 31 to 33.

[0032] When the file server 35 detects a failure in any of the active computer systems 31 to 33, it returns to each of the other, non-failed active computer systems the corresponding latest context file it has saved (the one that system itself transferred). Each active computer system that receives its latest context file goes back to the point at which it transferred that file and resumes processing based on it. The context file of the failed active computer system is transferred to the standby computer system 34. Based on the received context file, the standby computer system 34 takes over the processing of the failed active computer system from the point at which that system transferred the context file.

[0033] The network system as a whole thus returns to its state at the time the file server 35 sent the most recent freeze signal.

[0034] FIGS. 4 and 5 are flowcharts showing the processing procedure of the file server 35. FIG. 6 is a flowchart showing the processing procedure of a computer system, and FIG. 7 shows part of that procedure (the processing of step 61) in detail.

[0035] The active computer systems 31 to 33 each perform their own application processing (FIG. 6, step 61; FIG. 7, step 71).

[0036] The file server 35 sends a freeze signal to the active computer systems 31 to 33 (FIG. 4, step 41). When each of the active computer systems 31 to 33 receives the freeze signal from the file server (FIG. 7, step 72), it stops the application processing it had been performing (FIG. 7, step 73). Each active computer system then sends a freeze start signal to the file server 35 (FIG. 7, step 74), freezes its kernel (FIG. 7, step 75), and generates a context file so that, should a failure occur, the processing it was performing can be continued by a standby computer system (the standby computer system 34 in this embodiment) (FIG. 7, step 76).

[0037] When the file server 35 receives the freeze start signal from each of the active computer systems 31 to 33 within a predetermined time after sending the freeze signal (FIG. 4, step 42), it performs file history processing: in preparation for saving the latest context files it is about to receive, it moves the context files already saved aside as past versions (FIG. 4, step 43). On finishing the file history processing, the file server 35 sends a history processing end signal to each of the active computer systems 31 to 33 (FIG. 4, step 44).

[0038] When each of the active computer systems 31 to 33 receives the history processing end signal from the file server 35 (FIG. 7, step 77), it transfers the context file generated in step 76 to the file server 35 (FIG. 7, step 78). After finishing the transfer, each active computer system sends a transfer end signal to the file server 35 (FIG. 7, step 79). Each of the active computer systems 31 to 33 then returns to step 71 and resumes application processing.
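The active-system side of one freeze round (FIG. 7, steps 72 to 79) can be sketched as two event handlers. The message names, the `send` callable standing in for the transmission line 30, and the use of `pickle` to serialize application state as the context file are all assumptions; freezing a real kernel (step 75) is beyond what a user-level sketch can show. `ActiveSystem` is a hypothetical name.

```python
import pickle
from typing import Callable, Dict


class ActiveSystem:
    """Hypothetical model of an active system 31 to 33 in one freeze round."""

    def __init__(self, name: str, send: Callable[[str, str, bytes], None]) -> None:
        self.name = name
        self.send = send                      # stands in for transmission line 30
        self.app_state: Dict[str, object] = {}  # application data to checkpoint
        self.running = True
        self._context = b""

    def on_freeze(self) -> None:              # freeze signal received (step 72)
        self.running = False                  # step 73: stop application processing
        self.send(self.name, "freeze_start", b"")  # step 74
        # Steps 75-76: freeze and serialize; pickle is an assumed stand-in format.
        self._context = pickle.dumps(self.app_state)

    def on_history_done(self) -> None:        # history processing end (step 77)
        self.send(self.name, "context_file", self._context)  # step 78
        self.send(self.name, "transfer_end", b"")            # step 79
        self.running = True                   # back to step 71
```

Splitting the work at step 77 follows the sequence in the text: the context file is generated before the history processing end signal arrives but transferred only after it, by which time the file server has already moved the previous generation aside.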

[0039] The file server 35 saves the context files transferred from the active computer systems 31 to 33 (FIG. 4, step 45). When the file server 35 receives the transfer end signals from the active computer systems 31 to 33 within a predetermined time after sending the freeze signal (step 41) (FIG. 4, step 46), it returns to step 41 and outputs the freeze signal again. The freeze signal may, of course, instead be output after a fixed waiting time following receipt of the transfer end signals.

[0040] In step 42 of FIG. 4, if the file server 35 does not receive the freeze start signal from an active computer system within the predetermined time, it queries each system from which no signal was received as to whether it is operating normally, and from the response to the query determines whether a failure has occurred in that computer system (FIG. 4, step 47). If the queried active computer system is confirmed to be operating normally, the file server 35 waits for the freeze start signal from it (FIG. 4, step 48).

[0041] In step 46 of FIG. 4, if the file server 35 does not receive the transfer end signal from an active computer system within the predetermined time, it queries each system from which no signal was received as to whether it is operating normally, and from the response to the query determines whether a failure has occurred in that computer system (FIG. 4, step 49). If the queried active computer system is confirmed to be operating normally, the file server 35 waits for the freeze start signal from it (FIG. 4, step 50).
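The timeout-then-inquiry logic of steps 42/46 and 47 to 50 can be sketched as two small functions. The timestamps, the `limit` value, and the `inquire` callable (standing in for the file server's query about normal operation and the response to it) are assumptions; `late_systems` and `judge` are hypothetical names.

```python
from typing import Callable, Dict, Set, Tuple


def late_systems(expected: Set[int], arrivals: Dict[int, float],
                 sent_at: float, limit: float) -> Set[int]:
    """Steps 42/46: systems whose signal did not arrive within `limit` seconds."""
    return {s for s in expected
            if s not in arrivals or arrivals[s] - sent_at > limit}


def judge(late: Set[int],
          inquire: Callable[[int], bool]) -> Tuple[Set[int], Set[int]]:
    """Steps 47/49: query each late system; return (failed, keep_waiting).

    Systems that answer that they are operating normally are waited on
    again (steps 48/50); the rest are treated as failed.
    """
    failed = {s for s in late if not inquire(s)}
    return failed, late - failed
```

Using an explicit inquiry instead of declaring failure on timeout alone lets a system that is merely slow (for example, still writing a large context file) be waited on rather than replaced.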

[0042] Suppose that in step 47 or 49 described above, the file server 35 detects that a failure has occurred in one of the active computer systems, for example active computer system 31. The file server 35 then sends a failure occurrence signal, indicating that a failure has occurred in active computer system 31, to computer systems 32 to 34 (FIG. 5, step 51).

[0043] When the active computer systems 32 and 33 receive the failure occurrence signal (FIG. 7, step 72), they stop the normal processing they had been performing (FIG. 6, step 63) and send a thaw start signal to the file server 35 (FIG. 6, step 64). When the standby computer system 34 receives the failure occurrence signal, it likewise sends a thaw start signal to the file server 35.

[0044] When the file server 35 receives the thaw start signals from the active computer systems 32 and 33 (FIG. 5, step 52), it returns to each of them the latest context file it has stored for that system (FIG. 5, step 53). When it receives the thaw start signal from the standby computer system 34, it sends the latest context file of the failed active computer system 31 to the standby computer system 34.
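The routing decision the file server 35 makes after detecting a failure reduces to a small mapping. This sketch assumes one failed system and one standby system, with context files held as `bytes` keyed by reference numeral; `plan_recovery` is a hypothetical name.

```python
from typing import Dict


def plan_recovery(latest: Dict[int, bytes], failed: int,
                  standby: int) -> Dict[int, bytes]:
    """Decide which context file goes to which system (steps 51 to 53).

    Each surviving active system gets back its own latest context file;
    the standby system gets the failed system's latest context file.
    """
    plan = {system: ctx for system, ctx in latest.items() if system != failed}
    plan[standby] = latest[failed]
    return plan
```

Because every surviving system is rolled back to its own latest context file, the whole network restarts from the most recent freeze point, as stated in [0033].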

[0045] When the active computer systems 32 and 33 receive their own context files from the file server 35, each performs processing to return to its state at the time the context file was generated (thawing the context file) (FIG. 6, step 65). The active computer systems 32 and 33 then send a thaw end signal to the file server (FIG. 6, step 66).

[0046] When the standby computer system 34 receives the context file of the failed active computer system 31, it performs processing, based on that context file, to take over the processing of active computer system 31 from the time the context file was generated (thawing the context file) (FIG. 6, step 65). At this time, the standby computer system 34 switches its network address to the address of active computer system 31. The standby computer system 34 then sends a thaw end signal to the file server (FIG. 6, step 66).

[0047] When the file server 35 has received the thaw end signals from computer systems 32 to 34 (FIG. 5, step 54), it sends a start signal to computer systems 32 to 34 (FIG. 5, step 55).
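The file server's wait in step 54 behaves like a barrier: the start signal of step 55 goes out only once every participant (systems 32 to 34 in the example) has sent its thaw (decompression) end signal. A minimal sketch with the hypothetical class name `ThawBarrier`; the source does not say what happens if an end signal never arrives, so no timeout is modeled.

```python
from typing import Iterable, Set


class ThawBarrier:
    """Hypothetical bookkeeping on file server 35 for steps 54 and 55."""

    def __init__(self, participants: Iterable[int]) -> None:
        self.pending: Set[int] = set(participants)

    def on_thaw_end(self, system: int) -> bool:
        """Record one end signal; return True once the start signal may be sent."""
        self.pending.discard(system)
        return not self.pending
```

Holding the start signal until everyone has restored its state keeps survivors from messaging the standby system before it has adopted the failed system's address and context.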

[0048] When the active computer systems 32 and 33 receive the start signal from the file server 35 (FIG. 6, step 67), they resume processing.

[0049] When the standby computer system 34 receives the start signal from the file server 35 (FIG. 6, step 67), it takes over the processing that the failed active computer system 31 had been performing.

[0050] The network system thus resumes processing from the point at which the latest context files held by the file server 35 were generated.

[Brief description of drawings]

FIG. 1 shows the network system of the first embodiment.

FIG. 2 is a block diagram showing the electrical configuration of the resume device.

FIG. 3 shows the network system of the second embodiment.

FIG. 4 is a flowchart showing the processing procedure of the file server.

FIG. 5 is a flowchart showing the processing procedure of the file server.

FIG. 6 is a flowchart showing the processing procedure of a computer system.

FIG. 7 is a flowchart showing the processing procedure of a computer system.

[Description of Reference Numerals]

10, 30 Transmission line
11, 12, 13, 31, 32, 33 Active computer system
14, 34 Standby computer system
20 Resume device
21 Failure detection unit
22 Resume data storage unit
23 Resume data communication unit
24 Resume processing unit
35 File server

Claims

1. A resume device in a network system comprising a plurality of active computer systems and at least one standby computer system, the resume device comprising:
storage means for storing resume data that is required to take over the processing performed by a predetermined one of the active computer systems and that is continuously updated based on data supplied from the predetermined computer system; failure detection means for detecting a failure of the predetermined computer system and, upon detecting a failure, outputting a failure occurrence signal; and means for transferring, in response to the failure occurrence signal, the resume data stored in the storage means to the standby computer system.

2. A resume method in a network system comprising a plurality of active computer systems and at least one standby computer system, the method comprising:
storing in storage means resume data that is required to take over the processing performed by a predetermined one of the active computer systems and that is continuously updated based on data supplied from the predetermined computer system; detecting a failure of the predetermined computer system and, upon detecting a failure, outputting a failure occurrence signal; and, in response to the failure occurrence signal, transferring the resume data stored in the storage means to the standby computer system.

3. A resume method in a network system, wherein, for a plurality of active computer systems, at least one standby computer system and a file server that stores the context files of the active computer systems are provided so as to communicate with each active computer system; the context file needed to continue the processing being performed by each active computer system is always kept stored in the file server; the file server checks whether a failure has occurred in any of the active computer systems; when a failure is detected in any of the active computer systems, the context file of the failed computer system is transmitted to the standby computer system, and the context file of each active computer system in which no failure has occurred is transmitted to the corresponding active computer system; the standby computer system takes over the processing of the failed computer system based on the received context file; and each computer system in which no failure has occurred resumes processing from the point in time of its received context file.