JPH08287030A

JPH08287030A - Device and method for automatically restarting multiple computer system

Info

Publication number: JPH08287030A
Application number: JP7109113A
Authority: JP
Inventors: Masaaki Sato; 正明佐藤; Yuji Ito; 裕司伊藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-04-10
Filing date: 1995-04-10
Publication date: 1996-11-01

Abstract

PURPOSE: To improve the availability and reliability of a multiple computer system by automatically restarting stopped CPUs when a system down occurs in the system. CONSTITUTION: The multiple computer system is provided with: a system down judging means 104, arranged outside the system, which checks the combinations of computer abnormalities that cause system down based on state signals input from the respective computers and judges whether system down has occurred; a restart signal processing means 107, which outputs a restart signal 111 as a reset start signal for the first restart request and as an initializing start signal for the second and subsequent restart requests, and which gives up the restart when the number of restart requests reaches a prescribed value; and plural memories 112, in which computers that have received the reset start signal store the contents of their main storage devices.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an automatic restart device and method for a multi-system computer system (multi-system), and more particularly to an automatic restart device and method that automatically issues a restart request to a stopped central processing unit (CPU) when a system down occurs.

[0002]

[Prior Art] In a conventional triple computer system, as shown in FIG. 7, the computers CPU1 (701), CPU2 (702), and CPU3 (703) are connected to one another, and the system status is monitored as follows.
(1) The status of CPU2 (702) and CPU3 (703) is monitored by the program of CPU1 (701) via lines 704 and 705.
(2) The status of CPU1 (701) and CPU3 (703) is monitored by the program of CPU2 (702) via lines 706 and 707.
(3) The status of CPU1 (701) and CPU2 (702) is monitored by the program of CPU3 (703) via lines 708 and 709.
When the monitoring results of (1), (2), and (3) show that an abnormality has occurred in any CPU, that CPU either stops automatically or is disconnected from the system by the other CPUs. In this way, conventional system status monitoring has dealt with failures inside the multi-system by having the CPUs monitor one another. However, when abnormality handling is carried out inside the multi-system itself, no effective measure can be taken against a system down (a state in which the multi-system as a whole cannot maintain its function), and the only option has been manual recovery.

[0003]

[Problems to Be Solved by the Invention] The conventional technique therefore has the following problems.
(1) Even though the system is multiplexed, every CPU runs the same software, so a bug that occurs on one CPU can occur on every CPU carrying that software. The software is thus, in effect, not redundant at all. A software fault can therefore occur on several CPUs at the same time, and in that case it may lead to system down.
(2) After a system down, the system is basically restarted manually, which takes time and effort, and a mistake in handling can prevent the system from coming back up normally.

[0004] An object of the present invention is to automatically restart the stopped CPUs when a system down occurs in a multi-system computer system, thereby raising the availability of the system and improving its reliability.

[0005]

[Means for Solving the Problems] The above object is achieved by providing: a system down determination means, installed outside the multi-system computer system, that checks the combinations of computer abnormalities leading to system down based on the status signals taken in from each computer and determines whether the system is down; and a restart signal processing means that outputs a restart signal as a reset start signal for the first restart request and as an initialization start signal for the second and subsequent restart requests.

[0006]

[Operation] In the present invention, the system down determination means and the restart signal processing means are provided as a system external to the multi-system computer system. The system down determination means takes in the status signal of each of the computers and detects system down from the combination of these status signals, so system down can be monitored reliably and easily. When system down is detected, the restart signal processing means gives the specific computers that have gone down a reset start signal and an initialization start signal as restart signals, so the function of the multi-system computer system is restored automatically and quickly. This makes it possible to raise the availability of the system and improve its reliability.

[0007]

[Embodiments] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of a multi-system computer system (hereinafter, "multi-system") showing one embodiment of the present invention. In FIG. 1, 101 is CPU1, 102 is CPU2, 103 is CPUn, 104 is the system down determination unit, 107 is the restart signal processing unit, and 112 denotes the DISCs (1, 2, n) that save the contents of the main storage devices (not shown).

The system down determination unit (104) is installed outside the multi-system consisting of CPU1, CPU2, and CPUn. It takes in a status signal (105) from each of CPU1, CPU2, and CPUn (101, 102, 103) and checks for every combination of CPU abnormalities that leads to system down, for example:
(1) stoppage of all CPUs (whether in online or offline operation);
(2) stoppage of all CPUs in online operation (including those on backup standby);
(3) stoppage of a CPU that has a function indispensable to online operation (when functions are distributed among several CPUs).
When the unit determines that the system is down, it generates a system down signal (106).

The restart signal processing unit (107) is likewise installed outside the multi-system consisting of CPU1, CPU2, and CPUn, and consists of a restart abandonment determination mechanism (108) and a restart signal generation mechanism (110). The restart signal generation mechanism (110) generates two kinds of restart signal: a signal that first restarts the CPU without initializing the main storage device and the like (hereinafter, "reset start signal"), and a restart signal that initializes the main storage device and the like when the CPU fails to start (hereinafter, "initialization start signal"). It determines which type applies and outputs it: for the first restart request (109) it outputs the restart signal (110) as a reset start signal, and for the second and subsequent restart requests (109) it outputs the restart signal (110) as an initialization start signal. If the initialization start signal has already been output a fixed number of times, the restart is abandoned and no restart request (109) is made.

On receiving the reset start signal, each of CPU1, CPU2, and CPUn (101, 102, 103) saves the contents of its main storage device to its save DISC 1, 2, n (112) before performing the reset start. On receiving the initialization start signal, it does not save the contents of the main storage device.
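
The difference between the two restart signals on the CPU side can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the class `Cpu`, the attribute `save_disc`, and the signal names `"reset_start"` and `"init_start"` are hypothetical and do not appear in the patent, and zero-filling stands in for whatever "initializing the main storage device" means on real hardware.

```python
class Cpu:
    """Toy model of one computer in the multi-system (hypothetical names)."""

    def __init__(self, name: str, main_storage: bytes) -> None:
        self.name = name
        self.main_storage = bytearray(main_storage)
        self.save_disc = None  # stands in for the save DISC (112)

    def on_restart_signal(self, signal: str) -> None:
        if signal == "reset_start":
            # Reset start: save main storage to the DISC first, then
            # restart without initializing main storage.
            self.save_disc = bytes(self.main_storage)
        elif signal == "init_start":
            # Initialization start: no save; main storage is initialized
            # (modelled here as zero-filling, which is an assumption).
            self.main_storage = bytearray(len(self.main_storage))
        else:
            raise ValueError(f"unknown restart signal: {signal}")
```

In this model the saved image is available for later analysis only after a reset start, which matches the description that the initialization start performs no save.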

[0008] FIG. 2 shows the operation flow of the system down determination unit (104) in a triple computer system. First, a CPU stop signal, a watchdog timer (WDT) timeout signal, an online status signal, and an offline status signal are taken in from each CPU (101, 102, 103). From the combination of these signals, the unit determines whether each CPU has stopped during online operation (this state is called "CPU down"). Next, from these results, it determines whether the combination of CPU downs amounts to system down. In this embodiment, system down is defined as "stoppage of all CPUs in online operation". Accordingly, when the two CPUs in the online state have gone down, the unit judges the system to be down and passes the system down signal (106) to the restart signal processing unit (107).

In a triple computer system, the CPUs (101, 102, 103) are basically operated as the active (regular-use) system, the standby system, and the offline system (test system or stopped), respectively, so system down occurs when the active and standby systems, which are both in online operation, both suffer CPU down. Because any two of the CPUs may be serving as the active and standby systems, there are three combinations of CPU abnormalities that lead to system down, and the CPU down combination is judged against these three:
(1) CPU1 and CPU2 are both CPU down;
(2) CPU1 and CPU3 are both CPU down;
(3) CPU2 and CPU3 are both CPU down.

Online and offline are defined here as follows.
(a) Online: the function that directly performs tasks such as monitoring, control, and status recording on the controlled object, or the name of that mode.
(b) Offline: any function that is not online, or the name of that mode.

Thus, in this embodiment, the system down determination unit is provided as a system external to the multi-system computer system, takes in the status signal of each of the computers, and detects system down from the combination of these status signals, so system down is detected reliably and easily.
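
The determination flow described for FIG. 2 can be sketched in Python as follows. This is a minimal illustration rather than the patented circuit: the names `CpuStatus`, `is_cpu_down`, and `is_system_down` are invented for the example, and the offline status signal is left out because the text uses only the stop, WDT timeout, and online signals for the CPU down decision.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Mapping


@dataclass(frozen=True)
class CpuStatus:
    stop: bool          # CPU stop signal
    wdt_timeout: bool   # watchdog timer timeout signal
    online: bool        # online status signal


def is_cpu_down(s: CpuStatus) -> bool:
    # CPU down = the CPU stopped (stop or WDT timeout) while in online operation.
    return (s.stop or s.wdt_timeout) and s.online


def is_system_down(statuses: Mapping[str, CpuStatus]) -> bool:
    # System down = some pair of CPUs is CPU down at the same time.
    # For three CPUs this enumerates exactly the three pairs in the text.
    down = {name for name, s in statuses.items() if is_cpu_down(s)}
    return any(a in down and b in down for a, b in combinations(statuses, 2))
```

A CPU that is stopped while offline does not count as CPU down, so a stopped test system alone never contributes to a system down decision in this sketch.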

[0009] FIG. 3 shows the operation flow of the restart signal processing unit (107) in a triple computer system. In this embodiment, the restart is abandoned if the system down does not clear after the initialization start signal has been output once.

First, when the system down signal (106) is input, the counter is incremented and the timer is started. The timer is adjustable and is set to the time the CPU needs to restart; it is used to judge whether the CPU restart has succeeded. The counter is used to decide the type of restart signal and to decide whether to abandon the restart.

When the first system down signal (106) is input, the counter is set to "1", the reset start signal (110) is generated, and the reset start signal is output to the stopped CPUs. A CPU that receives the reset start signal saves the contents of its main storage device to the save DISC before performing the reset start. When the time set in the timer has elapsed, if the system down signal (106) is still being input, the restart is judged to have failed, and the counter is incremented and the timer started again. If the system down signal (106) is no longer being input, the restart is judged to have succeeded, the state of the restart signal processing unit (107) is initialized, and processing ends.

If the restart has failed, the renewed input of the system down signal (106) sets the counter to "2", the initialization start signal (110) is generated, and the initialization start signal is output to the stopped CPUs. A CPU that receives the initialization start signal performs the initialization start without saving the contents of its main storage device. When the time set in the timer has elapsed, if the system down signal (106) is still being input, the restart is judged to have failed, and the counter is incremented and the timer started again. If the system down signal (106) is no longer being input, the restart is judged to have succeeded, the state of the restart signal processing unit (107) is initialized, and processing ends.

If this restart has also failed, the renewed input of the system down signal (106) sets the counter to "3". In this case the restart is abandoned: the switch (not shown) for the system down signal (106) input is turned off to disable the restart signal processing unit (107), so that no further restart signal (110) is output. The state of the restart signal processing unit (107) is then initialized and processing ends.

Thus, in this embodiment, the restart signal processing unit is provided as a system external to the multi-system computer system, gives a reset start signal on the first system down, and gives an initialization start signal if that restart fails. The CPUs that have gone down therefore receive an appropriate restart signal, and the function of the multi-system computer system can be restored automatically and quickly.
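
The counter-and-timer flow of FIG. 3 can be written as a short loop. The Python sketch below is an illustration under assumptions: the function `run_restart_sequence` and its three callback parameters are hypothetical, the timer is abstracted as a `wait_for_timer` callable, and the abandonment threshold of 3 follows this embodiment only.

```python
from typing import Callable


def run_restart_sequence(
    system_still_down: Callable[[], bool],
    send_signal: Callable[[str], None],
    wait_for_timer: Callable[[], None],
) -> str:
    """Return "recovered" or "abandoned" (hypothetical result labels)."""
    counter = 0
    while True:
        counter += 1  # each system down input increments the counter
        if counter == 1:
            send_signal("reset_start")   # first attempt: reset start
        elif counter == 2:
            send_signal("init_start")    # second attempt: initialization start
        else:
            return "abandoned"           # counter reached 3: give up
        wait_for_timer()                 # time allowed for the CPU to restart
        if not system_still_down():
            return "recovered"           # system down input has disappeared
```

Passing the timer and the signal output as callables keeps the sketch testable without real delays; a caller could supply `lambda: time.sleep(t)` for the timer if wall-clock behaviour is wanted.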

[0010] Next, the detailed equipment configuration of this embodiment is shown in FIGS. 4, 5, and 6. FIG. 4 shows the overall equipment configuration of the multi-system computer system. In FIG. 4, each of CPU1, CPU2, and CPUn (101, 102, 103) outputs a CPU stop signal (201), a watchdog timer (WDT) timeout signal (202), an online status signal (203), and an offline status signal (204) to the system down determination unit (104). The system down determination unit (104) outputs a CPU1 down signal (301), a CPU2 down signal (302), a CPUn down signal (303), and a system down signal (304) to the restart signal processing unit (107). A reset start signal (401) and an initialization start signal (402) are output from the system down determination unit (104) to each CPU (101, 102, 103), and each CPU (101, 102, 103) outputs a main storage content save request (405) to the save DISC 1, 2, n (112) that saves the contents of its main storage device.

[0011] FIG. 5 shows the equipment configuration of the system down determination unit (104) in a triple computer system. The system down determination unit (104) consists of a CPU1 down determination unit (311), a CPU2 down determination unit (312), a CPU3 down determination unit (313), and a system down determination section (314).
(1) First, the CPU1 down determination unit (311) takes in the CPU stop signal (201), the watchdog timer (WDT) timeout signal (202), the online status signal (203), and the offline status signal (204) output from CPU1 (101). It takes the logical sum (OR circuit) of the CPU stop signal (201) and the WDT timeout signal (202), and then the logical product (AND circuit) of that result with the online status signal (203), to determine whether CPU1 (101) has stopped during online operation (this state is called CPU down). On CPU down, it outputs the CPU1 down signal (301). The CPU2 down determination unit (312) and the CPU3 down determination unit (313) likewise determine CPU down for CPU2 (102) and CPU3 (103), and on CPU down output the CPU2 down signal (302) and the CPU3 down signal (303). When any of CPU1, CPU2, and CPU3 is in the offline state, the hold circuit is reset by the offline status signal (204).
(2) Next, from the results of (1), the combination of CPU downs that leads to system down is determined. Because this embodiment defines system down as "stoppage of the two CPUs that can be in online operation (active system, standby system)", the logical product (AND circuit) of the CPU down signals is taken for each of the three CPU down combinations (CPU1 and CPU2, CPU2 and CPU3, CPU1 and CPU3), and the logic circuits (OR circuit, NOTOR circuit, AND circuit) in the system down determination section (314) determine system down. The CPU down signals (301, 302, 303) and the system down signal (304) are output to the restart signal processing unit (107).
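
At gate level, the CPU down decision and its hold circuit might be modelled as below. This sketch rests on an assumption the text does not spell out: that the hold circuit latches the CPU down result until the offline status signal resets it. The names `CpuDownHold` and `system_down_gate` are hypothetical, and the sketch covers only the pairwise AND and final OR, not the other gates mentioned for section 314.

```python
class CpuDownHold:
    """Latched CPU down decision for one CPU (assumed latch semantics)."""

    def __init__(self) -> None:
        self.held = False

    def update(self, stop: bool, wdt_timeout: bool,
               online: bool, offline: bool) -> bool:
        if offline:
            # The offline status signal resets the hold circuit.
            self.held = False
        elif (stop or wdt_timeout) and online:
            # (stop OR WDT timeout) AND online sets the latch.
            self.held = True
        return self.held


def system_down_gate(d1: bool, d2: bool, d3: bool) -> bool:
    # OR of the three pairwise ANDs of the CPU down signals.
    return (d1 and d2) or (d2 and d3) or (d1 and d3)
```

With a latch of this kind, a CPU down indication persists even if the raw stop signal drops, until the CPU is reported offline.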

[0012] FIG. 6 shows the equipment configuration of the restart signal processing unit (107) in a triple computer system. The restart signal processing unit (107) consists of a timer (411), a counter (414), a reset start signal generation circuit (416), an initialization start signal generation circuit (417), a relay circuit (418), and a hold circuit (419).
(1) First, when the system down signal (304) is input, the counter (414) is incremented and the timer (411) is started (412). The relay circuit (418) is the switch for the system down signal input: it is closed when there is no system down or on recovery and holds that state, enabling input of the system down signal (304); it is opened when the restart is abandoned (described later), disabling input of the system down signal.
(2) When the first system down signal (304) is input, the counter is set to "1" and the reset start signal generation circuit (416) is activated. The logical product (AND circuit) of the output signal (403) of the reset start signal generation circuit (416) and each CPU down signal (301, 302, 303) output from the system down determination unit (104) is taken, and the resulting reset start signal (401) is output to the CPUs that are down. A CPU that is down receives the reset start signal (401), outputs a main storage content save request (405) to the save DISC (112), and saves the contents of its main storage device.
(3) When the time set in the timer (411) has elapsed, the hold circuit (419) and the timer (411) itself are reset (413). If the system down has not cleared at this point (restart failure), the system down signal (304) is input to the restart signal processing unit (107) again, the hold circuit (419) is set, and, as in (1), the counter (414) is incremented and the timer (411) is started (412). If the system down has cleared, the state of the counter (414) is initialized and processing ends.
(4) If the restart has failed, the system down signal (304) is input again, the counter (414) is set to "2", and the initialization start signal generation circuit (417) is activated. The logical product (AND circuit) of the output signal (404) of the initialization start signal generation circuit (417) and each CPU down signal (301, 302, 303) output from the system down determination unit (104) is taken, and the resulting initialization start signal (402) is output to the CPUs that are down.
(5) When the time set in the timer (411) has elapsed, the hold circuit (419) and the timer (411) itself are reset (413). If the system down has not cleared at this point (restart failure), the system down signal (304) is input to the restart signal processing unit (107) again, the hold circuit (419) is set, and, as in (1), the counter (414) is incremented and the timer (411) is started (412). If the system down has cleared, the state of the counter (414) is initialized and processing ends.
(6) If the restart has failed, the system down signal (304) is input again and the counter (414) is set to "3". This case is treated as restart abandonment: the relay circuit (418), which is the switch for the system down input, is opened to disable input of the system down signal so that no further restart signal is output, the state of the counter (414) is initialized, and processing ends.
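
Two gating details of FIG. 6 lend themselves to a compact illustration: the AND of a generator output with each CPU down signal, and the relay that enables or disables the system down input. The Python below is a sketch only; `route_start_signal` and `relay_passes` are hypothetical helper names, and signals are modelled as plain booleans.

```python
from typing import Dict


def route_start_signal(generator_output: bool,
                       cpu_down: Dict[str, bool]) -> Dict[str, bool]:
    # AND the generator output with each CPU down signal, so that only
    # the CPUs that are actually down receive the start signal.
    return {cpu: generator_output and down for cpu, down in cpu_down.items()}


def relay_passes(relay_closed: bool, system_down: bool) -> bool:
    # A closed relay passes the system down signal into the processing
    # unit; an open relay (restart abandoned) blocks it.
    return relay_closed and system_down
```

For example, with CPU1 and CPU2 down and CPU3 healthy, an active generator output reaches CPU1 and CPU2 only, so a running CPU is never restarted by this path.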

[0013]

[Effects of the Invention] As described above, according to the present invention, the system down determination unit and the restart signal processing unit are provided as a system external to the multi-system computer system, the status signal of each of the computers is taken in, and system down is detected from the combination of these status signals, so system down can be monitored reliably and easily. In addition, when system down is detected, an appropriate restart signal is given to the specific CPUs that have gone down, so the function of the multi-system computer system can be restored automatically and quickly. The present invention thereby makes it possible to raise the availability of the system and improve its reliability. Furthermore, because the contents of the main storage device are automatically saved to the save DISC when a restart is executed, they can be used for later analysis of the cause of the trouble.

[Brief description of drawings]

FIG. 1 is a block diagram of a multi-system computer system showing an embodiment of the present invention.

FIG. 2 is an operation flow of the system down determination unit.

FIG. 3 is an operation flow of the restart signal processing unit (107).

FIG. 4 is an overall equipment configuration diagram of the multi-system computer system.

FIG. 5 is an equipment configuration diagram of the system down determination unit.

FIG. 6 is an equipment configuration diagram of the restart signal processing unit.

FIG. 7 shows a conventional triple computer system.

[Explanation of symbols]

101 CPU1
102 CPU2
103 CPUn
104 System down determination unit
107 Restart signal processing unit
108 Restart abandonment determination mechanism
110 Restart signal generation mechanism
112 DISC for saving main storage contents
311 CPU down determination unit (CPU1)
312 CPU down determination unit (CPU2)
313 CPU down determination unit (CPU3)
314 System down determination section
411 Timer
414 Counter
416 Reset start signal generation circuit
417 Initialization start signal generation circuit
418 Relay circuit
419 Hold circuit

Claims

[Claims]

1. An automatic restart device for a multi-system computer system composed of a plurality of computers, comprising: a system down determination means, installed outside the system, that checks the combinations of computer abnormalities leading to system down based on the status signals taken in from each computer and determines whether the system is down; and a restart signal processing means that outputs a restart signal as a reset start signal for the first restart request and as an initialization start signal for the second and subsequent restart requests.

2. The automatic restart device for a multi-system computer system according to claim 1, wherein the function of the restart signal processing means is disabled when the restart request is issued a predetermined number of times.

3. The automatic restart device for a multi-system computer system according to claim 1 or 2, wherein a memory for saving the contents of the main storage device is provided, and each computer, when it receives the reset start signal, saves the contents of the main storage device to the memory before performing the reset start, and, when it receives the initialization start signal, does not save the contents of the main storage device.

4. The automatic restart device for a multi-system computer system according to claim 1, wherein the system down determination means comprises: a computer down determination unit that, for each computer, checks the combination of abnormalities of that computer to determine computer down; and a system down determination unit that determines system down based on a logical product of the respective computer down determinations.

5. The automatic restart device for a multi-system computer system according to claim 1, wherein the restart signal processing means comprises a timer in which the time required for restarting the computer is set, a counter for counting the number of restart requests, a reset start signal generation circuit that issues the reset start signal, and an initialization start signal generation circuit that issues the initialization start signal, and abandons the restart request when the number of restart requests is greater than or equal to a predetermined number.

6. In a multi-system computer system composed of a plurality of computers, based on a status signal taken from each computer, a combination of computer abnormalities leading to a system down is checked to determine whether or not the system is down. An automatic restart method for a multi-system computer system, wherein when the system is down, a reset start signal is issued in response to the first restart request, and an initialization start signal is issued in response to subsequent restart requests.

7. In a multi-system computer system composed of a plurality of computers, a combination of computer abnormalities leading to a system down is checked based on a status signal fetched from each computer to determine whether or not the system is down. When the system is down, a reset start signal is issued for the first restart request, an initialization start signal is issued for the second and subsequent restart requests, and a restart request is issued when the number of restart requests reaches a predetermined number. A method for automatically restarting a multi-computer system characterized by giving up.

8. The automatic restart method for a multi-system computer system according to claim 6 or 7, wherein the computer receiving the reset start signal saves the contents of the main storage device before performing the reset start.