JPH02141834A

JPH02141834A - Processor failure recovery method

Info

Publication number: JPH02141834A
Application number: JP63294537A
Authority: JP
Inventors: Yasushi Sakamoto; 靖坂本
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1988-11-24
Filing date: 1988-11-24
Publication date: 1990-05-31

Abstract

PURPOSE: To recover from a failure quickly and without intervention by maintenance personnel, by having the service processor automatically perform IMPL again before it reports the failure to a display device, so that recovery does not affect the operation of the other processors. CONSTITUTION: A service processor 1 is provided with a mechanism 2 that analyzes failures of disk control processors 10 and 11 and disk devices 12 and 13. When a failure occurs in the service processor 1, a clear signal 5 can no longer be sent to a watchdog timer 4, so the timer expires and a time-out signal 6 is sent to a failure recovery mechanism 3. This mechanism 3 automatically reloads the program of the service processor 1 and starts the system up again (re-IMPL). The failure is thus recovered automatically, without affecting the other processors and without intervention by maintenance personnel.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a failure recovery method for a service processor in an electronic computer having a multiprocessor configuration.

[Conventional technology]

As described in Japanese Patent Application Laid-Open No. 63-59637, a conventional device monitors a single processor for failures with a watchdog timer. If the processor fails and the watchdog timer is not reset for a fixed period, the timer counter overflows, and the number of resets performed up to that point is shown on a display device. The processor configuration was also a single-processor system.

[Problem to be solved by the invention]

The conventional technology described above gives no consideration to recovering from the failure. After a failure is detected, the operator has to recover from it according to its cause, so a manual recovery operation is required.

An object of the present invention is to provide a function that recovers from a failure automatically after it is detected, and does so without affecting the other processors.

[Means to solve the problem]

To achieve the above object, the service processor in which a failure has been detected automatically performs a re-IMPL (system startup by reloading the program) before reporting the failure to the display device, so that the failure of the service processor is recovered without affecting the operation of the other processors.

[Effect]

When a failure occurs in the service processor, the failure recovery mechanism of the service processor automatically re-IMPLs the service processor (system startup by reloading the program). Because failure recovery is carried out by the service processor alone, the operation of the other processors is not affected and they do not malfunction.

[Example]

An embodiment of the present invention will be described below with reference to FIG. 1.

The service processor 1 has a mechanism 2 that analyzes failures of the disk control processors 10 and 11 and the disk devices 12 and 13. The disk control processors 10 and 11 receive control signals 19 and 20 from the central processing unit 14, exchange data with the disk devices 12 and 13 via read/write signals 17 and 18, and transfer data to the central processing unit 14 via control signals 18 and 19. The service processor 1 sends control signals 15 and 22 to the disk control processors 10 and 11, receives disk failure information 16 and 21, and analyzes the cause of the failure in the disk failure analysis mechanism 2. The display device 8 sends a control signal 18 to the service processor 1, obtains failure information 17, and displays it. The watchdog timer 4 is a timer that monitors the service processor 1 for failures; if it does not receive a clear signal 5 for a fixed period, it generates a timeout signal 6. While the service processor 1 is operating normally, the failure recovery mechanism 3 sends the clear signal 5 to the watchdog timer 4 at regular intervals, so the timeout signal 6 is not generated.
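
The watchdog behavior described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the tick-based time model, the class name `WatchdogTimer`, the helper `run`, and the parameters `limit` and `clear_every` are all hypothetical names chosen for illustration.

```python
from typing import Optional


class WatchdogTimer:
    """Toy model of watchdog timer 4 (tick-based; the time unit is an assumption)."""

    def __init__(self, limit: int) -> None:
        if limit <= 0:
            raise ValueError("limit must be positive")
        self.limit = limit    # assumed timeout period, in ticks
        self.elapsed = 0      # ticks since the last clear signal

    def clear(self) -> None:
        """Clear signal 5: restart the count."""
        self.elapsed = 0

    def tick(self) -> bool:
        """Advance one tick; return True when timeout signal 6 would be raised."""
        self.elapsed += 1
        return self.elapsed >= self.limit


def run(timer: WatchdogTimer, ticks: int, clear_every: Optional[int]) -> Optional[int]:
    """Run for `ticks` ticks, sending a clear signal every `clear_every` ticks.

    clear_every=None models a failed service processor that sends no clear
    signal at all. Returns the tick at which the timeout occurred, or None.
    """
    for t in range(1, ticks + 1):
        if clear_every is not None and t % clear_every == 0:
            timer.clear()
        if timer.tick():
            return t
    return None
```

With a limit of 5 ticks, `run(WatchdogTimer(5), 20, 2)` never times out because the clear signal arrives every 2 ticks, whereas `run(WatchdogTimer(5), 20, None)` reports a timeout at tick 5.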

When a failure occurs in the service processor 1, the clear signal 5 can no longer be sent to the watchdog timer 4, so the watchdog timer times out and sends the timeout signal 6 to the failure recovery mechanism 3.

The failure recovery mechanism 3 automatically re-IMPLs the service processor 1. When the IMPL (initial microprogram load) is performed, the program of the service processor 1 is loaded from the external storage device 24 into the memory 7. When the program turns on the reset signal 9 to the service processor 1, the failure is cleared.
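
The re-IMPL sequence (timeout, program reload, reset) might be sketched as follows. This is only an illustration under stated assumptions: `ServiceProcessor`, `re_impl`, and `on_timeout` are hypothetical names, the program is modeled as a byte string, and the sketch assumes that a reset with a freshly loaded program always clears the failure.

```python
from dataclasses import dataclass


@dataclass
class ServiceProcessor:
    """Toy stand-in for service processor 1 together with its memory 7."""
    memory: bytes = b""
    failed: bool = False

    def reset(self) -> None:
        # Reset signal 9: restart execution from the program now in memory.
        # Assumption: restarting with an intact program clears the failure.
        self.failed = False


def re_impl(sp: ServiceProcessor, external_storage: bytes) -> None:
    """Reload the program (external storage device 24 -> memory 7), then reset."""
    sp.memory = bytes(external_storage)
    sp.reset()


def on_timeout(sp: ServiceProcessor, external_storage: bytes) -> None:
    """Failure recovery mechanism 3: respond to timeout signal 6 with a re-IMPL."""
    re_impl(sp, external_storage)
```

Note that nothing in `on_timeout` touches any processor other than the service processor, which mirrors the point that recovery is confined to the service processor.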

If the cause of the failure is not cleared, the IMPL is repeated several times until the failure is recovered. Even while the service processor 1 is recovering from a failure by IMPL, the central processing unit 14 can exchange control signals 19 and 20 with the disk control processors 10 and 11, and the disk control processors 10 and 11 can exchange read/write signals 17 and 18 with the disk devices 12 and 13. Failure recovery therefore has no effect on the disk input/output operations of the disk control processors 10 and 11.
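
A rough sketch of the repeated IMPL, with disk input/output continuing in parallel, is given below. The source says only that the IMPL is applied "multiple times"; the bound `max_attempts`, the function name `simulate`, and the step-by-step time model are assumptions made for illustration, and what should happen once the bound is reached is not stated in the source.

```python
from typing import List, Optional, Tuple


def simulate(attempt_results: List[bool], max_attempts: int) -> Tuple[Optional[int], int]:
    """Replay a sequence of re-IMPL outcomes.

    attempt_results[i] is True if the (i+1)-th re-IMPL clears the failure.
    Returns (attempt number that succeeded, or None if none did within the
    bound, number of disk I/O requests completed while recovery was running).
    """
    io_completed = 0
    for attempt, ok in enumerate(attempt_results[:max_attempts], start=1):
        # Disk control processors 10 and 11 keep serving the central processing
        # unit 14 in every step, regardless of the service processor's state.
        io_completed += 1
        if ok:
            return attempt, io_completed
    # Assumption: the sketch simply gives up here; the source does not say
    # what happens if repeated IMPL never clears the failure.
    return None, io_completed
```

For example, `simulate([False, False, True], 5)` returns `(3, 3)`: the third re-IMPL succeeds, and one disk request was completed during each of the three steps.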

According to this embodiment, a failure of the service processor is recovered automatically, so the failure can be recovered promptly without the intervention of maintenance personnel.

[Effect of the invention]

According to the present invention, when a processor failure occurs, the processor can be recovered automatically without affecting the other processors, so failure recovery can be performed without the intervention of maintenance personnel.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of one embodiment of the present invention.
1: service processor; 2: disk failure analysis mechanism; 3: failure recovery mechanism; 4: watchdog timer; 8: display device.

Claims

[Claims]

1. A processor failure recovery method for an electronic computer comprising a plurality of processors, a service processor, and a display device, characterized in that a mechanism is provided which, after a failure of the service processor is detected, re-IMPLs only the service processor (system startup by reloading the program) and thereby recovers from the failure without affecting the operation of the other processors.