JP2844361B2

JP2844361B2 - Error recovery processing method

Info

Publication number: JP2844361B2
Application number: JP1228293A
Authority: JP
Inventors: 浩江川; 均遠山
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1989-09-05
Filing date: 1989-09-05
Publication date: 1999-01-06
Anticipated expiration: 2014-01-06
Also published as: JPH0391841A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Overview]

The invention relates to an error recovery processing method that seeks to continue normal processing when an abnormality occurs in a processing device belonging to a system having a plurality of processing devices. Its purpose is to provide a reliable error recovery processing method that can restart the system and continue normal processing even when all of the processing devices have fallen into an abnormal state. To that end, each processing device is provided with a pause instruction generating unit that places the device in a paused state when an abnormality occurs in it, and the system further has an all-paused-state detection unit that determines whether all of the processing devices are in the paused state, and a reset signal sending unit that sends a reset signal to all of the processing devices when all of them are determined to be in the paused state.

[Industrial applications]

The present invention relates to an error recovery processing method, and more particularly to an error recovery processing method that seeks to continue normal processing when an abnormality occurs in a processing device belonging to a system having a plurality of processing devices.

[Conventional technology]

Conventionally, there has been a system such as that shown in FIG. 6.

This system consists of a plurality of processing devices 6(i); i = 1, 2, ... (computers). The processing devices monitor one another for abnormalities, and an error recovery method was used in which, when an abnormality occurred in one processing device, the remaining processing devices continued the operation of the system.

[Problems to be solved by the invention]

The conventional error recovery method, however, had the problem that when abnormalities occurred in the processing devices one after another until no working processing device remained, the system could no longer continue processing.

On the other hand, it is an empirical fact that most computer abnormalities are temporary faults, and that a computer generally continues normal processing once it is restarted.

Accordingly, the object of the present invention is to provide a reliable error recovery processing method that can restart the system and continue normal processing even when all of the processing devices have fallen into an abnormal state.

[Means for solving the problem]

To solve the above technical problem, the present invention, as shown in FIG. 1, is an error processing method that seeks to continue normal processing when an abnormality occurs in a processing device 1(i) belonging to a system having a plurality of processing devices 1(i), i = 1, 2 to n. Each processing device is provided with a pause instruction generating unit 1(i)a that places the processing device 1(i) in a paused state when an abnormality occurs in it. The system further has an all-paused-state detection unit 2 that determines whether all of the processing devices 1(i) are in the paused state, and a reset signal sending unit 3 that sends a reset signal to all of the processing devices when all of them are determined to be in the paused state.
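The relationship between the three elements named above can be illustrated with a minimal Python sketch. It is only a model under stated assumptions: the class `ProcessingDevice` and the functions `all_paused`, `send_reset`, and `recover` are hypothetical names introduced here, and the patent describes hardware units rather than software objects.

```python
from dataclasses import dataclass


@dataclass
class ProcessingDevice:
    """Hypothetical stand-in for a processing device 1(i)."""
    paused: bool = False

    def on_abnormality(self) -> None:
        # Plays the role of the pause instruction generating unit 1(i)a.
        self.paused = True

    def reset(self) -> None:
        # Models the effect of receiving the reset signal: the device restarts.
        self.paused = False


def all_paused(devices: list[ProcessingDevice]) -> bool:
    # Plays the role of the all-paused-state detection unit 2.
    # An empty system is treated as "not all paused" (an assumption).
    return bool(devices) and all(d.paused for d in devices)


def send_reset(devices: list[ProcessingDevice]) -> None:
    # Plays the role of the reset signal sending unit 3.
    for d in devices:
        d.reset()


def recover(devices: list[ProcessingDevice]) -> bool:
    """Send a reset to every device only when all are paused.

    Returns True if a reset was sent, False otherwise.
    """
    if all_paused(devices):
        send_reset(devices)
        return True
    return False
```

In this sketch a reset is never sent while at least one device is still running, which matches the requirement that the surviving devices carry on the system's processing.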

[Action]

Each processing device 1(i), i = 1, 2 to n, is provided with a pause instruction generating unit 1(i)a; i = 1, 2 to n, and when an abnormality occurs in a processing device, that processing device enters the paused state.

Even when an abnormality occurs, the system continues normal processing as long as at least one processing device is still operating normally.

However, when the all-paused-state detection unit 2 detects that abnormalities have occurred in all of the processing devices and that the pause instruction generating units have placed all of them in the paused state, the system can no longer continue processing normally, and the detection unit notifies the reset signal sending unit 3 of this.

On receiving the notification, the reset signal sending unit 3 sends a reset signal to all of the processing devices.

This is based on the empirical fact that most computer abnormalities are temporary faults, and that normal processing can often be continued after a restart.

[Example]

Next, an embodiment of the present invention will be described.

FIG. 2 shows an overall view of the system according to this embodiment.

This apparatus has a plurality of processing devices 11(i); i = 1, 2, ..., such as CPUs. Each processing device is provided with a halt instruction generating unit 11(i)a, corresponding to the pause instruction generating unit 1(i)a, which places the processing device 11(i) in a halt state when an abnormality occurs in it.

Furthermore, this system has a monitoring device 20, a subsystem independent of the processing devices. It performs the hardware control of the main unit needed to operate the processing devices, provides a means of interaction with the operating system (OS), monitors the operating status of the system, and performs diagnosis and the like.

FIG. 3 shows the monitoring device in detail.

The monitoring device is built around a CPU and has: a monitoring processing unit 21, corresponding to the all-paused-state (all-halt-state) detection unit 2 and the reset signal sending unit 3; a hardware access unit 22 that performs write or read accesses to the control registers of each processing device; a control table 23, a memory storing a table with an entry for each processing device 11(i); i = 1, 2, 3, ... that indicates whether that device is in the halt state (paused state); and a management unit 24 used to keep the system running smoothly, handling input/output operations with external storage devices, execution order, error processing, and the like.

The apparatus according to this embodiment operates as follows.

When an abnormality occurs in a processing device 11(i); i = 1, 2, ... and it is determined that the processing device cannot continue normal processing, the halt instruction generating unit 11(i)a; i = 1, 2, ... places that processing device in the halt state. The operating status of each processing device, that is, whether or not it is in the halt state, is indicated in the status register of that processing device 11(i); i = 1, 2, ....

As shown in the flowchart of FIG. 5, in step SJ1 the hardware access unit 22 of the monitoring device 20 periodically issues a read (access) to the status register of each processing device 11(i); i = 1, 2, .... The information read out, that is, whether or not each device is in the halt state, is recorded in the control table 23 for each processing device, as shown in the upper part of FIG. 4.
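Step SJ1 can be sketched in Python as one polling pass over the status registers. The register layout is not given in the patent, so the `HALT_BIT` mask, the `read_status` callback, and the dictionary used as the control table are all assumptions made for illustration.

```python
HALT_BIT = 0x01  # assumed bit position; the patent does not specify a layout


def poll_status_registers(read_status, device_ids, control_table):
    """One pass of step SJ1: read each status register, record the halt flag.

    read_status(dev_id) is a hypothetical stand-in for the hardware access
    unit 22 reading the status register of device dev_id.
    control_table maps device id -> True when that device is halted.
    """
    for dev_id in device_ids:
        status = read_status(dev_id)
        control_table[dev_id] = bool(status & HALT_BIT)
    return control_table


# Example with simulated registers: devices 1 and 3 halted, device 2 running.
registers = {1: 0x01, 2: 0x00, 3: 0x03}
table = poll_status_registers(registers.__getitem__, sorted(registers), {})
```

A real monitoring device would repeat this pass on a timer; the period is not stated in the source and is left out here.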

In step SJ2, the monitoring processing unit 21 periodically accesses the control table 23, monitors the contents of the table, and determines whether all of the processing devices are in the halt state.

If not all of the processing devices 11(i); i = 1, 2, ... are in the halt state, the remaining processing devices that are not halted (even if only one remains) continue the processing of the system, and the monitoring device issues no instruction.

On the other hand, when the monitoring processing unit 21 recognizes from the table in the control table 23 that all of the processing devices are in the halt state, as shown in the upper part of FIG. 4, the process proceeds to step SJ3. Through the hardware access unit 22, an instruction such as that shown in the middle part of FIG. 4 is issued to the reset register of each processing device; that is, the hardware access unit 22 writes a reset instruction signal to each processing device.
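The decision in step SJ2 and the register writes in step SJ3 can be sketched together in Python. The value `RESET_COMMAND`, the `write_reset` callback, and the dictionary form of the control table are assumptions for illustration; the patent states only that a reset instruction signal is written to each device's reset register.

```python
RESET_COMMAND = 0x01  # assumed value written to a reset register


def check_and_reset(control_table, write_reset):
    """Steps SJ2 and SJ3: reset every device only if all are halted.

    control_table maps device id -> True when that device is halted.
    write_reset(dev_id, value) is a hypothetical stand-in for the hardware
    access unit 22 writing to the reset register of device dev_id.
    Returns the list of device ids that were sent a reset.
    """
    # SJ2: if at least one device is still running, issue no instruction.
    if not control_table or not all(control_table.values()):
        return []
    # SJ3: write the reset instruction to every device's reset register.
    for dev_id in control_table:
        write_reset(dev_id, RESET_COMMAND)
    return list(control_table)
```

Treating an empty table as "nothing to reset" is a choice made in this sketch, not something the source specifies.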

As a result, as shown in the lower part of FIG. 4, each processing device is started up again and processing continues, and the table in the control table 23 reflects the status registers of the processing devices and shows that none of them is in the halt state.

[Effect of the invention]

As described above, in the present invention, when an abnormality occurs in any of the plurality of processing devices and it cannot continue normal processing, that processing device is placed in the paused state, and when it is detected that all of the processing devices are in the paused state, a reset signal is sent to all of them to start them up again.

Consequently, a situation in which all of the devices are paused and the system is not operated at all can be prevented, and reliable system operation can be achieved.

[Brief description of the drawings]

FIG. 1 is a block diagram of the principle of the present invention, FIG. 2 is an overall block diagram according to the embodiment, FIG. 3 is a diagram showing the monitoring device according to the embodiment, FIG. 4 is a functional explanatory diagram of the monitoring device according to the embodiment, FIG. 5 is a processing flowchart of the monitoring device according to the embodiment, and FIG. 6 is a block diagram of a conventional example.

1(i), 11(i); i = 1, 2, ... n: processing device
1(i)a (11(i)a); i = 1, 2, ... n: pause instruction generating unit (halt instruction generating unit)
2 (20): all-paused-state detection unit (monitoring device)
3 (20): reset signal sending unit (monitoring device)

Continued from the front page: (58) Fields searched (Int. Cl.6, DB name): G06F 11/14, G06F 11/16 - 11/20, G06F 15/16

Claims

(57) [Claims]

1. An error processing method for continuing normal processing in response to an abnormality occurring in a processing device {1(i)} belonging to a system having a plurality of processing devices {1(i), i = 1, 2 to n}, wherein a pause instruction generating unit {1(i)a} that places the processing device {1(i)} in a paused state upon an abnormality occurring in it is provided in each of the processing devices, and wherein the system has an all-paused-state detection unit (2) that determines whether all of the processing devices {1(i)} are in the paused state, and a reset signal sending unit (3) that sends a reset signal to all of the processing devices when it is determined that all of the processing devices are in the paused state.