JP3070282B2

JP3070282B2 - Failure handling method

Info

Publication number: JP3070282B2
Application number: JP4218854A
Authority: JP
Inventors: 淳高橋
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1992-08-18
Filing date: 1992-08-18
Publication date: 2000-07-31
Anticipated expiration: 2015-07-31
Also published as: JPH0667916A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a fault handling method for a computer having a plurality of processors.

[0002]

[Prior Art] A conventional fault handling method for a computer system of this type will be described with reference to FIGS. 9 and 10.

[0003] FIG. 9 shows a typical configuration of a conventional computer system, and FIG. 10 is a flowchart showing the fault handling operation for the processors 903 and 904 in the conventional computer system.

[0004] In FIG. 9, reference numeral 901 denotes a diagnostic processing device, 903 and 904 denote processors, 905 denotes a memory control device, 906 denotes an input/output processing device, and 907 denotes a main storage device. The diagnostic processing device 901 has a diagnostic interface 902, a direct interface dedicated to diagnostic control, to each of the devices 903 to 907 in the computer system. Through it, the diagnostic processing device 901 centrally executes various controls for each device in the computer system, such as initialization, configuration control, and fault handling.

[0005] The diagnostic interface 902 mainly provides the following functions: notifying the diagnostic processing device 901 of a fault in any device of the computer, reading the internal information of each device from the diagnostic processing device 901, scanning out the data stored in the flip-flops of each device, and issuing various instructions to each device.

[0006] The diagnostic processing device 901 uses the diagnostic interface 902 to perform fault handling for each device in the computer. A typical control procedure will be described with reference to the flowchart of FIG. 10, which shows the fault handling performed on a processor when a fault occurs in that processor.

[0007] First, upon receiving a fault notification from a processor (S101 in FIG. 10), the diagnostic processing device 901 collects the internal information of that processor via the diagnostic interface 902 (S102). Next, it clears the fault state in the processor (S103) and recovers the processor by restarting it from the instruction that was executing when the fault occurred (S104).

[0008] After executing the recovery processing, the diagnostic processing device 901 checks whether the processor is operating normally (S105). If it is, the recovery has succeeded and the computer continues operating (S106). If it is not, the recovery has failed, and the diagnostic processing device 901 stops the processor (S107) and detaches it from the configuration of the computer system (S108).
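
The conventional flow of FIG. 10 can be sketched in Python as follows. This is only an illustrative model: the `Processor` class, its fields, and the `recoverable` flag are assumptions introduced here to stand in for hardware reached over the diagnostic interface 902, and are not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Processor:
    """Hypothetical stand-in for a processor seen over the diagnostic interface."""
    name: str
    recoverable: bool                      # assumed: does a retry clear the fault?
    state: Dict[str, int] = field(default_factory=dict)
    faulted: bool = True
    running: bool = True
    configured: bool = True


def conventional_fault_handling(cpu: Processor) -> Tuple[str, Dict[str, int]]:
    """Walk through steps S102 to S108 after a fault notification (S101)."""
    snapshot = dict(cpu.state)             # S102: collect internal information
    cpu.faulted = False                    # S103: clear the fault state
    cpu.faulted = not cpu.recoverable      # S104: restart from the faulting instruction
    if not cpu.faulted:                    # S105: check for normal operation
        return "continue", snapshot        # S106: recovery succeeded
    cpu.running = False                    # S107: stop the processor
    cpu.configured = False                 # S108: detach it from the configuration
    return "detached", snapshot
```

In this model every step runs inside the single diagnostic processing device, which is the concentration of work that the invention sets out to reduce.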

[0009] Next, processor takeover will be described. If the recovery processing described above fails and another normal processor exists, the diagnostic processing device 901 determines that processor takeover is possible and hands the processing that was running on the processor that failed to recover over to the other, normal processor. As the first step, it instructs the normal processor to stop the process it is currently executing.

[0010] The internal information of the failed processor has already been collected before the recovery processing was executed. Once the process on the normal processor has stopped normally, the diagnostic processing device therefore extracts from that information the internal information needed for the takeover and sets it in the normal processor. It then instructs the normal processor to start the processing it has taken over. With this takeover, even when one processor fails, the processing it was executing can be continued on another normal processor.

[0011]

[Problems to Be Solved by the Invention] In the conventional fault handling method described above, fault handling for a plurality of processors is concentrated in a single diagnostic processing device. As a result, the software load on the diagnostic processing device becomes very large: the memory area used by its software grows substantially and its control becomes more complex.

[Means for Solving the Problems] The method of the present invention is a fault handling method in a diagnostic processing device that manages fault handling for a computer having a plurality of processors. It comprises processor control means and state management means of the diagnostic processing device. The processor control means is activated by a fault notification from a processor and, under the control of firmware or software, collectively performs collection of the internal information of the processor, recovery processing of the processor, and notification of the recovery processing result to the diagnostic processing device. The state management means receives the notification from the processor control means, retrieves the internal information of the processor stored in the processor control means, and, based on the result of the recovery processing, performs configuration change control that detaches the processor from, or incorporates it into, the computer. Each processor control means is provided for two of the processors. When recovery processing fails after a fault occurs in one processor, the processor control means instructs the other, normal processor to stop the process it is executing, extracts the information needed to continue processing from the internal information collected from the failed processor, and sets it in the normal processor, thereby handing the processing that was running on the failed processor over to the normal processor. For the two processors controlled by a given processor control means, processor takeover is thus possible without going through the control of the diagnostic processing device.

[0012]

[Embodiments] Embodiments of the present invention will be described below with reference to the drawings.

[0013] FIG. 1 is a block diagram of a first embodiment of the present invention. In the figure, the diagnostic processing device 1 centrally manages fault handling for each device in the computer system. The diagnostic processing device 1 includes n processor control means 30 to 3(n-1), corresponding to the n processors 40 to 4(n-1) configured in the computer system. Each of the processor control means 30 to 3(n-1) has a diagnostic interface 8, dedicated to diagnostic processing, to its corresponding processor among 40 to 4(n-1). Through the diagnostic interface 8, it performs fault handling exclusively for that processor under the control of firmware or software. The state management means 9 controls the configuration of the processors in the computer system based on the results of the fault handling performed by the processor control means 30 to 3(n-1). To keep the description simple, the diagnostic interfaces between the diagnostic processing device 1 and the memory control device 5, the main storage device 6, and the input/output processing device 7 are omitted.

[0014] FIG. 2 shows the details of the interfaces among the diagnostic processing device 1, the processor control means 30 to 3(n-1), and the processors 40 to 4(n-1) shown in FIG. 1, together with the internal configuration of the processor control means 3m.

[0015] The signal line 210 in the diagnostic interface 8 is a fault notification line from the processor 4m. When this signal becomes logic "1", the fault handling means 205, which is implemented by firmware or software in the processor control means 3m, is activated. The fault handling means 205 uses the diagnostic interface 8 to extract information from the processor 4m and to execute various controls on the processor 4m. The extracted internal information is temporarily stored in the memory 206.

[0016] A fault in the processor 4m is reported to the diagnostic processing device 1 over the signal line 207, which activates the state management means 9. The state management means 9 obtains the result of the fault handling in the processor control means 3m via the processor control interface 2. Through the same interface 2, it can also read the data in the memory 206, where the internal information of the processor 4m is stored.

[0017] The flow of fault handling for the processors 40 to 4(n-1) according to the present invention will now be described with reference to the flowcharts of FIGS. 3 and 4.

[0018] FIG. 3 shows the flow of control of the fault handling for the processor 4m in the processor control means 3m. When a fault occurs in the processor 4m, the fault notification from the processor 4m informs the processor control means 3m of the fault. On receiving this notification, the processor control means 3m activates the fault handling means 205 (S31 in FIG. 3).

[0019] The fault handling means 205 first extracts the internal information of the processor 4m at the time of the fault via the diagnostic interface 8 and stores it temporarily in the memory 206 (S32). Next, it clears the fault state of the processor 4m with a clear instruction issued via the diagnostic interface 8 (S33). Then, based on the extracted internal information, it determines whether the fault that occurred in the processor 4m is recoverable (S34). If it is recoverable (S35), it executes recovery processing (S36). A typical recovery process is re-execution of the software instruction. If the processor 4m operates normally after this recovery processing, the recovery is treated as successful (S37). If the processor 4m does not operate normally even after recovery processing, or if recovery was judged to be impossible, the recovery is treated as failed (S37).

[0020] After executing the fault handling, the processor control means 3m notifies the diagnostic processing device 1 via the signal line 207 that a fault occurred in the processor 4m (S38, S39). This completes the control performed by the processor control means 3m.
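
The per-processor flow of FIG. 3 can be sketched as below. This is a hedged illustration, not the patented firmware: the `fault_kind` labels ("transient", "hard", "unrecoverable"), the class layout, and the `notifications` list standing in for signal line 207 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Processor:
    state: Dict[str, int]
    fault_kind: str            # assumed labels: "transient", "hard", "unrecoverable"
    faulted: bool = True


@dataclass
class ProcessorControlMeans:
    """Illustrative model of processor control means 3m."""
    memory_206: Dict[str, int] = field(default_factory=dict)
    result: str = ""
    notifications: List[str] = field(default_factory=list)  # models signal line 207

    def on_fault(self, cpu: Processor) -> None:
        """Runs when the fault notification line activates fault handling (S31)."""
        self.memory_206 = dict(cpu.state)                # S32: save internal info
        cpu.faulted = False                              # S33: clear fault state
        if cpu.fault_kind == "unrecoverable":            # S34: judged not recoverable
            self.result = "failure"
        else:                                            # S35: judged recoverable
            cpu.faulted = cpu.fault_kind != "transient"  # S36: retry the instruction
            self.result = "failure" if cpu.faulted else "success"  # S37
        self.notifications.append("fault")               # S38, S39: report upward
```

Note that the notification carries no payload in this model: the result and the saved state stay in the control means until the diagnostic processing device reads them.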

[0021] Next, the control in the diagnostic processing device 1 will be described with reference to FIG. 4. FIG. 4 shows the control performed by the state management means 9 of the diagnostic processing device 1 after it receives the result of the fault handling from the processor control means 3m.

[0022] When the state management means 9 receives the fault report via the signal line 207 (S41 in FIG. 4), it reads the result of the fault handling from the processor control means 3m via the processor control interface 2 and determines whether the recovery processing succeeded or failed (S42). If it succeeded, the computer system continues operating as it is (S43). If it failed, the state management means 9 stops the processor 4m (S44) and detaches it from the configuration of the computer system (S45). Then, regardless of whether the recovery succeeded or failed, it reads, via the processor control interface 2, the internal information of the processor 4m at the time of the fault, which is stored in the memory 206 of the processor control means 3m.
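
The role left to the state management means in FIG. 4 can be sketched as a small function. The function name, its arguments, and the use of a Python set for the system configuration are assumptions chosen for illustration only.

```python
from typing import Dict, Set, Tuple


def state_management(result: str,
                     memory_206: Dict[str, int],
                     cpu_id: int,
                     configuration: Set[int]) -> Tuple[str, Dict[str, int]]:
    """Model of S42 to S45, entered after the report on signal line 207 (S41)."""
    if result == "success":              # S42: read and judge the recovery result
        action = "continue"              # S43: keep the system running as is
    else:
        configuration.discard(cpu_id)    # S44, S45: stop and detach the processor
        action = "detached"
    fault_log = dict(memory_206)         # read saved info regardless of the outcome
    return action, fault_log
```

Compared with the conventional flow, this function performs no collection, clearing, or retry of its own, which reflects the reduced load on the diagnostic processing device.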

[0023] As described above, when a fault occurs in any of the processors 40 to 4(n-1), the fault handling is performed entirely by the processor control means, among 30 to 3(n-1), dedicated to that processor. The diagnostic processing device 1 only has to recognize the fault handling result, collect the fault information, and control the configuration of the computer system based on that result. This reduces the concentration of processing load on the diagnostic processing device 1 when the number of processors increases.

[0024] FIG. 5 is a block diagram of a second embodiment of the present invention. In the figure, 50 to 5(n-1) denote processor control means, each of which handles fault handling for two processors. The components other than the processor control means 50 to 5(n-1) are the same as those of the first embodiment shown in FIG. 1.

[0025] FIG. 6 shows the details of the interfaces among one of the processor control means 50 to 5(n-1), namely 5m, the processors 4(2m-2) and 4(2m-1), and the diagnostic processing device 1, together with the internal configuration of the processor control means 5m.

[0026] The processor control means 5m can execute fault handling for the two processors 4(2m-2) and 4(2m-1). The procedure is as follows.

[0027] The signal line 210 in the diagnostic interface 8 is a fault notification line from the processor 4(2m-1). When this signal becomes logic "1", the fault handling means 600 in the processor control means 5m is activated. The fault handling means 600 uses the diagnostic interface 8 to extract the internal information of the processor 4(2m-1) and to execute various controls on the processor 4(2m-1). The extracted internal information is temporarily stored in the memory 206.

[0028] Next, the processor takeover processing in the processor control means 5m will be described with reference to the flowchart of FIG. 7. The fault handling means 600 executes recovery processing for the processor 4(2m-1). If this fails, and the other processor 4(2m-2) under the processor control means 5m is still configured and operating normally, the fault handling means 600 determines that processor takeover is possible and activates the processor takeover means 601. The processor takeover means 601 first instructs the other, normal processor 4(2m-2) connected to the processor control means 5m to stop the process it is currently executing.

[0029] Next, from the internal information of the failed processor, which was collected before the recovery processing and stored in the memory 206, the processor takeover means 601 extracts the software-visible information needed for the takeover and sets it in the normal processor 4(2m-2). It then instructs the normal processor 4(2m-2) to start executing. In this way, processor takeover between the two processors 4(2m-1) and 4(2m-2) under the processor control means 5m is performed without going through the control of the diagnostic processing device 1. Only the success or failure of the recovery from the fault and, when recovery failed, the result of the processor takeover are reported to the diagnostic processing device 1.
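
The takeover steps (stop the healthy processor, extract the software-visible state of the failed one, load it, restart) can be sketched as follows. The register names in `SOFTWARE_VISIBLE` and the `Processor` class are hypothetical; the patent does not enumerate which fields count as software-visible.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed set of software-visible registers, chosen only for illustration.
SOFTWARE_VISIBLE = ("pc", "sp", "r0", "r1")


@dataclass
class Processor:
    regs: Dict[str, int] = field(default_factory=dict)
    running: bool = True


def take_over(memory_206: Dict[str, int], healthy: Processor) -> Dict[str, int]:
    """Move the failed processor's work onto the healthy one of the pair."""
    healthy.running = False                       # stop the current process
    context = {name: value for name, value in memory_206.items()
               if name in SOFTWARE_VISIBLE}       # drop hardware-only scan data
    healthy.regs.update(context)                  # set the state in the healthy CPU
    healthy.running = True                        # instruct it to start executing
    return context
```

Because both processors hang off the same processor control means, nothing in this function needs to call into the diagnostic processing device; only the outcome would be reported afterwards.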

[0030] According to this embodiment, processor takeover between two processors can be executed without involving the diagnostic processing device. Even when the number of processors in the computer system is increased, the diagnostic processing device does not have to perform the takeover directly, so the concentration of processing load on the diagnostic processing device caused by the larger number of processors is reduced.

[0031] FIG. 8 is a block diagram showing a third embodiment of the present invention. In the figure, 801 denotes a diagnostic processing device, and 8020 to 802(n-1) denote processor control means. 8030 to 803(n-1) denote fault handling result holding means, which hold the results of the processor recovery processing and the processor takeover processing. Each is composed of flip-flops that shift the stored data serially, one bit at a time, while the step instruction signal 812 from the data transfer control means 810 in the diagnostic processing device 801 is valid.

[0032] The step instruction signal 812 is distributed to all of the processor control means 8020 to 802(n-1). While this signal is logic "1", the fault handling result holding means 8030 to 803(n-1) in all of the processor control means 8020 to 802(n-1) step one bit at a time, and their contents are transferred sequentially, one bit at a time, to the diagnostic processing device 801 via the serial interface 811. For example, the data in the processor control means 802(n-1) is transferred via the serial interface 811 in the order 802(n-1) → 802(n-2) → ... → 8021 → 8020 → diagnostic processing device 801. The diagnostic processing device 801 receives this data sequentially by shifting it one bit at a time through the shift registers in the order 804(n-1) → 804(n-2) → ... → 8041 → 8040.
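
The daisy-chained transfer can be simulated with plain bit lists. This sketch assumes every result holding means has the same width, that vacated positions fill with 0, and that the chain and the receiving shift registers can each be modelled as one flat list; none of these details is stated in the patent.

```python
from typing import List


def shift_out(units: List[List[int]]) -> List[List[int]]:
    """units[k] holds the bits of result holding means 803k; returns registers 804k."""
    n, width = len(units), len(units[0])
    # Flatten the chain so the last element is the bit nearest the diagnostic device.
    chain = [bit for unit in reversed(units) for bit in unit]
    received = [0] * (n * width)        # shift registers 804(n-1) ... 8040, flattened
    for _ in range(n * width):          # step instruction signal 812 held at logic 1
        bit = chain.pop()               # bit leaving 8020 onto serial interface 811
        chain.insert(0, 0)              # every holding means steps by one bit
        received.pop()                  # receiver shifts toward 8040
        received.insert(0, bit)         # new bit enters at 804(n-1)
    # Slice the flat receiver back into one register per processor control means.
    return [received[(n - 1 - k) * width:(n - k) * width] for k in range(n)]
```

After exactly n times the width steps, register 804k holds what result holding means 803k held, which is why a single serial line suffices regardless of the number of processors.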

[0033] The state management means 808 holds the configuration state of the processors configured in the computer system in the configuration information control means 8090 to 809(n-1). For example, when a given processor is configured in the computer system, the value of the configuration information control means 8091 corresponding to that processor is logic "1". When the output of the configuration information control means 8091 is logic "1", the switching circuit 8071 selects the output of the shift register 8041. Under this control, the data corresponding to every configured processor is stored in the shift register 804m corresponding to that processor.

[0034] When all of the data has been transferred, the data transfer control means 810 sets the step instruction signal 812 to logic "0". At this point, the contents of each shift register 804 corresponding to a processor whose configuration information control means, among 8090 to 809(n-1), outputs logic "1" are stored in the data register 805m by the set signal output from the AND gate 806.
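
The latching step can be sketched as below. The exact inputs of the AND gate 806 are not spelled out in the text, so this example assumes the set signal is the AND of the configuration bit and a "transfer finished" condition derived from the step instruction signal being 0; the function and variable names are likewise hypothetical.

```python
from typing import Dict, List


def latch_data_registers(shift_regs: List[List[int]],
                         config_bits: List[int],
                         step_signal: int) -> Dict[int, List[int]]:
    """Return the data registers 805m that get loaded, keyed by processor index m."""
    latched: Dict[int, List[int]] = {}
    transfer_done = 1 if step_signal == 0 else 0
    for m, (bits, configured) in enumerate(zip(shift_regs, config_bits)):
        set_signal = configured & transfer_done   # assumed inputs of AND gate 806
        if set_signal:
            latched[m] = list(bits)               # copy shift register 804m into 805m
    return latched
```

For instance, with configuration bits `[1, 0, 1]` only the registers for processors 0 and 2 are loaded, and nothing is loaded while the step signal is still 1.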

[0035] The state management means 808 reads the contents of the data registers 805, recognizes which processor has failed and the result of the associated fault handling in the processor control means, and executes state management control for the computer system.

[0036] According to this embodiment, all of the processor control means are connected in series, and a serial interface extracts the various information in the processor control means serially, one bit at a time. This avoids a concentration of interface cable connections at the diagnostic processing device as the number of processors increases, and allows the diagnostic processing device to be made smaller.

[0037]

[Effects of the Invention] As described above, according to the fault handling method of the present invention, a processor control means that executes the fault handling of a processor in its entirety, and notifies the diagnostic processing device only of the result, is provided for each processor. Even when the number of processors in the computer system is increased, the diagnostic processing device only has to read from the processor control means the internal information that it collected when the processor failed, and manage the configuration state of the processors in the system. The concentration of processing load on the diagnostic processing device caused by the larger number of processors is thereby reduced.

[Brief description of the drawings]

FIG. 1 is a block diagram of a first embodiment of the present invention.

FIG. 2 is a detailed block diagram of the first embodiment.

FIG. 3 is a flowchart of fault handling by the processor control means in the first embodiment.

FIG. 4 is a flowchart of fault handling in the diagnostic processing device in the first embodiment.

FIG. 5 is a block diagram of a second embodiment of the present invention.

FIG. 6 is a detailed block diagram of the second embodiment.

FIG. 7 is a flowchart of processor takeover processing by the processor control means in the second embodiment.

FIG. 8 is a block diagram of a third embodiment of the present invention.

FIG. 9 is a block diagram showing a conventional example.

FIG. 10 is a flowchart of fault handling for a processor in the conventional example.

[Explanation of symbols]

1, 801, 901: Diagnostic processing device
2: Processor control interface
30 to 3(n-1), 3m, 50 to 5(2n-1), 5m, 8020 to 802(n-1): Processor control means
40 to 4(2n-1), 4m, 4(2m-2), 4(2m-1), 903, 904: Processor
5, 905: Memory control device
6, 907: Main storage device
7, 906: Input/output processing device
8, 902: Diagnostic interface
9, 808: State management means
205, 600: Fault handling means
206: Memory
601: Processor takeover means
8030 to 803(n-1): Fault handling result holding means
8040 to 804(n-1): Shift register
8050 to 805(n-1): Data register
8060 to 806(n-1): AND gate
8071 to 807(n-1): Switching circuit
8090 to 809(n-1): Configuration information control means
810: Data transfer control means
811: Serial interface

Claims

(57) [Claims]

1. A fault handling method in a diagnostic processing device that manages fault handling for a computer having a plurality of processors, comprising: processor control means that is activated by a fault notification from a processor and that, under the control of firmware or software, collectively performs collection of internal information of the processor, recovery processing of the processor, and notification of the recovery processing result to the diagnostic processing device; and state management means of the diagnostic processing device that receives the notification from the processor control means, retrieves the internal information of the processor stored in the processor control means, and, based on the result of the recovery processing, performs configuration change control that detaches the processor from, or incorporates it into, the computer; wherein each of the processor control means is provided for two of the processors and, when recovery processing of a processor fails after a fault occurs in that processor, instructs the other, normal processor to stop the process being executed, extracts the information necessary for continuing the processing from the internal information collected from the failed processor, and sets the information in the normal processor, thereby handing the processing that was being executed by the failed processor over to the normal processor; and wherein, for the two processors controlled by the processor control means, processor takeover is possible without going through the control of the diagnostic processing device.

2. A fault handling method in a diagnostic processing device that manages fault handling for a computer having a plurality of processors, comprising: processor control means that is activated by a fault notification from a processor and that, under the control of firmware or software, collectively performs collection of internal information of the processor, recovery processing of the processor, and notification of the recovery processing result to the diagnostic processing device; and state management means of the diagnostic processing device that receives the notification from the processor control means, retrieves the internal information of the processor stored in the processor control means, and, based on the result of the recovery processing, performs configuration change control that detaches the processor from, or incorporates it into, the computer; wherein the processor control means includes fault handling result holding means that holds various information such as the processor fault recovery result and the processor takeover result, and that is composed of serially connected flip-flops that shift the stored data sequentially, one bit at a time, in response to a step instruction from the diagnostic processing device; the method further comprising: serial interface means that connects the plurality of fault handling result holding means in series and, while the step instruction from the diagnostic processing device is valid, shifts the contents of all the fault handling result holding means one bit at a time and transfers them serially to the diagnostic processing device; data receiving means of the diagnostic processing device that, with the state management means holding configuration information of the processors, sequentially receives, one bit at a time while the step instruction is valid, the fault handling result information transferred by the serial interface means and, based on the processor configuration information managed by the state management means, captures the contents of the fault handling result holding means corresponding to the processors configured in the computer system; and data transfer control means that keeps the step instruction valid while the contents of all the fault handling result holding means are transferred serially one bit at a time and that outputs the step instruction periodically; whereby the fault handling results of all the processor control means can be recognized periodically through the serial interface means.