JPH0498329A

JPH0498329A - Fault recovery system

Info

Publication number: JPH0498329A
Application number: JP2212394A
Authority: JP
Inventors: Shinichi Nagoya; 名児耶　真一
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-08-10
Filing date: 1990-08-10
Publication date: 1992-03-31
Anticipated expiration: 2013-04-22
Also published as: JP2743562B2

Abstract

PURPOSE: To prevent the destruction of fault information and adverse effects on other units by providing, as part of the fields of a microinstruction, a field that stores an encoded identification of the suspected unit. CONSTITUTION: The suspected unit among the units 1-2 to 1-4 can be specified to a certain degree, according to the content of the microprogram processing, in the suspected unit code field 2-5 of the microinstruction. For example, when data is exchanged between the diagnosed units 1-2 to 1-4, a microprogram in the diagnosed unit 1-2 can carry, in the suspected unit code field 2-5 of each step of a routine that processes data received from the diagnosed unit 1-3, a code written at design time that indicates the diagnosed unit 1-3 as the first suspected unit and the diagnosed unit 1-2 as the second suspected unit. The destruction of information to be collected and adverse effects on other, normally operating units are thereby prevented.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a fault handling method applied to information processing devices and the like.

[Conventional technology]

Conventionally, in one fault handling method of this type, when a fault is detected in a unit to be diagnosed, a flip-flop indicating that a fault has occurred is set and execution of the microprogram is stopped. The diagnostic control unit then collects and analyzes the flip-flop value and the stop address of the microprogram to point out the suspected unit causing the fault.

In another known method, when a fault occurs, a specific microprogram encodes the suspected unit and sets the code in a specific register, and the diagnostic control unit collects and analyzes the flip-flop value and the value of that register.

[Problem to be solved by the invention]

Of the conventional fault handling methods described above, the method that analyzes the value of the flip-flop set by the fault together with the stop address of the microprogram has the following drawback: when the microprogram is changed, its addresses generally change as well, so the fault analysis program of the diagnostic control unit must be changed accordingly.
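
The drawback of address-based analysis can be illustrated with a minimal Python sketch. Everything here is hypothetical: the step descriptions, the suspect assignments, and the names `build_address_table` and `analyse` are assumptions made for illustration and do not come from the patent.

```python
def build_address_table(steps):
    """Map each microprogram stop address to the suspect for that step."""
    return {address: suspect for address, (_, suspect) in enumerate(steps)}


# Hypothetical microprogram: (step description, suspected unit).
old_program = [
    ("receive from 1-3", "1-3"),
    ("access main memory", "1-5"),
]

# A later revision inserts one step, which shifts every following address.
new_program = [
    ("receive from 1-3", "1-3"),
    ("extra check", "1-3"),
    ("access main memory", "1-5"),
]

# The fault analysis program in the diagnostic control unit still holds
# the table built from the old microprogram.
analysis_table = build_address_table(old_program)


def analyse(stop_address):
    """Look up the suspect from the (possibly stale) address table."""
    return analysis_table.get(stop_address)
```

With the stale table, a stop at address 1 in the revised microprogram is attributed to unit 1-5 although the step now concerns unit 1-3, and a stop at address 2 has no entry at all. The analysis table has to be regenerated whenever the microprogram changes.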

The method that analyzes the value of the flip-flop set by the fault together with the value of a specific register also has drawbacks. The microprogram of the unit that detected the fault must run for several steps to encode the suspected unit and set this information in the specific register. Normal operation is not guaranteed during these steps, particularly when the detecting unit is itself the cause of the fault. As a result, not only is the content set in the specific register less reliable, but other information to be collected may also be destroyed. In the worst case, other normally operating units may also be adversely affected.

[Means to solve the problem]

The method of the present invention is a fault handling method for an information processing system comprising a plurality of units to be diagnosed that operate under the control of horizontal microprograms, and a diagnostic control unit that points out the suspected unit causing a fault by collecting and analyzing fault information when a fault is detected in the units to be diagnosed. The method is characterized in that each unit to be diagnosed is provided with means for holding the microprogram step being executed when the fault is detected; a part of the fields of the horizontal microinstruction is a field that stores coded information identifying the suspected unit; the diagnostic control unit is provided with means for collecting that field of the microprogram step held in the unit to be diagnosed at the time of fault detection; and, at the time of fault analysis, the suspected unit causing the fault is pointed out by analyzing the contents of that field.

[Example]

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 2 is a configuration diagram of a system to which the present invention is applied.

In FIG. 2, 1-1 is a diagnostic control unit, 1-2 to 1-4 are units to be diagnosed that operate under the control of horizontal microprograms, and 1-5 is a main memory unit. The units to be diagnosed 1-2 to 1-4 and the main memory unit 1-5 are connected to one another by a system bus 1-6, over which data is exchanged between the units to be diagnosed 1-2 to 1-4 and the main memory unit 1-5, and among the units to be diagnosed 1-2 to 1-4 themselves. 1-7 is a diagnostic bus to which the diagnostic control unit 1-1, the units to be diagnosed 1-2 to 1-4, and the main memory unit 1-5 are connected; the diagnostic control unit 1-1 uses it to collect fault information when a fault occurs.

FIG. 1 shows an example of the format of a horizontal microinstruction in any of the units to be diagnosed 1-2 to 1-4. A one-step microinstruction consists of 36 bits. 2-1 to 2-3 are control fields that individually control a plurality of subunits in the units to be diagnosed 1-2 to 1-4, and 2-4 is a next microinstruction address field that indicates the address of the next microinstruction. 2-5 is a suspected unit code field that holds the suspected unit code, which is the characteristic feature of the present invention.
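
As a minimal sketch of the format described above, the following Python models a 36-bit microinstruction word. The patent gives only the total width (36 bits) and the roles of fields 2-1 to 2-5; the individual field widths (three 6-bit control fields, a 10-bit next address field, an 8-bit suspected unit code field), the field order, and all names are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical field layout, most significant field first. Only the total
# width (36 bits) and the field roles come from the description; the
# individual widths and the ordering are assumptions.
FIELDS = (
    ("control_1", 6),      # control field 2-1
    ("control_2", 6),      # control field 2-2
    ("control_3", 6),      # control field 2-3
    ("next_address", 10),  # next microinstruction address field 2-4
    ("suspect_code", 8),   # suspected unit code field 2-5
)
WORD_BITS = sum(width for _, width in FIELDS)


@dataclass(frozen=True)
class MicroInstruction:
    control_1: int
    control_2: int
    control_3: int
    next_address: int
    suspect_code: int

    def pack(self) -> int:
        """Pack the fields into one 36-bit word."""
        word = 0
        for name, width in FIELDS:
            value = getattr(self, name)
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name} does not fit in {width} bits")
            word = (word << width) | value
        return word

    @classmethod
    def unpack(cls, word: int) -> "MicroInstruction":
        """Split a 36-bit word back into its fields."""
        if not 0 <= word < (1 << WORD_BITS):
            raise ValueError("word does not fit in 36 bits")
        values = {}
        for name, width in reversed(FIELDS):
            values[name] = word & ((1 << width) - 1)
            word >>= width
        return cls(**values)
```

Because the suspected unit code travels inside the microinstruction word itself, whatever register holds the current microinstruction also holds the code for the step being executed.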

FIG. 3 shows an example of a circuit, within the units to be diagnosed 1-2 to 1-4, that holds the microprogram step at the time a fault is detected. 3-1 is a microinstruction register, 3-2 to 3-4 are error detection signals, 3-5 to 3-7 are error registers, 3-8 is a NOR gate, 3-9 is a clock input signal, 3-10 is a NAND gate, and 3-11 is a clock signal. The microinstruction register 3-1 and the error registers 3-5 to 3-7 are each controlled by the clock signal 3-11.

When a fault is detected in one of the units to be diagnosed 1-2 to 1-4, one of the error detection signals 3-2 to 3-4 becomes logic "1", and logic "1" is set in one of the error registers 3-5 to 3-7.

The outputs of the error registers 3-5 to 3-7 are applied to the NOR gate 3-8, so the output of the NOR gate 3-8 becomes logic "0". As a result, the output of the NAND gate 3-10, that is, the clock signal 3-11, becomes logic "1" regardless of the value of the clock input signal 3-9. The microinstruction register 3-1 and the error registers 3-5 to 3-7 stop operating, and their internal values are held.
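
The gating behavior of the hold circuit can be checked with a small truth-level sketch in Python. The gate wiring (NOR 3-8 over the error registers, NAND 3-10 over the NOR output and the clock input 3-9) follows the description; the function names are hypothetical, and the registers are modeled as plain 0/1 values without edge-triggered timing, which is a simplification.

```python
def nor(*inputs: int) -> int:
    """NOR gate 3-8: outputs 1 only when every input is 0."""
    return 0 if any(inputs) else 1


def nand(a: int, b: int) -> int:
    """NAND gate 3-10: outputs 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


def gated_clock(error_registers, clock_input: int) -> int:
    """Clock signal 3-11 derived from error registers 3-5 to 3-7 and clock input 3-9."""
    return nand(nor(*error_registers), clock_input)


def clock_trace(error_registers, clock_inputs):
    """Clock signal 3-11 for a sequence of clock input values."""
    return [gated_clock(error_registers, c) for c in clock_inputs]
```

With no error set, the NAND gate passes the inverted clock input, so the registers keep being clocked. As soon as any error register holds 1, the clock signal is pinned at 1, no further edges occur, and the microinstruction register retains the step that was executing when the fault was detected.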

Next, the basis on which a suspected unit code can be written in the suspected unit code field 2-5 of FIG. 1 is explained, together with an example of such a description.

In general, when a fault is detected in the units to be diagnosed 1-2 to 1-4, the suspected unit can be pointed out to some extent as long as the values of the error registers 3-5 to 3-7, which indicate that a fault was detected, are preserved. However, when, for example, a timeout error is detected during data exchange between the units to be diagnosed 1-2 to 1-4, or between the units to be diagnosed 1-2 to 1-4 and the main memory unit 1-5, the values of the error registers 3-5 to 3-7 alone cannot identify which unit the exchange was with, and it is difficult to point out the suspected unit.

There are also many cases in which the microprogram itself detects a logical inconsistency. Pointing out the suspected unit from the values of the error registers 3-5 to 3-7 alone is impractical in such cases, because the number of bits required in the error registers would become enormous. In such cases, however, the suspected unit can often be pointed out easily from the content of the processing that the microprogram was performing when the fault was detected.

Accordingly, in FIG. 1, the suspected unit among the units 1-2 to 1-4 can be specified to some degree in the suspected unit code field 2-5 of the microinstruction, according to the content of the microprogram processing. For example, suppose that data is exchanged among the units to be diagnosed 1-2 to 1-4 of FIG. 2. For a microprogram in the unit to be diagnosed 1-2, a code indicating the unit to be diagnosed 1-3 as the first suspected unit and the unit to be diagnosed 1-2 as the second suspected unit can be written at design time in the suspected unit code field 2-5 of each step of a processing routine that operates on data received from the unit to be diagnosed 1-3.
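
The design-time tagging described above might look like the following Python sketch for a routine in unit 1-2 that processes data received from unit 1-3. The code value `0x02`, the start address, the step descriptions, and the names `Step` and `tag_routine` are all assumptions made for illustration; the patent states only that such a code can be written into field 2-5 of each step.

```python
from typing import NamedTuple


class Step(NamedTuple):
    address: int
    operation: str     # hypothetical description of the step
    suspect_code: int  # contents of suspected unit code field 2-5


# Hypothetical code meaning "first suspect: unit 1-3, second suspect: unit 1-2".
SUSPECT_1_3_THEN_1_2 = 0x02


def tag_routine(start_address, operations, suspect_code):
    """Assign the same suspected unit code to every step of one routine."""
    return [
        Step(start_address + offset, operation, suspect_code)
        for offset, operation in enumerate(operations)
    ]


# Routine in unit 1-2 that operates on data received from unit 1-3.
receive_routine = tag_routine(
    0x100,
    ["check received data", "update state", "send response"],
    SUSPECT_1_3_THEN_1_2,
)
```

Since the code is attached to each step rather than to an address, inserting or moving steps does not invalidate it.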

Any suitable scheme may be used to encode the suspected unit code field; in this embodiment, it is encoded according to the correspondence table shown in FIG. 1.

Next, the fault handling performed by the diagnostic control unit 1-1 when a fault is detected in the unit to be diagnosed 1-3 will be described with reference to FIGS. 1 to 3.

When the diagnostic control unit 1-1 recognizes that a fault has been detected in the unit to be diagnosed 1-3, it collects the fault information of the unit to be diagnosed 1-3 via the diagnostic bus 1-7. This fault information includes the values of the error registers 3-5 to 3-7 and of the microinstruction register 3-1 shown in FIG. 3.

Having collected the fault information, the diagnostic control unit 1-1 refers to the values of the error registers 3-5 to 3-7 and, if the suspected unit can be pointed out from those values alone, does so. If it cannot, the diagnostic control unit 1-1 further refers to the suspected unit code field 2-5 of the microinstruction register 3-1. If, for example, its value is "01H", the units to be diagnosed 1-2 and 1-3 are pointed out, in that order, as the suspected units in accordance with Table 1.
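
The two-stage analysis described above can be sketched in Python as follows. Only the mapping of code 01H to units 1-2 and 1-3 (in that order) is stated in the description. The other table entry, the error register pattern, the position of field 2-5 in the low 8 bits, and the name `point_out_suspects` are assumptions made for illustration.

```python
# Suspected unit code table. Only the 0x01 entry (unit 1-2, then unit 1-3)
# is stated in the description; the 0x02 entry is a hypothetical placeholder.
SUSPECT_TABLE = {
    0x01: ("1-2", "1-3"),
    0x02: ("1-3", "1-2"),  # assumption for illustration
}

# Hypothetical error register patterns (registers 3-5, 3-6, 3-7) that are
# sufficient on their own to identify the suspect.
ERROR_REGISTER_TABLE = {
    (1, 0, 0): ("1-3",),  # assumption: an error internal to unit 1-3
}

# Assumption: field 2-5 occupies the low 8 bits of the 36-bit word.
SUSPECT_FIELD_MASK = 0xFF


def point_out_suspects(error_registers, microinstruction_word):
    """Return suspected units in priority order, or () if none can be named."""
    # Stage 1: use the error registers alone when they are conclusive.
    pattern = tuple(error_registers)
    if pattern in ERROR_REGISTER_TABLE:
        return ERROR_REGISTER_TABLE[pattern]
    # Stage 2: fall back to field 2-5 of the held microinstruction.
    code = microinstruction_word & SUSPECT_FIELD_MASK
    return SUSPECT_TABLE.get(code, ())
```

The lookup depends only on the code table, not on microprogram addresses, so it stays valid when the microprogram is revised.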

[Effect of the invention]

As described above, the fault handling method of the present invention provides, as part of the fields of the microinstruction, a field that stores coded information identifying the suspected unit. Consequently, even when the microprogram is changed, the fault handling program in the diagnostic control unit does not need to be changed accordingly. In addition, because the unit causing the fault does not continue to operate after the fault is detected, fault information is not destroyed and other units are not adversely affected.

... 3-2 to 3-4: error detection signals; 3-5 to 3-7: error registers; 3-8: NOR gate; 3-9: clock input signal; 3-10: NAND gate; 3-11: clock signal.

Claims

[Scope of Claims] A fault handling method for an information processing system comprising a plurality of units to be diagnosed that operate under the control of horizontal microprograms, and a diagnostic control unit that points out a suspected unit causing a fault by collecting and analyzing fault information when a fault is detected in the units to be diagnosed, characterized in that: each unit to be diagnosed is provided with means for holding the microprogram step being executed when the fault is detected; a part of the fields of the horizontal microinstruction is a field that stores coded information identifying the suspected unit; the diagnostic control unit is provided with means for collecting said field of the microprogram step held in the unit to be diagnosed at the time of fault detection; and, at the time of fault analysis, the suspected unit causing the fault is pointed out by analyzing the contents of said field.