JP2001043201A

JP2001043201A - Multiprocessor fault detector

Info

Publication number: JP2001043201A
Application number: JP11219659A
Authority: JP
Inventors: Takayuki Ito; 孝之伊藤
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1999-08-03
Filing date: 1999-08-03
Publication date: 2001-02-16

Abstract

PROBLEM TO BE SOLVED: To make it easy to identify, from a memory dump, the CPU in a multiprocessor in which a fault has occurred, while simplifying the device configuration and reducing the device cost by not providing as many hardware counters as there are CPUs. SOLUTION: An inspection CPU 1 compares and evaluates updated count values 7a-7c of counters against previous count values 8a-8c, and a WDT 5 outputs a CPU abnormality detection signal 13 based on the comparison and evaluation result of the inspection CPU 1. Because the processor abnormality detection signal can thus be output on the basis of the evaluation of the updated and previous count values of counters held in memory, the failed processor in the multiprocessor can be easily identified, while the device configuration is simplified and the device cost is reduced because hardware counters need not be provided for each processor.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a multiprocessor fault detection device, and more particularly to a multiprocessor fault detection device that detects processor faults by using a watchdog timer (WDT) in a computer composed of a plurality of processors.

[0002]

2. Description of the Related Art A known method of detecting a system hang in a computer uses a hardware (H/W) timer called a WDT. In this hang detection method, the WDT counts from 0 seconds and is reset periodically by the running system. When the system hangs, the timer is no longer reset within the set time, and the hang can be detected from that.
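The single-WDT hang detection described in [0002] can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the class name `WatchdogTimer`, the tick-based clock, and the 3-tick timeout are hypothetical choices for illustration and do not come from the publication.

```python
class WatchdogTimer:
    """Toy model of a hardware WDT driven by a simulated clock (in ticks)."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.elapsed = 0        # ticks since the last reset
        self.expired = False    # latched once the timeout is reached

    def reset(self) -> None:
        """Called periodically by healthy software to restart the count from 0."""
        if not self.expired:
            self.elapsed = 0

    def tick(self) -> None:
        """Advance simulated time by one tick."""
        self.elapsed += 1
        if self.elapsed >= self.timeout_ticks:
            self.expired = True


def run(system_hangs_at: int, total_ticks: int = 10) -> bool:
    """Return True if the WDT detects a hang within total_ticks."""
    wdt = WatchdogTimer(timeout_ticks=3)  # hypothetical timeout
    for t in range(total_ticks):
        if t < system_hangs_at:
            wdt.reset()  # software is alive and resets the WDT every tick
        wdt.tick()
    return wdt.expired
```

In this sketch a system that never hangs keeps the timer below its timeout, while a system that stops calling `reset()` lets the timer expire a few ticks later.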

[0003] In a multiprocessor computer, however, even if one of the processors hangs, the other processors keep running. If the program that resets the WDT is scheduled on a processor that is still operable, the WDT is reset even though it should not be, and the system abnormality caused by the hung processor may go undetected.

[0004] One conventional technique that addresses this failure to detect system abnormalities in a multiprocessor system is the multiprocessor fault detection method reported in Japanese Patent Application Laid-Open No. 57-55461. In this conventional method, a hardware (H/W) monitoring counter is provided for each processor, and each processor periodically resets its own monitoring counter, so that an abnormality of each processor can be detected.

[0005] The contents of each monitoring counter are constantly counted up by a count-up circuit, and each processor periodically clears the contents of its corresponding monitoring counter at a period sufficiently shorter than the counter overflow time. If a processor becomes abnormal, the corresponding monitoring counter is not cleared, so it overflows, and the abnormal state is shown on a display unit.

[0006] A conventional fault detection device for a multi-microprocessor system is reported in, for example, Japanese Patent Application Laid-Open No. 2-53169. In this conventional fault detection device, a WDT flag for each processor is provided in a shared memory, each processor periodically sets its flag, and the WDT is reset when all the flags are set.

[0007] Each processor sets its WDT flag in the shared memory only when it has completed normal processing within a predetermined processing time. A representative processor among the plurality of processors can check whether the other processors are operating normally or abnormally from whether their WDT flags are set, and can thereby detect a failure of the multiprocessor.
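The flag-based prior-art scheme of [0006] and [0007] might be sketched as follows. The function name `representative_check` and the dictionary layout are hypothetical, and clearing the flags after a successful check is an assumption made here so that each period starts fresh; the cited publication is not quoted on that detail.

```python
from typing import Dict


def representative_check(flags: Dict[str, bool]) -> bool:
    """Return True if the WDT should be reset, i.e. every processor set its flag."""
    if all(flags.values()):
        # Assumption: flags are cleared so the next period starts fresh.
        for cpu in flags:
            flags[cpu] = False
        return True
    # At least one processor did not set its flag: withhold the WDT reset.
    return False


# Hypothetical shared-memory flags for three processors.
wdt_flags = {"cpu_a": True, "cpu_b": True, "cpu_c": False}
should_reset = representative_check(wdt_flags)
```

Each flag carries only one bit of state per processor, which is the property the later paragraphs contrast with counters.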

[0008]

PROBLEMS TO BE SOLVED BY THE INVENTION In the conventional multiprocessor fault detection method reported in the above-mentioned Japanese Patent Application Laid-Open No. 57-55461, as many hardware counters as there are processors must be provided, which complicates the device configuration and increases the device cost.

[0009] In the conventional fault detection device for a multi-microprocessor system reported in the above-mentioned Japanese Patent Application Laid-Open No. 2-53169, as described above, when a fault occurs in one of the processors of the multiprocessor, the presence or absence of set WDT flags shows that a fault has occurred in the multiprocessor itself.

[0010] When a fault occurs in one of the processors of a multiprocessor, however, a memory dump is generally taken and analyzed. Because this conventional fault detection device uses flags, the memory dump does not show which processor in the multiprocessor has failed, and it is difficult to identify the failed processor.

[0011] The present invention has been made to solve the above problems. Its object is to provide a multiprocessor fault detection device that does not require as many hardware counters as there are processors, that thereby simplifies the device configuration and reduces the device cost, and that makes it easy to identify the failed processor in a multiprocessor.

[0012]

MEANS FOR SOLVING THE PROBLEMS A multiprocessor fault detection device according to the present invention comprises: a plurality of processors to be inspected; a memory provided with a plurality of counters corresponding to the respective processors to be inspected; an inspection processor that compares and evaluates the updated count values of the counters against the previous count values of the counters; and abnormality detection signal output means that outputs a processor abnormality detection signal based on the comparison and evaluation result of the inspection processor.

[0013] The multiprocessor fault detection device further has reset signal output means that outputs a reset signal when the inspection processor detects that the updated value and the previous value of every count value differ, and the abnormality detection signal output means outputs the processor abnormality detection signal when the inspection processor detects that the updated value and the previous value of a count value are the same.

[0014] In the multiprocessor fault detection device, the abnormality detection signal output means also outputs the processor abnormality detection signal when a fault occurs in the inspection processor and the reset signal is therefore not output by the reset signal output means.

[0015] The multiprocessor fault detection device further has memory dump means that dumps the updated values and the previous values of the counters' count values from the memory when the abnormality detection signal output means outputs the processor abnormality detection signal.

[0016]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention will be described below with reference to the drawings. Embodiment 1. FIG. 1 is a block diagram showing the configuration of the multiprocessor fault detection device according to Embodiment 1 of the present invention. The illustrated multiprocessor fault detection device can be applied to a computer system such as a server. In FIG. 1, reference numerals 1 to 4 denote the individual CPUs of a system composed of a multiprocessor of N CPUs; CPU 1 is the inspection CPU, and CPUs 2 to 4 are the CPUs to be inspected.

[0017] Reference numeral 5 denotes a WDT. The WDT 5 is reset by a WDT reset signal 12 from the inspection CPU 1, and outputs a CPU fault detection signal 13 when the WDT reset signal 12 is not output from the inspection CPU 1 for a predetermined time. Reference numeral 6 denotes a memory provided with a plurality of counters corresponding respectively to the CPUs 2 to 4 to be inspected.

[0018] Reference numerals 7a to 7c denote the count values (updated values) of the counters on the memory 6, which are counted up and thereby updated by the CPUs 2 to 4 to be inspected. Reference numerals 8a to 8c denote the count values (previous values) saved by the inspection CPU 1 when it last read the count values 7a to 7c.

[0019] Reference numerals 9a to 9c denote the writing of the count values 7a to 7c of the counters on the memory 6 by the respective CPUs 2 to 4 to be inspected. The count values 7a to 7c here are the updated values. For example, the write 9a of the count value 7a denotes that the CPU 2 to be inspected writes the count value 7a of the counter on the memory 6 corresponding to the CPU 2.

[0020] Reference numerals 10a to 10c denote the reading of the count values 7a to 7c of the counters on the memory 6 by the inspection CPU 1. The count values 7a to 7c here are the updated values. For example, the read 10a of the count value 7a denotes that the inspection CPU 1 reads the count value 7a of the counter on the memory 6 corresponding to the CPU 1.

[0021] Reference numerals 11a to 11c denote the writing and reading of the count values 8a to 8c of the counters on the memory 6 by the inspection CPU 1. The count values 8a to 8c here are the previous values, that is, the values counted the previous time relative to the updated count values 7a to 7c. For example, the read and write 11a of the previous value of the count value 7a denotes that the inspection CPU 1 writes and reads the count value 8a of the counter on the memory 6 corresponding to the CPU 1.

[0022] The operation of the multiprocessor fault detection device shown in FIG. 1 is described below. Each of the CPUs 2 to 4 to be inspected increments its own count value, among the count values 7a to 7c of the counters provided on the memory 6, by 1 at fixed time intervals. For example, the CPU 2 to be inspected increments the count value 7a of its corresponding counter on the memory 6 by 1 at fixed intervals, and the CPU 3 to be inspected increments the count value 7b of its corresponding counter on the memory 6 by 1 at fixed intervals.

[0023] The inspection CPU 1 compares the updated count values 7a to 7c of the counters on the memory 6 with the previous count values 8a to 8c. When it determines that an updated value differs from its previous value, it copies the differing updated count value 7a to 7c to the corresponding previous count value 8a to 8c. For example, the updated count value 7a is copied to the previous count value 8a.

[0024] The inspection CPU 1 compares the updated count values 7a to 7c of the counters on the memory 6 with the previous count values 8a to 8c, and when it determines that every pair of corresponding updated and previous count values differs, it outputs the WDT reset signal 12 to the WDT 5. The WDT 5 is reset by this WDT reset signal 12.

[0025] The inspection CPU 1 compares the updated count values 7a to 7c of the counters on the memory 6 with the previous count values 8a to 8c, and when it determines that even one pair of corresponding updated and previous count values is the same, it does not output the WDT reset signal 12 to the WDT 5.
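The comparison performed by the inspection CPU in [0023] to [0025] can be sketched as a single function. This is an illustrative sketch, not the patented implementation: the function name `inspection_cycle` and the dictionary keys are hypothetical, with `updated` standing in for count values 7a to 7c and `previous` for count values 8a to 8c.

```python
from typing import Dict


def inspection_cycle(updated: Dict[str, int], previous: Dict[str, int]) -> bool:
    """One pass of the inspection CPU over the counters in shared memory.

    Returns True if the WDT reset signal should be output, which happens only
    when every updated value differs from its saved previous value.
    """
    all_changed = True
    for cpu, value in updated.items():
        if value != previous[cpu]:
            # [0023]: copy the differing updated value to the previous value.
            previous[cpu] = value
        else:
            # [0025]: one unchanged counter is enough to withhold the reset.
            all_changed = False
    return all_changed


# Hypothetical counters for the three CPUs to be inspected.
updated = {"cpu2": 5, "cpu3": 5, "cpu4": 5}
previous = {"cpu2": 4, "cpu3": 5, "cpu4": 4}
reset_wdt = inspection_cycle(updated, previous)
```

In this example the counter for `cpu3` has not advanced since the last pass, so the function withholds the reset while still saving the new values for the other two counters.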

[0026] When a fault occurs in one of the CPUs 2 to 4 to be inspected, the updated value of the corresponding counter in the memory 6 is no longer updated, so that updated value becomes the same as the previous value. In such a case, the inspection CPU 1 does not output the WDT reset signal 12 to the WDT 5.

[0027] When a preset time elapses without the WDT reset signal 12 being output from the inspection CPU 1, the WDT 5 outputs the CPU abnormality detection signal 13. In response to this CPU abnormality detection signal 13, the system displays on a display unit (not shown) or the like that a fault has occurred in a CPU, and thereby notifies the system administrator of the CPU abnormality.

[0028] When a fault occurs in the inspection CPU 1, the WDT reset signal 12 is not output from the inspection CPU 1 to the WDT 5. In this case, the WDT 5 does not receive the WDT reset signal 12 from the inspection CPU 1 even after the preset time has elapsed, so it outputs the CPU abnormality detection signal 13. As in the previous case, in response to this CPU abnormality detection signal 13, the system displays on a display unit (not shown) or the like that a fault has occurred in a CPU, and thereby notifies the system administrator of the CPU abnormality.
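The two failure paths, a hung CPU to be inspected and a hung inspection CPU, can be simulated end to end. The sketch below rests on assumptions not found in the publication: a tick-based clock in which every CPU acts once per tick, a 3-tick WDT timeout, and the hypothetical names `simulate`, `hung_cpu`, and `hang_tick`.

```python
from typing import Optional


def simulate(hung_cpu: Optional[str], hang_tick: int,
             ticks: int = 12, timeout: int = 3) -> Optional[int]:
    """Return the tick at which abnormality detection signal 13 fires, or None.

    hung_cpu is "cpu1" (inspection CPU), "cpu2".."cpu4" (inspected), or None.
    """
    updated = {"cpu2": 0, "cpu3": 0, "cpu4": 0}  # count values 7a to 7c
    previous = dict(updated)                     # count values 8a to 8c
    since_reset = 0                              # WDT ticks since last reset
    for t in range(ticks):
        # CPUs to be inspected increment their own counters unless hung.
        for cpu in updated:
            if not (cpu == hung_cpu and t >= hang_tick):
                updated[cpu] += 1
        # The inspection CPU compares, saves, and may reset the WDT.
        if not (hung_cpu == "cpu1" and t >= hang_tick):
            all_changed = True
            for cpu in updated:
                if updated[cpu] != previous[cpu]:
                    previous[cpu] = updated[cpu]
                else:
                    all_changed = False
            if all_changed:
                since_reset = 0                  # WDT reset signal 12
        # The WDT counts time and fires on timeout.
        since_reset += 1
        if since_reset >= timeout:
            return t
    return None
```

With these assumed parameters, `simulate(None, 0)` never fires, while hanging either `"cpu3"` or `"cpu1"` at tick 4 makes the WDT fire at tick 5, which shows that the signal alone does not say which CPU failed.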

[0029] The CPU abnormality need not only be shown on the display unit; the device may be configured to notify the system administrator by a lamp indication, or by voice, a buzzer sound, or the like. In addition to the system itself notifying the system administrator, the device may be configured to report the CPU abnormality to a remote monitoring terminal by communication.

[0030] The following describes how to identify which of the CPUs 1 to 4 has actually failed when the WDT 5 outputs the CPU abnormality detection signal 13. When the WDT 5 outputs the CPU abnormality detection signal 13, the updated values and the previous values of the counters' count values are dumped from the memory 6 so that the contents of the memory 6 can be analyzed for the fault.

[0031] If the updated count values 7a to 7c of the counters in the memory 6 are larger than the previous count values 8a to 8c, no fault has occurred in the CPUs 2 to 4 to be inspected, so it can be concluded that the fault has occurred in the inspection CPU 1.

[0032] If any of the updated count values 7a to 7c of the counters in the memory 6 is the same as the corresponding previous count value 8a to 8c, it can be concluded that the fault has occurred in the CPU, among the CPUs 2 to 4 to be inspected, that corresponds to the matching updated and previous values. In this way, the failed CPU among the CPUs 1 to 4 can be identified by dumping the memory 6.
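The dump analysis of [0030] to [0032] reduces to a short rule, sketched below. The function name `diagnose` and the CPU labels are hypothetical; the two dictionaries stand for the dumped updated values (7a to 7c) and previous values (8a to 8c).

```python
from typing import Dict, List


def diagnose(updated: Dict[str, int], previous: Dict[str, int]) -> List[str]:
    """Identify the failed CPU or CPUs from dumped counter values."""
    # [0032]: an inspected CPU whose counter did not advance has failed.
    stalled = [cpu for cpu in updated if updated[cpu] == previous[cpu]]
    if stalled:
        return stalled
    # [0031]: every counter advanced, so the inspection CPU itself failed.
    return ["cpu1"]


# Hypothetical dump taken after the WDT fired.
dump_updated = {"cpu2": 7, "cpu3": 5, "cpu4": 7}
dump_previous = {"cpu2": 6, "cpu3": 5, "cpu4": 6}
failed = diagnose(dump_updated, dump_previous)
```

Because the counters hold multi-valued state rather than a single bit, the dump still distinguishes "this counter stopped" from "the checker stopped" after the fact.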

[0033] As described above, in this embodiment the inspection CPU 1 compares and evaluates the updated count values 7a to 7c of the counters against the previous count values 8a to 8c, and the WDT 5 outputs the CPU abnormality detection signal 13 based on the comparison and evaluation result of the inspection CPU 1. The CPU abnormality detection signal 13 can therefore be output on the basis of the evaluation of the updated and previous count values of the counters in the memory 6. As a result, the failed CPU in the multiprocessor can be easily identified by dumping the memory 6, while the device configuration is simplified and the device cost is reduced because hardware counters need not be provided for each CPU.

[0034] In this embodiment, when the inspection CPU 1 detects that the updated value and the previous value of every count value in the memory 6 differ, it outputs the WDT reset signal 12 to the WDT 5, and when the inspection CPU 1 detects that the updated value and the previous value of a count value in the memory 6 are the same, the WDT 5 outputs the CPU abnormality detection signal 13. Output of the WDT reset signal 12 therefore indicates that no CPU fault has occurred, and output of the CPU abnormality detection signal 13 notifies the system administrator that a CPU fault has occurred. Moreover, by dumping the contents of the memory, the CPU to be inspected that corresponds to the counter whose updated and previous count values are the same can be identified as the failed CPU.

[0035] In this embodiment, when a fault occurs in the inspection CPU 1 and the WDT reset signal 12 is therefore not output from the inspection CPU 1, the WDT 5 outputs the CPU abnormality detection signal 13. When the contents of the memory 6 are then dumped, the updated value and the previous value of every count value differ, so the fault can be attributed to the inspection CPU 1.

[0036] In this embodiment, when the WDT 5 outputs the CPU abnormality detection signal 13, the updated values and the previous values of the counters' count values are dumped from the memory 6, so it is possible to identify which CPU in the multiprocessor has failed. If any counter in the memory 6 has an updated value equal to its previous value, the CPU, among the CPUs 2 to 4 to be inspected, that corresponds to that counter can be identified as the failed CPU. If the updated value and the previous value of every count value in the memory 6 differ, the inspection processor 1 can be identified as the failed processor.

[0037]

EFFECTS OF THE INVENTION According to the present invention, the device has a plurality of processors to be inspected, a memory provided with a plurality of counters corresponding to the respective processors to be inspected, an inspection processor that compares and evaluates the updated count values of the counters against the previous count values of the counters, and abnormality detection signal output means that outputs a processor abnormality detection signal based on the comparison and evaluation result of the inspection processor. Because the processor abnormality detection signal can thus be output on the basis of the evaluation of the updated and previous count values of the counters in the memory, the failed processor in the multiprocessor can be easily identified, while the device configuration is simplified and the device cost is reduced because hardware counters need not be provided for each processor.

[0038] Further, the reset signal output means outputs the reset signal when the inspection processor detects that the updated value and the previous value of every count value differ, and the abnormality detection signal output means outputs the processor abnormality detection signal when the inspection processor detects that the updated value and the previous value of a count value are the same. Output of the reset signal therefore indicates that no processor fault has occurred, and output of the processor abnormality detection signal notifies the system administrator that a processor fault has occurred. Moreover, by dumping the contents of the memory, the processor to be inspected that corresponds to the counter whose updated and previous count values are the same can be identified as the failed processor.

[0039] Further, when a fault occurs in the inspection processor and the reset signal is therefore not output by the reset signal output means, the abnormality detection signal output means outputs the processor abnormality detection signal. When the memory contents are then dumped, the updated value and the previous value of every count value differ, so the fault can be attributed to the inspection processor.

[0040] Further, when the abnormality detection signal output means outputs the processor abnormality detection signal, the memory dump means dumps the updated values and the previous values of the counters' count values from the memory, so it is possible to identify which processor in the multiprocessor has failed. If any counter has an updated value equal to its previous value, the CPU to be inspected that corresponds to that counter can be identified as the failed CPU. If the updated value and the previous value of every count value differ, the inspection processor can be identified as the failed processor.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the multiprocessor fault detection device according to Embodiment 1 of the present invention.

[Explanation of symbols]

1: inspection CPU; 2 to 4: CPUs to be inspected; 5: WDT; 6: memory; 7a to 7c: count values (updated values); 8a to 8c: count values (previous values); 9a to 9c: writing of count values; 10a to 10c: reading of count values; 11a to 11c: reading and writing of previous count values; 12: WDT reset signal; 13: CPU fault detection signal.

Claims

[Claims]

1. A multiprocessor fault detection device comprising: a plurality of processors to be inspected; a memory provided with a plurality of counters corresponding to the respective processors to be inspected; an inspection processor that compares and evaluates updated count values of the counters against previous count values of the counters; and abnormality detection signal output means for outputting a processor abnormality detection signal based on the comparison and evaluation result of the inspection processor.

2. The multiprocessor fault detection device according to claim 1, further comprising reset signal output means for outputting a reset signal when the inspection processor detects that the updated value and the previous value of every count value differ, wherein the abnormality detection signal output means outputs the processor abnormality detection signal when the inspection processor detects that the updated value and the previous value of a count value are the same.

3. The multiprocessor fault detection device according to claim 1, wherein the abnormality detection signal output means outputs the processor abnormality detection signal when a fault occurs in the inspection processor and the reset signal is not output by the reset signal output means.

4. The multiprocessor fault detection device according to claim 1, further comprising memory dump means for dumping the updated values and the previous values of the counters' count values from the memory when the abnormality detection signal output means outputs the processor abnormality detection signal.