JPH04275663A

JPH04275663A - System for deciding fault processor

Info

Publication number: JPH04275663A
Application number: JP3697091A
Authority: JP
Inventors: Kaoru Yamamoto; 薫山本; Tadashi Yamashita; 忠志山下
Original assignee: NEC Corp; NEC Communication Systems Ltd
Current assignee: NEC Corp; NEC Communication Systems Ltd
Priority date: 1991-03-04
Filing date: 1991-03-04
Publication date: 1992-10-01

Abstract

PURPOSE: To improve the accuracy of faulty-processor determination by accumulating, for each processor, the number of times it has been reported as abnormal, and judging it to be faulty when the accumulated count exceeds a threshold value. CONSTITUTION: Processors 21 to 2n detect abnormalities in their communication partners and report them to a master processor 30. An accumulation processing unit 61 of the master processor accumulates these reports and stores the counts in report-count storage memories 71 to 7n, one for each processor. On each abnormality report, a report frequency check processing unit 62 checks the report-count storage memories 71 to 7n and judges a processor to be faulty when its accumulated count exceeds the prescribed threshold value.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a multiprocessor system in which a plurality of processors are connected via a bus and communicate with one another, and more particularly to a method for determining a faulty processor.

[0002]

[Prior Art] Conventionally, in this type of faulty-processor determination method, a processor that detects an abnormality in its partner processor during inter-processor communication reports that abnormality to a master processor having a fault determination function, and the master processor then judges the reported partner processor to be faulty.

[0003] Next, a conventional example will be described with reference to the drawings. FIG. 2 is a block diagram showing the conventional example: a multiprocessor system in which n processors 21 to 2n and a master processor 30 are connected via a bus 10 and carry out predetermined processing while the processors 21 to 2n and the master processor 30 communicate with one another. The master processor 30 comprises a central processing unit 60, a receiving unit 40 that takes in data on the bus 10 and passes it to the central processing unit 60, and a transmitting unit 50 that sends data from the central processing unit 60 onto the bus 10. The central processing unit 60 has a faulty processor determination processing unit 61 that judges a reported processor to be faulty.

[0004] Suppose, for example, that processor 22 detects an abnormality in processor 21 while processors 21 and 22 are communicating via the bus 10. Processor 22 then reports the abnormality in processor 21 to the master processor 30 via the bus 10. The master processor 30 receives the report from processor 22 at its receiving unit 40 and notifies the central processing unit 60 of the abnormality in processor 21. On receiving this notification, the faulty processor determination processing unit 61 of the central processing unit 60 judges processor 21 to be faulty.

[0005]

[Problems to be Solved by the Invention] In the conventional faulty-processor determination method described above, the master processor having the fault determination function immediately judges a reported processor to be faulty. Consequently, if the processor that reported its partner's abnormality is itself the abnormal one, the master processor acts on that report as well, and so has the drawback of judging a processor that is actually normal to be faulty.
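The drawback can be illustrated with a minimal Python sketch. The function name `conventional_judge` and the `(reporter, accused)` tuple format are assumptions made for illustration only; the patent does not specify how reports are encoded.

```python
def conventional_judge(reports):
    """Return the processors judged faulty under the conventional scheme.

    `reports` is a list of (reporter, accused) pairs (a hypothetical format).
    Every accused processor is judged faulty on its very first report.
    """
    return {accused for _reporter, accused in reports}


# Processor 22 correctly reports abnormal processor 21, while processor 21
# wrongly reports healthy processor 22.
reports = [(22, 21), (21, 22)]
print(sorted(conventional_judge(reports)))  # [21, 22]
```

Both processors end up judged faulty, although only processor 21 is actually abnormal.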

[0006] An object of the present invention is to provide a faulty-processor determination method that accumulates the number of fault reports for each processor individually and judges a processor whose count exceeds a predetermined threshold to be faulty.

[0007]

[Means for Solving the Problems] The faulty-processor determination method of the present invention applies to a multiprocessor system in which a plurality of processors are connected via a bus and perform inter-processor communication, and in which a processor detects an abnormality in its partner processor during inter-processor communication and reports the detected abnormality to a master processor having a fault determination function, which determines whether that processor is faulty. In this method, the master processor has means for accumulating the number of reports for each of the plurality of processors individually, and means for judging a processor to be faulty when its accumulated number of reports exceeds a predetermined threshold.
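The two means can be sketched in Python as follows. The class name `MasterProcessor` and its methods are hypothetical, and "exceeds" is read here as strictly greater than the threshold; this is one possible reading, not the patent's implementation.

```python
from collections import Counter


class MasterProcessor:
    """Hypothetical model of the master processor's fault determination."""

    def __init__(self, threshold):
        self.threshold = threshold
        # One counter per accused processor, playing the role of the
        # per-processor report-count storage.
        self.report_counts = Counter()

    def receive_report(self, accused):
        """Accumulate one report, then check the accused against the threshold."""
        self.report_counts[accused] += 1
        return self.is_faulty(accused)

    def is_faulty(self, processor):
        # Assumption: "exceeds" means strictly greater than the threshold.
        return self.report_counts[processor] > self.threshold


master = MasterProcessor(threshold=2)
print([master.receive_report(21) for _ in range(3)])  # [False, False, True]
```

A single report never triggers a fault judgment as long as the threshold is at least 1.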

[0008]

[Embodiment] Next, the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing one embodiment of the present invention. FIG. 1 shows a multiprocessor system in which n processors 21 to 2n and a master processor 30 are connected via a bus 10 and carry out predetermined processing while the processors 21 to 2n and the master processor 30 communicate with one another. The master processor 30 comprises a central processing unit 60, a receiving unit 40 that takes in data on the bus 10 and passes it to the central processing unit 60, a transmitting unit 50 that sends data from the central processing unit 60 onto the bus 10, and a storage device 70 connected to the central processing unit 60. The central processing unit 60 has an accumulation processing unit 61 that accumulates the number of reports and a report frequency check processing unit 62 that checks the frequency of reports. The storage device 70 has report-count storage memories 71 to 7n corresponding to the processors 21 to 2n.

[0009] Next, the faulty-processor determination operation will be described. Here, processor 21 is assumed to be the abnormal processor and processors 22 to 2n to be normal processors, with the number of processors n being 9 and the predetermined threshold for fault determination being 5. When processor 22 detects an abnormality in processor 21 while processors 21 and 22 are communicating via the bus 10, processor 22 reports the abnormality in processor 21 to the master processor 30 via the bus 10 by a predetermined method. Processor 21, for its part, erroneously detects processor 22 as abnormal and reports it to the master processor 30 via the bus 10.

[0010] The master processor 30 receives the reports from processors 21 and 22 at its receiving unit 40 and notifies the central processing unit 60. The accumulation processing unit 61 of the central processing unit 60 adds 1 to each of the report-count storage memories 71 and 72 of the corresponding processors in the storage device 70. Next, the report frequency check processing unit 62 checks whether the accumulated report counts in the report-count storage memories 71 and 72 of the corresponding processors have exceeded the threshold of 5. At this point the accumulated count in each of the report-count storage memories 71 and 72 of processors 21 and 22 is 1, so the master processor 30 does not judge either processor to be faulty. If, for example, processor 21 then communicates with each of processors 23 to 29 and each side reports the other, the count in the report-count storage memory 71 of processor 21 becomes 9, and the accumulated counts in the report-count storage memories 73 to 79 of processors 23 to 29 become 1 each.

[0011] Here, in the report frequency check processing 62, the central processing unit 60 judges as faulty any processor whose report-count storage memory, among 71 to 79, holds a count exceeding the predetermined threshold of 5. It therefore judges processor 21 to be faulty and the other processors 22 to 29 to be normal.
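The worked example can be replayed with a short Python simulation. It assumes exactly one exchange between processor 21 and each of its eight peers, with each side reporting the other once; the variable names are illustrative. Under that assumption processor 21 accumulates eight reports rather than the nine stated in the text. Both values exceed the threshold of 5, so the outcome is the same.

```python
from collections import Counter

THRESHOLD = 5
processors = list(range(21, 30))  # nine processors, 21 to 29
abnormal = 21

counts = Counter()
for peer in processors:
    if peer == abnormal:
        continue
    counts[abnormal] += 1  # the healthy peer correctly reports processor 21
    counts[peer] += 1      # processor 21 wrongly reports the healthy peer

# "Exceeds" is read as strictly greater than the threshold (an assumption).
faulty = [p for p in processors if counts[p] > THRESHOLD]
print(counts[abnormal], faulty)  # 8 [21]
```

Each healthy processor is reported only once, by the abnormal processor, so its count stays well below the threshold.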

[0012]

[Effects of the Invention] As described above, the present invention accumulates, for each processor individually, the number of times it has been reported as abnormal to the master processor having the fault determination function, and judges a processor to be faulty when its accumulated number of reports exceeds a predetermined threshold. This has the effect of preventing a normal processor from being judged faulty on the basis of a single erroneous fault report.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing one embodiment of the present invention.

FIG. 2 is a block diagram showing a conventional example of a faulty-processor determination method.

[Explanation of symbols]

10: Bus
21 to 2n: Processors
30: Master processor
40: Receiving unit
50: Transmitting unit
60: Central processing unit
61: Accumulation processing unit
62: Report frequency check processing unit
70: Storage device

Claims

[Claims]

Claim 1: A faulty-processor determination method for a multiprocessor system in which a plurality of processors are connected via a bus and perform inter-processor communication, wherein a processor detects an abnormality in its partner processor during inter-processor communication and reports the detected abnormality to a master processor having a fault determination function, which determines whether that processor is faulty, characterized in that the master processor has means for accumulating the number of reports for each of the plurality of processors individually, and means for judging a processor to be faulty when its accumulated number of reports exceeds a predetermined threshold.