JPH02279040A - Fault detection system for multi-processor system - Google Patents

Fault detection system for multi-processor system

Info

Publication number
JPH02279040A
Authority
JP
Japan
Prior art keywords
processor
operation monitoring
monitoring signal
signal
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1098957A
Other languages
Japanese (ja)
Other versions
JP2917291B2 (en)
Inventor
Kazuo Nishidai
西大 和男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP1098957A
Publication of JPH02279040A
Application granted
Publication of JP2917291B2
Anticipated expiration
Expired - Fee Related

Landscapes

  • Multi Processors (AREA)
  • Small-Scale Networks (AREA)

Abstract

PURPOSE: To detect a fault in the system without providing a master processor, by issuing a processor fault detection notice when the reception section is found to still be in the non-reception state of an operation monitoring signal after a prescribed time has elapsed, and by inputting a new operation monitoring signal to the transmission section. CONSTITUTION: When a fault occurs in processor 2, processor 3 cannot receive the operation monitoring signal S1 from processor 2. The monitoring section 7 of processor 3 reset the timer 8 when it transmitted the preceding operation monitoring signal S1 (point A) and measures the reception time based on the clock signal C from the timer 8, so it recognizes that the next-signal reception timing time T1 has elapsed and judges that a fault has occurred in processor 2. Based on this judgment, the monitoring section 7 issues a fault detection notice for processor 2, resets the timer 8, and sends the operation monitoring signal S1 to processor 4. Thus, the occurrence of a fault is judged by the individual processors 1 to 4.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a fault detection system for a multiprocessor system that automatically detects the occurrence of a fault in each of a plurality of processors coupled to a bus.

[Prior Art]

A conventional fault detection system for a multiprocessor system has a master processor in the system that manages the operating state of each processor. The master processor sends an operation monitoring signal to every other processor in turn and monitors for the reply of a response signal. If the response signal is not received within a predetermined time, the master processor judges that processor to be faulty. In this way, faults in all the processors in the system are detected.

This technique is explained concretely with reference to FIG. 3.

The multiprocessor system consists of one master processor 31 and three processors 32 to 34 connected by a ring-type bus 35. The master processor 31 manages the operating states of the other processors 32 to 34. The master processor 31 sends an operation monitoring signal S11 to processor 32 and monitors whether the response signal S12 is received within a predetermined time. It then records the result as the operating state of processor 32. The master processor 31 next repeats the same procedure for processor 33 and processor 34 in turn, thereby performing fault detection for each of the processors 32 to 34.
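The conventional polling procedure described above can be sketched as follows. This is a minimal illustration only: the function name `poll_processors` and the `probe` callback are hypothetical, and `probe` merely stands in for "send S11 and wait up to the predetermined time for S12".

```python
from typing import Callable, Dict, Iterable


def poll_processors(
    processor_ids: Iterable[int],
    probe: Callable[[int], bool],
) -> Dict[int, bool]:
    """Return {processor_id: is_alive} after one polling round by the master.

    `probe(pid)` is a hypothetical stand-in for sending S11 to processor
    `pid` and waiting for S12; it returns True if the response arrived
    within the predetermined time.
    """
    status: Dict[int, bool] = {}
    for pid in processor_ids:  # strictly sequential, one processor at a time
        status[pid] = probe(pid)
    return status


# Hypothetical example: processor 33 does not answer.
alive = {32: True, 33: False, 34: True}
result = poll_processors([32, 33, 34], lambda pid: alive[pid])
```

Because every probe runs on the master, the work per round grows with the number of monitored processors, and nothing in the loop can run if the master itself fails.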

[Problem to Be Solved by the Invention]

In the conventional fault detection system described above, a special processor 31 called a master processor is provided in the system, and processor faults are detected by having this master processor 31 send the operation monitoring signal S11 to all the other processors 32 to 34 and monitor the response signal S12. It therefore has the following drawbacks.

(a) Because fault monitoring is performed by the master processor 31 alone, the load on the master processor 31 increases as the number of monitored processors 32 to 34 increases.

(b) If a fault occurs in the master processor 31 itself, the fault detection function of the entire system stops.

An object of the present invention is to solve the above problems of the prior art by providing an economical and highly reliable fault detection system for a multiprocessor system in which the fault detection operation can be performed by a plurality of processors.

[Means for Solving the Problem]

The present invention is a fault detection system for a multiprocessor system that detects a fault in a processor by circulating an operation monitoring signal in sequence among a plurality of processors coupled via a bus, characterized in that each processor comprises: a receiving section that receives the operation monitoring signal from the preceding processor; a transmitting section that transmits the operation monitoring signal to the next processor; and a monitoring section that monitors the reception state of the receiving section, inputs the operation monitoring signal to the transmitting section when the receiving section receives it, or, upon detecting that the receiving section is still in the non-reception state of the operation monitoring signal after a predetermined time has elapsed, issues a processor fault detection notification and inputs a new operation monitoring signal to the transmitting section.

[Embodiment]

An embodiment of the present invention is described with reference to the drawings.

FIG. 1 is a block diagram showing a fault detection system for a multiprocessor system according to an embodiment of the present invention.

In this fault detection system, processors 1 to 4 are connected in a ring by a bus 5, and processor faults are detected by circulating an operation monitoring signal S1 through processors 1 to 4 in sequence.

Each of the processors 1 to 4 includes a receiving section 6, a monitoring section 7, a timer 8, and a transmitting section 9.

The receiving section 6 receives the operation monitoring signal S1 from the adjacent processor and passes it to the monitoring section 7.

The monitoring section 7 detects whether the adjacent processor has failed by monitoring the operation monitoring signal S1 from the receiving section 6 and the clock signal C from the timer 8.

The functions of the monitoring section 7 are described concretely below. The monitoring section 7 sends the operation monitoring signal S1 received from the receiving section 6 to the transmitting section 9, resets the timer 8, and monitors the clock signal C from the timer 8. Based on this clock signal C, it measures the time t from when it sent the operation monitoring signal S1 to the transmitting section 9 until the signal is next input from the receiving section 6. If the time t substantially coincides with the preset next-signal reception timing time T1, the monitoring section 7 judges that processor 4 is operating normally, resets the timer 8, and sends the operation monitoring signal S1 to the transmitting section 9. The operation monitoring signal S1 is sent to the transmitting section 9 when the transmission timing runs out, that is, when the time T2 has elapsed since the operation monitoring signal S1 was received. The time T2 is measured by having the monitoring section 7 reset the timer 8 on receiving the operation monitoring signal S1 and then measure the clock signal C input from the timer 8. If, on the other hand, the monitoring section 7 has not received the operation monitoring signal S1 even after the next-signal reception timing time T1 has elapsed, it judges that a fault has occurred in processor 4 and sends a notification to that effect to a supervisory control device or the like (not shown). In parallel with this notification, the monitoring section 7 resets the timer 8 and sends a new operation monitoring signal S1 to the transmitting section 9.
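The behavior of the monitoring section 7 can be sketched as a small tick-driven state machine. This is a minimal sketch under stated assumptions, not the patent's circuit: time is modeled as integer clock ticks, the class name `Monitor`, its fields, and the method `step` are hypothetical, and a timeout is taken to occur when the timer strictly exceeds T1.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    t1: int                 # next-signal reception timing time T1, in ticks
    t2: int                 # signal transmission timing-out time T2, in ticks
    timer: int = 0          # models timer 8: ticks since the last reset
    holding: bool = False   # True while a received S1 waits for T2 to elapse
    fault_count: int = 0    # number of fault notifications issued so far

    def step(self, received: bool) -> bool:
        """Advance one clock tick; return True if S1 goes to the transmitter."""
        if received:
            # Reception of S1: reset the timer and start measuring T2.
            self.timer = 0
            self.holding = True
            return False
        self.timer += 1
        if self.holding:
            if self.timer >= self.t2:
                # T2 elapsed: forward S1 and start measuring T1.
                self.timer = 0
                self.holding = False
                return True
            return False
        if self.timer > self.t1:
            # T1 elapsed without reception: notify and inject a new S1.
            self.fault_count += 1
            self.timer = 0
            return True
        return False


monitor = Monitor(t1=5, t2=2)
sent = [monitor.step(r) for r in [True] + [False] * 8]
```

In this run the signal is forwarded two ticks after it is received, and a new signal is injected once six further ticks pass with no reception.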

The next-signal reception timing time T1 is set to be at least as long as the time taken for the operation monitoring signal S1 to be sent from processor 1, pass through processors 2, 3, and 4 in turn, and return to processor 1.
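The lower bound on T1 can be illustrated with a short calculation. The formula below is an assumption made for illustration and does not appear in the patent: it supposes that each of the other processors holds the signal for T2 before forwarding it and that every hop on the bus adds the same fixed delay. The function name `min_next_signal_timeout` and the parameter `link_delay` are hypothetical.

```python
def min_next_signal_timeout(n_processors: int, t2: float, link_delay: float) -> float:
    """Estimate the smallest admissible T1 for a ring of `n_processors`.

    Assumed model (not stated in the patent): after a processor sends S1,
    the signal crosses n hops and is held for T2 by each of the other
    n - 1 processors before it comes back.
    """
    if n_processors < 2:
        raise ValueError("a ring needs at least two processors")
    return (n_processors - 1) * t2 + n_processors * link_delay


# Hypothetical numbers: 4 processors, T2 = 10 time units, 1 unit per hop.
bound = min_next_signal_timeout(4, 10.0, 1.0)
```

In practice T1 would be chosen with some margin above this estimate so that a normally circulating signal is not mistaken for a fault.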

Next, the fault detection operation of this embodiment is described with reference to FIG. 1 and FIG. 2.

FIG. 2 is a sequence diagram of the fault detection operation performed by the fault detection system of this embodiment.

Processor 1 resets its timer 8 and transmits the operation monitoring signal S1 from its transmitting section 9 to processor 2.

As shown in FIG. 2, each of processors 2 to 4 receives the operation monitoring signal S1 from the preceding processor and, when the signal transmission timing runs out after the time T2, sends the operation monitoring signal S1 on to the next processor.

When all of processors 1 to 4 are operating normally, the operation monitoring signal S1 circulates through processors 1 to 4 in sequence and returns to processor 1.

This operation monitoring signal S1 is received by the receiving section 6 of processor 1. The monitoring section 7, which takes in the operation monitoring signal S1 from the receiving section 6, recognizes from the clock signal C of the timer 8 that the time of reception substantially coincides with the next-signal reception timing time T1.

From this, the monitoring section 7 judges that processor 4 is operating normally. In parallel with receiving the operation monitoring signal S1, the monitoring section 7 resets the timer 8, measures the clock signal C from the timer 8, and, when it judges that the signal transmission timing-out time T2 has been reached, sends the operation monitoring signal S1 to the transmitting section 9. The transmitting section 9 transmits this operation monitoring signal S1 to the next processor, processor 2 (FIG. 2).

If a fault occurs in processor 2 at this point, processor 2 cannot receive the operation monitoring signal S1 from processor 1 and therefore cannot transmit it to processor 3 (FIG. 2).

Consequently, processor 3 cannot receive the operation monitoring signal S1 from processor 2. The monitoring section 7 of processor 3 reset its timer 8 when it last transmitted the operation monitoring signal S1 (point A in FIG. 2) and has been measuring the reception time based on the clock signal C from the timer 8, so it recognizes that the next-signal reception timing time T1 has elapsed and judges that a fault has occurred in processor 2. Based on this judgment, the monitoring section 7 issues a fault detection notification for processor 2 and, to prevent next-signal reception timeouts from occurring in the processors other than processor 2, resets the timer 8 and transmits the operation monitoring signal S1 to processor 4 via the transmitting section 9 (FIG. 2).

In this way, the occurrence of a fault can be judged by the individual processors 1 to 4. Moreover, a fault does not make it necessary to stop the operation of the processors other than the faulty processor 2.
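The sequence just described, in which processor 2 fails and processor 3 detects it and restarts the circulation, can be reproduced with a small discrete-time simulation. This is a sketch under stated assumptions rather than the patent's implementation: time advances in integer ticks, every hop takes `link_delay` ticks (at least 1), a failed processor simply drops whatever it receives, and the function name `simulate_ring` and all parameter names are hypothetical.

```python
from typing import List, Tuple


def simulate_ring(n: int, t1: int, t2: int, link_delay: int,
                  failed: int, fail_at: int, ticks: int) -> List[Tuple[int, int, int]]:
    """Return fault notifications as (tick, detecting processor, suspected processor).

    Processors are numbered 1..n and processor k sends to k % n + 1.
    Processor `failed` neither receives nor sends from tick `fail_at` onward.
    """
    procs = range(1, n + 1)
    timer = {p: 0 for p in procs}          # timer 8 of each processor
    holding = {p: False for p in procs}    # S1 received, waiting for T2
    in_flight: List[Tuple[int, int]] = []  # (arrival tick, destination)
    notifications: List[Tuple[int, int, int]] = []

    def send(src: int, now: int) -> None:
        timer[src] = 0                     # the timer is reset on transmission
        in_flight.append((now + link_delay, src % n + 1))

    send(1, 0)                             # processor 1 starts the circulation
    for now in range(1, ticks + 1):
        arrivals = {dst for tick, dst in in_flight if tick == now}
        in_flight[:] = [(tick, dst) for tick, dst in in_flight if tick > now]
        for p in procs:
            if p == failed and now >= fail_at:
                continue                   # a failed processor does nothing
            if p in arrivals:
                timer[p] = 0               # the timer is reset on reception
                holding[p] = True
                continue
            timer[p] += 1
            if holding[p]:
                if timer[p] >= t2:         # T2 elapsed: forward S1
                    holding[p] = False
                    send(p, now)
            elif timer[p] > t1:            # T1 elapsed: notify, inject new S1
                notifications.append((now, p, (p - 2) % n + 1))
                send(p, now)
    return notifications


notes = simulate_ring(n=4, t1=12, t2=2, link_delay=1, failed=2, fail_at=20, ticks=60)
```

With these hypothetical parameters every notification in `notes` is raised by processor 3 against processor 2, while processors 4 and 1 keep receiving the re-injected signal before their own T1 expires, so they continue to operate without raising notifications of their own.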

[Effects of the Invention]

Because the fault detection system for a multiprocessor system of the present invention is configured as described above, it has the following effects.

(a) The fault detection operation of the system can be realized without installing a special processor called a master processor in the system, which makes it possible to build a more economical system.

(b) Whichever processor in the system fails, the fault detection operation in the system does not stop. As a result, the reliability of the system can be improved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing a fault detection system for a multiprocessor system according to an embodiment of the present invention, FIG. 2 is a sequence diagram of the fault detection operation performed by the fault detection system of FIG. 1, and FIG. 3 is a block diagram showing a conventional fault detection system for a multiprocessor system.

1 to 4: processor; 5: bus; 6: receiving section; 7: monitoring section; 8: timer; 9: transmitting section

Claims (1)

[Claims]

(1) A fault detection system for a multiprocessor system that detects a fault in a processor by circulating an operation monitoring signal in sequence among a plurality of processors coupled via a bus, characterized in that each processor comprises: a receiving section that receives the operation monitoring signal from the preceding processor; a transmitting section that transmits the operation monitoring signal to the next processor; and a monitoring section that monitors the reception state of the receiving section, inputs the operation monitoring signal to the transmitting section when the receiving section receives it, or, upon detecting that the receiving section is still in the non-reception state of the operation monitoring signal after a predetermined time has elapsed, issues a processor fault detection notification and inputs a new operation monitoring signal to the transmitting section.
JP1098957A 1989-04-20 1989-04-20 Fault detection method for multiprocessor systems Expired - Fee Related JP2917291B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1098957A JP2917291B2 (en) 1989-04-20 1989-04-20 Fault detection method for multiprocessor systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1098957A JP2917291B2 (en) 1989-04-20 1989-04-20 Fault detection method for multiprocessor systems

Publications (2)

Publication Number Publication Date
JPH02279040A true JPH02279040A (en) 1990-11-15
JP2917291B2 JP2917291B2 (en) 1999-07-12

Family

ID=14233566

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1098957A Expired - Fee Related JP2917291B2 (en) 1989-04-20 1989-04-20 Fault detection method for multiprocessor systems

Country Status (1)

Country Link
JP (1) JP2917291B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000060828A1 (en) * 1999-03-31 2000-10-12 Fujitsu Limited Data communication processing device and method and recording medium storing data communication processing program
JP2009087149A (en) * 2007-10-01 2009-04-23 Nec Corp Electronic device, data processor, and bus control method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5424508A (en) * 1977-07-26 1979-02-23 Fujitsu Ltd Filure detection system for loop delivery

Also Published As

Publication number Publication date
JP2917291B2 (en) 1999-07-12

Similar Documents

Publication Publication Date Title
US5712856A (en) Method and apparatus for testing links between network switches
JPH02279040A (en) Fault detection system for multi-processor system
JPH06175944A (en) Network monitoring method
JPH01217666A (en) Fault detecting system for multiprocessor system
JPH08307438A (en) Token ring type transmission system
JPH0348997A (en) Monitoring system
JP2675645B2 (en) System failure monitoring device
JPH02112398A (en) Preferential transmitting system in cyclic digital transmission
JPH02216577A (en) Fault detecting system in multi-processor system
JPH04179687A (en) Remote control device for elevator
JPS634366A (en) Mutual monitor system for multicomputer
JPH0716190B2 (en) Communication error monitoring device for communication system
JPH0735470Y2 (en) Loop type data transmission device
JP2000013469A (en) Device and method for recovering communication equipment
JPH0798667A (en) Remote monitor system
JPH0213151A (en) Local area network system
JPH10207745A (en) Method for confirming inter-processor existence
JPS59127160A (en) Fault detecting system
JPS62175836A (en) Health check system in data processing system
JPH03255745A (en) Loop-shaped signal communication system
JPH01251839A (en) Host monitor device
JPS59158144A (en) Data transmission system
JPH04167142A (en) Fault detection system for information processor
JPH01297796A (en) Alarm informing device
JPH0241058A (en) Diagnostic device for data transmission system

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees