JPH02277336A - Faulty processor detection system - Google Patents
Faulty processor detection system
- Publication number
- JPH02277336A
- Authority
- JP
- Japan
- Prior art keywords
- processor
- data
- transmission
- destination
- faulty
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Monitoring And Testing Of Exchanges (AREA)
- Small-Scale Networks (AREA)
Abstract
Description
DETAILED DESCRIPTION OF THE INVENTION
[Field of Industrial Application]
The present invention relates to a faulty processor detection method used between the processors of a multiprocessor system configured by connecting a plurality of processors via a bus.
Conventionally, in a faulty processor detection method for communication among multiple processors, the source processor that transmits data is provided with a transmission waiting queue, in which data waiting to be sent is registered for each destination processor, and a transmission waiting queue length counter that counts the length of the waiting data. When the destination processor that should receive transmitted data fails to accept it, the data is registered in the transmission waiting queue and the transmission waiting queue length counter is incremented by one. The length of the transmission waiting queue is monitored periodically, and processor faults are detected by a queue length check process that treats as faulty any destination processor whose queue length exceeds a predetermined threshold, for example 20.
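The conventional check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the patent: the names `QUEUE_THRESHOLD`, `wait_queues`, `on_send_failure`, and `conventional_check` are hypothetical, the queue length counter is modeled simply as the length of the queue, and 20 is the example threshold given in the text.

```python
from collections import defaultdict, deque

QUEUE_THRESHOLD = 20  # example value from the text

# Hypothetical per-destination transmission waiting queues.
# The queue length counter is modeled here as len(queue).
wait_queues = defaultdict(deque)

def on_send_failure(dest, data):
    """The destination did not accept the data: park it in its waiting queue."""
    wait_queues[dest].append(data)

def conventional_check():
    """Periodic check: every destination whose queue exceeds the threshold is deemed faulty."""
    return sorted(d for d, q in wait_queues.items() if len(q) > QUEUE_THRESHOLD)
```

Note that this check cannot tell a congested destination from a failed one: both simply accumulate waiting data.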
As described above, the conventional faulty processor detection method immediately treats as faulty any destination processor whose transmission waiting queue length exceeds the predetermined threshold. It therefore has the drawback that a destination processor is judged faulty even when it is merely congested, for example because it is busy receiving data from other processors and cannot accept the incoming data, so that the data accumulates in the source processor's transmission waiting queue.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a faulty processor detection method in which a destination processor that is merely congested is not treated as faulty.
The faulty processor detection method of the present invention builds on a method in which, in a multiprocessor system configured by connecting a plurality of processors via a bus, a source processor that transmits data is provided with storage means in which data waiting to be sent is registered for each destination processor and first counting means that counts the length of the waiting data; when a destination processor that should receive transmitted data fails to accept it, the data is registered in the storage means and the first counting means is incremented; and the length of the waiting data is monitored periodically so that a queue length check process detects a faulty processor by treating any processor exceeding a predetermined first threshold as faulty. In the present invention, the source processor is additionally provided with second counting means that counts inactive states for each destination processor. When a destination processor fails to accept transmitted data, it is regarded as being in the inactive state, data transmission to it is suspended, and the second counting means is incremented. An inactive check process monitors the inactive state at a fixed period, resets the second counting means to 0 if its value is smaller than a predetermined second threshold, and treats the processor as faulty if the value is larger. The faulty processor is detected by using this inactive check process together with the queue length check process.
Next, an embodiment of the present invention will be described with reference to the drawing.
FIG. 1 is a block diagram showing one embodiment of the present invention. It shows a multiprocessor system in which n processors 21 to 2n are connected via a bus 10 and carry out processing while communicating with one another. The processor 21 comprises a central processing unit 50, a receiving section 30 that takes in data from the bus 10 and passes it to the central processing unit 50, a transmitting section 40 that sends data from the central processing unit 50 onto the bus 10, and a storage device 60 connected to the central processing unit 50. The other processors 22 to 2n are configured in the same way. The central processing unit 50 has a queue length check processing section 51, an inactive check processing section 52, and a transmission control processing section 53. The storage device 60 holds transmission waiting queues 72 to 7n, transmission waiting queue length counters 82 to 8n, and inactive counters 92 to 9n, each corresponding to a destination processor other than the processor itself.
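As a rough illustration of what the storage device 60 holds, the per-destination state could be modeled as below. This is only a sketch: `DestinationState` and `make_storage` are hypothetical names, and the patent does not prescribe any particular data layout.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class DestinationState:
    """State a source processor keeps for one destination processor."""
    wait_queue: deque = field(default_factory=deque)  # transmission waiting queue (72 to 7n)
    queue_len: int = 0                                 # queue length counter (82 to 8n)
    inactive_count: int = 0                            # inactive counter (92 to 9n)

def make_storage(own_id, all_ids):
    """Build the per-destination table, leaving out the processor's own entry."""
    return {pid: DestinationState() for pid in all_ids if pid != own_id}
```

For processor 21 in a three-processor system, `make_storage(21, [21, 22, 23])` would yield independent entries for destinations 22 and 23 only.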
The operation of this embodiment is described below, taking the processor 21 as the source processor that transmits data and the processor 22 as the destination processor that receives it.
When the processor 21 transmits data to the processor 22 but the processor 22 is unable to receive it for some reason, the transmission control processing section 53 of the processor 21 registers the data in the transmission waiting queue 72 corresponding to the processor 22 and increments the transmission waiting queue length counter 82 and the inactive counter 92 by one each. It then regards the processor 22 as being in the inactive state, suspends data transmission to it, and resumes transmission processing after a predetermined time.
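The failure handling performed by the transmission control processing section can be sketched as follows. The class name `SendControl`, its methods, and the value of `RETRY_DELAY` are assumptions made for illustration; the text says only that transmission resumes "after a predetermined time" and gives no figure. Only the failure path is modeled here.

```python
from collections import deque

RETRY_DELAY = 0.01  # assumed suspension time in seconds; the text gives no value

class SendControl:
    """Hypothetical model of the source-side state for one destination."""

    def __init__(self):
        self.wait_queue = deque()  # transmission waiting queue
        self.queue_len = 0         # transmission waiting queue length counter
        self.inactive_count = 0    # inactive counter
        self.resume_at = 0.0       # time before which no transmission is attempted

    def on_send_failure(self, data, now):
        """The destination did not accept the data."""
        self.wait_queue.append(data)
        self.queue_len += 1
        self.inactive_count += 1
        # Regard the destination as inactive and hold off until the delay has passed.
        self.resume_at = now + RETRY_DELAY

    def may_send(self, now):
        """True once the suspension period has elapsed."""
        return now >= self.resume_at
```

Suspending transmission bounds how many refusals can occur in one monitoring period, which is what makes the inactive counter meaningful as a rate.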
Meanwhile, the queue length check processing section 51 compares, at fixed intervals, the values of the transmission waiting queue length counters 82 to 8n for the respective processors with a predetermined threshold and looks for any that exceed it. Because this threshold allows for congestion at the destination processor, it is set higher than the conventional value, for example 200 where 20 was used conventionally. If the transmission waiting queue length counter 82 corresponding to the processor 22 exceeds the threshold, the processor 22 is judged to have failed.
The inactive check processing section 52 compares, at a fixed period, the values of the inactive counters 92 to 9n for the respective processors with a predetermined threshold and looks for any that exceed it. For example, a count of 2 or fewer within 50 ms is regarded as within the range of normal congestion: the processor is judged to be operating normally and the corresponding inactive counter is reset to 0. If the value of the inactive counter 92 corresponding to the processor 22 is 3, the processor 22 is judged to have failed.
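The inactive check can be sketched as a function run once per monitoring period (50 ms in the example). The names `INACTIVE_THRESHOLD` and `inactive_check` are hypothetical. The text does not say what happens to the counter of a processor judged faulty, so leaving it unchanged is an assumption of this sketch.

```python
INACTIVE_THRESHOLD = 2  # example from the text: up to 2 per period is normal congestion

def inactive_check(inactive_counters):
    """Run once per monitoring period.

    inactive_counters maps destination id -> inactive count and is updated in place.
    Returns the sorted list of destinations judged faulty.
    """
    faulty = []
    for dest, count in inactive_counters.items():
        if count > INACTIVE_THRESHOLD:
            faulty.append(dest)          # e.g. a count of 3 means failure
        else:
            inactive_counters[dest] = 0  # normal congestion: start a fresh period
    return sorted(faulty)
```

With counters `{22: 3, 23: 2, 24: 0}`, only destination 22 is reported, and the counters for 23 and 24 are reset to 0.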
As described above, in the present invention the source processor is provided with a new inactive counter in addition to the conventional transmission waiting queue length counter for each destination processor. Each time a destination processor fails to accept data, the event is counted as an inactive state and data transmission is suspended. An inactive check process, which treats the destination processor as faulty if the number of inactive states within a fixed time exceeds a predetermined number, is used together with the conventional queue length check process, which treats the destination processor as faulty if the transmission waiting queue length counter exceeds a predetermined threshold. This has the effect of reducing the possibility that a destination processor is wrongly judged faulty when it is merely congested and cannot perform reception processing.
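The difference between the conventional and the combined decision can be illustrated with the example thresholds from the text (20, 200, and 2). The function names and the two sample states below are hypothetical and chosen only to show the effect; they are not measurements.

```python
CONVENTIONAL_QUEUE_THRESHOLD = 20  # example value for the conventional method
QUEUE_THRESHOLD = 200              # raised value used in the embodiment
INACTIVE_THRESHOLD = 2             # up to 2 per period counts as congestion

def conventional_is_faulty(queue_len):
    """Conventional method: the queue length check alone."""
    return queue_len > CONVENTIONAL_QUEUE_THRESHOLD

def combined_is_faulty(queue_len, inactive_count):
    """Combined method: queue length check plus inactive check."""
    return queue_len > QUEUE_THRESHOLD or inactive_count > INACTIVE_THRESHOLD

# Congested destination: a backlog has built up, but refusals in the
# current period stay within the congestion range.
congested = {"queue_len": 30, "inactive_count": 2}
# Failed destination: every attempt in the period is refused.
failed = {"queue_len": 3, "inactive_count": 3}
```

Under these illustrative numbers, the conventional check flags the congested destination and misses the failed one, while the combined check does the opposite.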
FIG. 1 is a block diagram showing one embodiment of the present invention.
10: bus; 21 to 2n: processors; 30: receiving section; 40: transmitting section; 50: central processing unit; 51: queue length check processing section; 52: inactive check processing section; 53: transmission control processing section; 60: storage device; 72 to 7n: transmission waiting queues; 82 to 8n: transmission waiting queue length counters; 92 to 9n: inactive counters.
Claims (1)
A faulty processor detection method for a multiprocessor system configured by connecting a plurality of processors via a bus, in which a source processor that transmits data is provided with storage means in which data waiting to be sent is registered for each destination processor and first counting means that counts the length of the waiting data; when a destination processor that should receive transmitted data fails to accept it, the data is registered in the storage means and the first counting means is incremented; and a queue length check process periodically monitors the length of the waiting data and detects a faulty processor by treating any processor exceeding a predetermined first threshold as faulty, the method being characterized in that the source processor is provided with second counting means that counts inactive states for each destination processor; when a destination processor fails to accept transmitted data, the destination processor is regarded as being in the inactive state, data transmission is suspended, and the second counting means is incremented; and a faulty processor is detected by using, together with the queue length check process, an inactive check process that monitors the inactive state at a fixed period, resets the value of the second counting means to 0 if it is smaller than a predetermined second threshold, and treats the processor as faulty if it is larger.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1099501A JPH02277336A (en) | 1989-04-18 | 1989-04-18 | Faulty processor detection system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1099501A JPH02277336A (en) | 1989-04-18 | 1989-04-18 | Faulty processor detection system |
Publications (1)
Publication Number | Publication Date |
---|---|
JPH02277336A true JPH02277336A (en) | 1990-11-13 |
Family
ID=14249024
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP1099501A Pending JPH02277336A (en) | 1989-04-18 | 1989-04-18 | Faulty processor detection system |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPH02277336A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0575610A (en) * | 1991-09-17 | 1993-03-26 | Fujitsu Ltd | Method and device for monitoring system audit |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20080002735A1 (en) | Device network | |
US20130242756A1 (en) | Packet relay device, packet relay system, and fault detection method | |
JPH02277336A (en) | Faulty processor detection system | |
JPH06175944A (en) | Network monitoring method | |
JPS6072351A (en) | Method for supervising operating condition of packet communication system | |
CN100367262C (en) | Method and device for testing a monitoring function of a bus system and a corresponding bus system | |
JP2004086520A (en) | Monitoring control device and its method | |
KR100250888B1 (en) | Network detection device of distributed control system | |
JPH11177550A (en) | Monitor system for network | |
JPH04367061A (en) | Faulty processor detection system | |
CN110752939A (en) | Service process fault processing method, notification method and device | |
JP2633351B2 (en) | Control device failure detection mechanism | |
JPS6129966A (en) | Monitoring method in exchange of message between computers | |
JP3111543B2 (en) | MAC bridge buffer release method | |
JPH03276942A (en) | Repeater | |
JPH03237555A (en) | Distributed processing system | |
JPH0898278A (en) | Digital control system | |
JPS62175836A (en) | Health check system in data processing system | |
JPH041831A (en) | Monitor system for program runaway | |
JPH0213151A (en) | Local area network system | |
JPS62162155A (en) | Information processing system | |
JPH01269152A (en) | Processor trouble detecting system in distributed processing system | |
JPH10207745A (en) | Method for confirming inter-processor existence | |
CN110297732A (en) | A kind of detection method and device of FPGA state | |
JPH02310755A (en) | Health check system |