JPH02277336A - Faulty processor detection system - Google Patents

Faulty processor detection system

Info

Publication number
JPH02277336A
Authority
JP
Japan
Prior art keywords
processor
data
transmission
destination
faulty
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1099501A
Other languages
Japanese (ja)
Inventor
Masato Konuki
小貫 理人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1989-04-18
Filing date
1989-04-18
Publication date
1990-11-13
1989-04-18 Application filed by NEC Corp
1989-04-18 Priority to JP1099501A
1990-11-13 Publication of JPH02277336A
Legal status: Pending (current)

Landscapes

  • Monitoring And Testing Of Exchanges (AREA)
  • Small-Scale Networks (AREA)

Abstract

PURPOSE: To prevent a destination processor from being regarded as faulty when it is merely congested, by providing the source processor with a second counting means that counts pause states for each destination processor, monitoring the pause states over a prescribed period, and regarding a destination as faulty when its count exceeds a prescribed second threshold.
CONSTITUTION: When processor 21 sends data to processor 22 but processor 22 cannot receive it for some reason, the transmission control processing section 53 of processor 21 registers the data in the transmission waiting queue 72 corresponding to processor 22, regards processor 22 as being in the pause state, suspends transmission to it for a prescribed time, and restarts transmission after that time. A pause state check, which regards a destination as faulty when the number of pause states within a prescribed time exceeds a prescribed number, is used together with the conventional check, which regards a destination as faulty when its transmission waiting queue length counter exceeds a prescribed value. This reduces the possibility of misjudging a merely congested destination processor as faulty.

Description

DETAILED DESCRIPTION OF THE INVENTION
[Field of Industrial Application]
The present invention relates to a faulty processor detection method used among the processors of a multiprocessor system in which a plurality of processors are connected via a bus.

[Prior Art]

In conventional faulty processor detection for communication among multiple processors, the source processor that sends data is provided with a transmission waiting queue, in which it registers pending data for each destination processor, and a transmission waiting queue length counter, which counts the length of that pending data. When the destination processor that should receive the data does not accept it, the data is registered in the transmission waiting queue and the queue length counter is incremented by one. The queue lengths are monitored periodically, and a queue length check process detects processor failures by regarding as faulty any destination processor whose queue length exceeds a predetermined threshold (for example, 20).
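The conventional, queue-length-only scheme can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the names `DestinationState`, `on_not_accepted`, and `queue_length_check` are hypothetical, 20 is the example threshold from the text, and removing data from the queue once it is finally delivered is not described in this passage and is omitted.

```python
from collections import deque
from dataclasses import dataclass, field

# Example threshold from the text; a real system would choose its own value.
CONVENTIONAL_QUEUE_THRESHOLD = 20


@dataclass
class DestinationState:
    """Bookkeeping the source processor keeps for one destination (hypothetical)."""
    queue: deque = field(default_factory=deque)  # transmission waiting queue
    queue_length: int = 0                        # transmission waiting queue length counter


def on_not_accepted(state: DestinationState, data) -> None:
    """The destination did not take the data: register it and step the counter by one."""
    state.queue.append(data)
    state.queue_length += 1


def queue_length_check(states: dict, threshold: int = CONVENTIONAL_QUEUE_THRESHOLD) -> set:
    """Periodic check: every destination whose counter exceeds the threshold is treated as faulty."""
    return {dest for dest, s in states.items() if s.queue_length > threshold}


# Usage: processor 22 has 21 pending items, processor 23 has 20.
states = {22: DestinationState(), 23: DestinationState()}
for i in range(21):
    on_not_accepted(states[22], i)
for i in range(20):
    on_not_accepted(states[23], i)
print(queue_length_check(states))  # {22}
```

This check alone cannot tell whether processor 22 is faulty or merely busy receiving from other processors.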

[Problem to Be Solved by the Invention]

As described above, the conventional faulty processor detection method immediately regards as faulty any destination processor whose transmission waiting queue length exceeds the predetermined threshold. Consequently, even when the destination processor is merely congested (for example, because it is busy receiving data from other processors) and therefore cannot receive the transmitted data, so that the data accumulates in the source processor's transmission waiting queue, the destination processor is still regarded as faulty. This is a drawback of the conventional method.

An object of the present invention is to provide a faulty processor detection method in which a destination processor is not regarded as faulty when it is merely congested.

[Means for Solving the Problem]

The faulty processor detection method of the present invention applies to a multiprocessor system in which a plurality of processors are connected via a bus. A source processor that sends data is provided with storage means, in which data waiting to be sent is registered for each destination processor, and with first counting means, which counts the length of the waiting data. When the destination processor that should receive the data does not accept it, the data is registered in the storage means and the first counting means is incremented; the length of the waiting data is monitored periodically, and a queue length check process regards as faulty any processor exceeding a predetermined first threshold. In this method, the source processor is further provided with second counting means that counts pause states for each destination processor. When a destination processor does not accept the data, the source processor regards it as being in the pause state, withholds transmission to it, and increments the second counting means. The pause states are monitored at a fixed period: if the count is smaller than a predetermined second threshold, the second counting means is reset to 0; if it is larger, the destination processor is regarded as faulty. Faulty processors are detected by using this pause check process together with the queue length check process.

[Embodiment]

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing one embodiment of the present invention. It shows a multiprocessor system in which n processors 21 to 2n are connected via a bus 10 and carry out processing while communicating with one another. Processor 21 comprises a central processing unit 50; a receiving section 30 that takes in data from the bus 10 and passes it to the central processing unit 50; a transmitting section 40 that sends data from the central processing unit 50 onto the bus 10; and a storage device 60 connected to the central processing unit 50. The other processors 22 to 2n are configured in the same way. The central processing unit 50 has a queue length check processing section 51, a pause check processing section 52, and a transmission control processing section 53. The storage device 60 has transmission waiting queues 72 to 7n, transmission waiting queue length counters 82 to 8n, and pause counters 92 to 9n, one of each for every destination processor other than the processor itself.
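The per-destination contents of storage device 60 can be pictured as the following Python data structure. It is a minimal sketch, not the patent's implementation: the names `PerDestination`, `StorageDevice`, and `for_system` are hypothetical, and processor ids simply reuse the reference numerals 21, 22, ... for readability.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PerDestination:
    """Bookkeeping for one destination processor (hypothetical field names)."""
    send_queue: deque = field(default_factory=deque)  # transmission waiting queue (72..7n)
    queue_length: int = 0                             # queue length counter (82..8n)
    pause_count: int = 0                              # pause counter (92..9n)


@dataclass
class StorageDevice:
    """Models storage device 60 of one processor: one entry per *other* processor."""
    own_id: int
    entries: dict = field(default_factory=dict)

    @classmethod
    def for_system(cls, own_id: int, all_ids: list) -> "StorageDevice":
        # No entry is allocated for the processor's own id.
        return cls(own_id, {p: PerDestination() for p in all_ids if p != own_id})


storage = StorageDevice.for_system(21, [21, 22, 23, 24])
print(sorted(storage.entries))  # [22, 23, 24]
```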

The operation of this embodiment is described below, assuming that processor 21 is the source processor that sends data and processor 22 is the destination processor that receives it.

Suppose processor 21 sends data to processor 22, but processor 22 cannot receive it for some reason. The transmission control processing section 53 of processor 21 then registers the data in the transmission waiting queue 72 corresponding to processor 22, increments the transmission waiting queue length counter 82 and the pause counter 92 by one each, regards processor 22 as being in the pause state, stops sending data to it for a fixed time, and resumes transmission after that time.
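A minimal Python sketch of what transmission control processing section 53 does when a transmission is refused. Assumptions: all names are hypothetical, time is passed in explicitly as integer milliseconds (`now_ms`) instead of reading a clock, and the 10 ms pause length is invented, since the text only calls it a fixed time.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical value: the text only says "a fixed time".
PAUSE_DURATION_MS = 10


@dataclass
class PerDestination:
    send_queue: deque = field(default_factory=deque)  # transmission waiting queue (e.g. 72)
    queue_length: int = 0                             # queue length counter (e.g. 82)
    pause_count: int = 0                              # pause counter (e.g. 92)
    paused_until_ms: int = 0                          # no sends to this destination before this time


def on_not_received(dest: PerDestination, data, now_ms: int) -> None:
    """Destination could not receive `data`: queue it, step both counters, start a pause."""
    dest.send_queue.append(data)
    dest.queue_length += 1
    dest.pause_count += 1
    dest.paused_until_ms = now_ms + PAUSE_DURATION_MS


def may_transmit(dest: PerDestination, now_ms: int) -> bool:
    """Transmission resumes once the pause period has elapsed."""
    return now_ms >= dest.paused_until_ms


to_22 = PerDestination()
on_not_received(to_22, "msg-1", now_ms=5)
print(to_22.queue_length, to_22.pause_count)             # 1 1
print(may_transmit(to_22, 14), may_transmit(to_22, 15))  # False True
```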

Meanwhile, the queue length check processing section 51 compares, at fixed intervals, the values of the transmission waiting queue length counters 82 to 8n for the respective processors with a predetermined threshold, looking for any counter that exceeds it. Because this threshold allows for congestion at the destination processor, it is set larger than the conventional value: for example, 200 instead of the conventional 20. If the transmission waiting queue length counter 82 corresponding to processor 22 exceeds the threshold, processor 22 is judged to have failed.
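The queue length check of section 51 reduces to a threshold comparison over the counters. A minimal sketch, assuming the counters are held in a dictionary keyed by processor id; the function name is hypothetical and 200 is the example value from the text.

```python
# Example threshold from the text: 200, raised from the conventional 20
# so that ordinary congestion does not trip the check.
QUEUE_LENGTH_THRESHOLD = 200


def queue_length_check(queue_length_counters: dict,
                       threshold: int = QUEUE_LENGTH_THRESHOLD) -> list:
    """Return destinations whose queue length counter exceeds the threshold.

    `queue_length_counters` maps a destination processor id to its
    transmission waiting queue length counter (82..8n in FIG. 1).
    """
    return sorted(dest for dest, count in queue_length_counters.items()
                  if count > threshold)


print(queue_length_check({22: 201, 23: 200, 24: 35}))  # [22]
```

A counter exactly at the threshold is not flagged, matching "exceeds" in the text.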

The pause check processing section 52 compares, at a fixed period, the values of the pause counters 92 to 9n for the respective processors with a predetermined threshold, looking for any counter that exceeds it. For example, if a counter has reached at most 2 within 50 ms, the destination is regarded as being within the range of ordinary congestion and operating normally, and that pause counter is reset to 0. If, however, the pause counter 92 corresponding to processor 22 reads 3, processor 22 is judged to have failed.
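The pause check of section 52 can be sketched the same way. Assumptions: the counters live in a dictionary keyed by processor id, the function is invoked once per period by a scheduler that is not shown, and a destination already judged faulty keeps its counter value (the text does not say what happens to it). The function name is hypothetical; the 50 ms and 2 constants are the example values above.

```python
# Example values from the text: at most 2 pauses within 50 ms is ordinary congestion.
PAUSE_CHECK_PERIOD_MS = 50  # a scheduler (not shown) calls pause_check once per period
PAUSE_THRESHOLD = 2


def pause_check(pause_counters: dict, threshold: int = PAUSE_THRESHOLD) -> list:
    """One run of the periodic pause check.

    `pause_counters` maps a destination id to the pauses counted since the
    previous run (92..9n in FIG. 1). Counters within the threshold are reset
    to 0; destinations above it are reported as faulty and left untouched.
    """
    faulty = []
    for dest, count in list(pause_counters.items()):
        if count > threshold:
            faulty.append(dest)
        else:
            pause_counters[dest] = 0  # ordinary congestion: start the next period from 0
    return sorted(faulty)


counters = {22: 3, 23: 2, 24: 0}
print(pause_check(counters))  # [22]
print(counters)               # {22: 3, 23: 0, 24: 0}
```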

[Effect of the Invention]

As explained above, the present invention adds a new pause counter to the source processor, alongside the per-destination transmission waiting queue length counter already provided there in the conventional method. A failure of the destination processor to receive data is counted as a pause state, and transmission to it is stopped. A pause check process, which regards the destination processor as faulty when the number of pause states within a fixed time exceeds a predetermined number, is used together with the conventional queue length check process, which regards the destination processor as faulty when its transmission waiting queue length counter exceeds a predetermined threshold. This reduces the possibility of wrongly judging a destination processor faulty when it merely cannot perform reception processing because it is congested.
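To illustrate how the two checks combine, here is a minimal Python sketch under hypothetical names, using the example thresholds from the embodiment. The scenario is invented for illustration: processor 22 is merely congested, with 30 items pending but only one refusal in the current period, while processor 23 refuses every retry.

```python
from dataclasses import dataclass

# Example thresholds from the text.
QUEUE_LENGTH_THRESHOLD = 200  # first threshold (queue length check)
PAUSE_THRESHOLD = 2           # second threshold (pause check), per 50 ms period


@dataclass
class Counters:
    queue_length: int = 0  # transmission waiting queue length counter
    pause_count: int = 0   # pause counter, reset by the pause check


def record_refusal(c: Counters) -> None:
    """Destination refused data: it is queued and the destination is paused."""
    c.queue_length += 1
    c.pause_count += 1


def periodic_checks(table: dict) -> set:
    """Run both checks; a destination is faulty if either check flags it."""
    faulty = set()
    for dest, c in table.items():
        if c.queue_length > QUEUE_LENGTH_THRESHOLD:  # queue length check
            faulty.add(dest)
        if c.pause_count > PAUSE_THRESHOLD:          # pause check
            faulty.add(dest)
        else:
            c.pause_count = 0                        # within ordinary congestion
    return faulty


# Hypothetical scenario: 22 is busy but refused only once; 23 refused three times.
table = {22: Counters(queue_length=30, pause_count=1), 23: Counters()}
for _ in range(3):
    record_refusal(table[23])
print(periodic_checks(table))  # {23}
```

A queue-only check with the conventional threshold of 20 would have flagged processor 22 here as well; with the raised queue threshold and the pause check, only the repeatedly refusing processor 23 is reported.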

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention.
10: bus; 21 to 2n: processors; 30: receiving section; 40: transmitting section; 50: central processing unit; 51: queue length check processing section; 52: pause check processing section; 53: transmission control processing section; 60: storage device; 72 to 7n: transmission waiting queues; 82 to 8n: transmission waiting queue length counters; 92 to 9n: pause counters.

Claims (1)

[Claims]
A faulty processor detection method for a multiprocessor system in which a plurality of processors are connected via a bus, wherein a source processor that sends data is provided with storage means in which data waiting to be sent is registered for each destination processor and with first counting means that counts the length of the waiting data; when the destination processor that should receive the data does not accept it, the data is registered in the storage means and the first counting means is incremented; and the length of the waiting data is monitored periodically and faulty processors are detected by a queue length check process that regards a processor exceeding a predetermined first threshold as faulty; the method being characterized in that the source processor is further provided with second counting means that counts pause states for each destination processor; when a destination processor does not accept the data, the destination processor is regarded as being in the pause state, transmission of data to it is withheld, and the second counting means is incremented; the pause states are monitored at a fixed period, the value of the second counting means being reset to 0 if it is smaller than a predetermined second threshold and the destination processor being regarded as faulty if it is larger; and faulty processors are detected by using this pause check process together with the queue length check process.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1099501A JPH02277336A (en) 1989-04-18 1989-04-18 Faulty processor detection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1099501A JPH02277336A (en) 1989-04-18 1989-04-18 Faulty processor detection system

Publications (1)

Publication Number Publication Date
JPH02277336A 1990-11-13

Family

ID=14249024

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1099501A Pending JPH02277336A (en) 1989-04-18 1989-04-18 Faulty processor detection system

Country Status (1)

Country Link
JP (1) JPH02277336A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0575610A (en) * 1991-09-17 1993-03-26 Fujitsu Ltd Method and device for monitoring system audit

Similar Documents

Publication Publication Date Title
US20080002735A1 (en) Device network
US20130242756A1 (en) Packet relay device, packet relay system, and fault detection method
JPH02277336A (en) Faulty processor detection system
JPH06175944A (en) Network monitoring method
JPS6072351A (en) Method for supervising operating condition of packet communication system
CN100367262C (en) Method and device for testing a monitoring function of a bus system and a corresponding bus system
JP2004086520A (en) Monitoring control device and its method
KR100250888B1 (en) Network detection device of distributed control system
JPH11177550A (en) Monitor system for network
JPH04367061A (en) Faulty processor detection system
CN110752939A (en) Service process fault processing method, notification method and device
JP2633351B2 (en) Control device failure detection mechanism
JPS6129966A (en) Monitoring method in exchange of message between computers
JP3111543B2 (en) MAC bridge buffer release method
JPH03276942A (en) Repeater
JPH03237555A (en) Distributed processing system
JPH0898278A (en) Digital control system
JPS62175836A (en) Health check system in data processing system
JPH041831A (en) Monitor system for program runaway
JPH0213151A (en) Local area network system
JPS62162155A (en) Information processing system
JPH01269152A (en) Processor trouble detecting system in distributed processing system
JPH10207745A (en) Method for confirming inter-processor existence
CN110297732A (en) A kind of detection method and device of FPGA state
JPH02310755A (en) Health check system