JP2630100B2

JP2630100B2 - Fault handling method for interprocessor communication bus

Info

Publication number: JP2630100B2
Application number: JP3078110A
Authority: JP
Inventors: 隆文今; 昌浩向野; 淳一山下; 久美水上
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1991-04-11
Filing date: 1991-04-11
Publication date: 1997-07-16
Anticipated expiration: 2012-07-16
Also published as: JPH04312043A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of industrial application]

The present invention relates to a fault handling method for an interprocessor communication bus in a multiprocessor switching system having a plurality of processors connected in a bus configuration.

[0002]

[Description of the related art]

Conventionally, in this type of fault handling method for an interprocessor communication bus, the faulty device cannot be identified unless diagnosis is performed on all of the constituent elements: the plurality of processors, the bus cables, and the bus control devices that control the interprocessor communication bus.

[0003] However, a duplexed processor runs in simplex (non-redundant) mode while it is being diagnosed, so diagnosing every processor takes a long time and lowers the safety margin of the system. For this reason, even when the cause lies in a processor connected to the bus, it is common practice to disconnect the processor in which the fault appeared, or the bus itself.

[0004]

[Problems to be solved by the invention]

As described above, the conventional fault handling method for an interprocessor communication bus disconnects the device that detected the fault, regardless of whether the fault lies in a bus-connected processor, a bus cable, or a bus control device. It therefore has the drawback that maintenance personnel must intervene to identify the cause.

[0005] A further drawback is that, even when the fault lies in a processor, the communication bus is disconnected if the fault is detected by the bus control device.

[0006]

[Means for solving the problems]

In the fault handling method for an interprocessor communication bus according to the present invention, in a multiprocessor system having a plurality of processors connected in a bus configuration, when a fault occurs on the communication bus connecting the processors, the processors that were transferring information to each other at the time of the fault and the processors connected at both ends of the bus are set as diagnosis target processors. Each of these processors is automatically diagnosed in turn to search for the cause, and only a processor judged to be faulty among the diagnosis target processors is automatically disconnected from the system configuration.

[0007]

[Embodiment]

Next, the present invention will be described with reference to the drawings.

[0008] FIG. 1 shows the flow of the program in fault handling according to one embodiment of the present invention, and FIG. 2 shows an example configuration of a switching system used to explain the embodiment. In FIG. 2, duplexed processors 1 to 8 (a multiprocessor system with eight processors is assumed here) are connected in a bus configuration by duplexed communication buses 9 and 10 and are controlled by bus control devices 11 and 12. The following describes, as an example, the case in which processor 3 communicates with processor 5.

[0009] Processor 3 issues an interprocessor communication request on communication bus 9. Bus control devices 11 and 12 recognize that processor 3 is requesting communication, instruct the other processors to wait, and then instruct processor 3 to transfer. Processor 3 sets the address signal of the destination processor 5 on communication bus 9 and performs the transfer. The destination processor 5 receives the transferred information and reports normal completion to the source processor 3. On receiving this report, the source processor 3 ends its communication processing. Bus control devices 11 and 12 wait for the transfer by processor 3 to finish and then instruct the other processors to stop waiting.
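The normal transfer sequence described in paragraph [0009] can be sketched roughly as follows. This is only an illustrative model, not the patent's implementation: the class name `BusController`, its `transfer` method, and the log strings are all hypothetical, and "waiting" is modeled simply as a set of processor numbers.

```python
from dataclasses import dataclass, field


@dataclass
class BusController:
    """Hypothetical stand-in for bus control devices 11 and 12."""
    processors: list                      # processor numbers on the bus
    waiting: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def transfer(self, src, dst, payload):
        # src has requested the bus: every other processor is told to wait.
        self.waiting = {p for p in self.processors if p != src}
        self.log.append(f"wait:{sorted(self.waiting)}")
        # The controller instructs src to transfer; src addresses dst.
        self.log.append(f"grant:{src}")
        self.log.append(f"data:{src}->{dst}:{payload}")
        # dst reports normal completion to src, which ends its processing.
        self.log.append(f"ack:{dst}->{src}")
        # After the transfer has finished, the wait is released.
        self.waiting = set()
        self.log.append("release")
        return True
```

Calling `BusController(processors=list(range(1, 9))).transfer(3, 5, "msg")` walks through the same order of events as the example of processor 3 communicating with processor 5.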

[0010] FIG. 3 shows processor 3 and processor 5, extracted from FIG. 2, in detail. As shown in FIG. 3, each processor and bus control device described with reference to FIG. 2 has a duplexed configuration consisting of system 0 and system 1. Each processor controls bus communication through its transmit and receive circuits. Fault handling is described below with reference to FIGS. 1 to 6, on the assumption that system 0 was in use in all bus control devices and processors. System 1 is therefore the standby side of the duplexed configuration in every device.

[0011] Suppose that, during the processing described with reference to FIG. 2, the sending processor 3 detects a communication fault caused by the transmit and receive circuit 5f of the destination processor 5. The fault handling program is then started. As shown in FIG. 4, it disconnects system 0 of the sending processor 3 from the system and resumes communication processing on system 1 (step 1; S1). Assuming that the cause lies in the destination processor 5, the fault occurs again. This time the sending processor 3 is already running in simplex mode on system 1, so the fault handling program disconnects communication bus 9 and uses communication bus 10, as shown in FIG. 5. Processor 5 then uses transmit and receive circuit 5e and can operate normally. In this case, the fault in system 0 of processor 5, which is the true cause, is not detected.
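The way this failover sequence hides the true cause can be illustrated with a small sketch. It rests on an assumption the source does not state explicitly: a transmit and receive circuit is identified here by a (processor, system, bus) triple, so that circuit 5f corresponds to `(5, 0, 9)`. The function name `conventional_handling` and the action strings are hypothetical.

```python
def conventional_handling(faulty_circuits, sender, receiver):
    """Replay the failover steps for one failing transfer.

    faulty_circuits: set of (processor, system, bus) triples. This keying
    is an assumption made for the sketch, not taken from the source.
    Returns (actions taken, whether the transfer still fails).
    """
    active_system = {sender: 0, receiver: 0}  # system 0 in use everywhere
    bus = 9
    actions = []

    def transfer_fails():
        # The transfer fails if either end uses a faulty circuit.
        return any(
            (p, active_system[p], bus) in faulty_circuits
            for p in (sender, receiver)
        )

    if transfer_fails():
        # S1: cut off the sender's system 0 and retry on its system 1.
        active_system[sender] = 1
        actions.append(f"disconnect processor {sender} system 0")
    if transfer_fails():
        # The sender is already simplex, so the bus is swapped instead.
        bus = 10
        actions.append("disconnect bus 9, switch to bus 10")
    return actions, transfer_fails()
```

With `faulty_circuits={(5, 0, 9)}`, `sender=3`, and `receiver=5`, the sketch disconnects system 0 of processor 3 and then bus 9, after which the transfer succeeds, and system 0 of processor 5 is never flagged.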

[0012] In the processing of the present invention, however, after the fault handling program has made the disconnection shown in FIG. 4, a fault search program is started. As shown in FIG. 1, it searches the information in the communication information save area and the fault information save area of each processor, and sets every processor that was transferring information at the time of the fault in the diagnosis target processor information area (S2). To allow for faults in the communication bus cable and similar components, the first and last processors connected to the bus are also set in the diagnosis target processor area at the same time.
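Step S2 amounts to building a small set of processors from the saved information. The following is a minimal sketch under stated assumptions: the per-processor save areas are reduced to a dictionary with a single hypothetical flag `transferring_at_fault`, and `bus_order` is an assumed list giving the physical order of processors along the bus.

```python
def select_diagnosis_targets(save_areas, bus_order):
    """Return the processors to diagnose, in bus order (step S2).

    save_areas: {processor: {"transferring_at_fault": bool}}, a simplified
    stand-in for the communication and fault information save areas.
    bus_order: processor numbers in their order along the bus.
    """
    # Every processor that was transferring information when the fault hit.
    targets = {
        pid for pid, info in save_areas.items()
        if info["transferring_at_fault"]
    }
    # The first and last processors on the bus, to cover cable faults.
    targets.update({bus_order[0], bus_order[-1]})
    return sorted(targets, key=bus_order.index)
```

For the eight-processor example, with processors 3 and 5 flagged, this yields processors 1, 3, 5, and 8.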

[0013] Thereafter, communication bus 9 is disconnected as shown in FIG. 5, and automatic diagnosis of bus control devices 11 and 12 is performed using communication bus 9 (S3). If the result is normal, the automatic diagnosis program diagnoses, for every processor set in the diagnosis target processor area, the system that was in use before the fault occurred (S4). In this example, processors 3 and 5, which were communicating, and processors 1 and 8, the first and last processors on the communication bus, become diagnosis targets and are diagnosed in turn.

[0014] The automatic diagnosis program reports a diagnosis error in the system 0 bus test of processor 5. System 0 of processor 5 is therefore disconnected, and all other devices are returned to duplexed operation (S5). In this way, as shown in FIG. 6, system 0 of processor 5, which is the true cause, is automatically disconnected from the system.
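Steps S4 and S5 can be sketched as a loop over the diagnosis targets. The sketch assumes that the bus control device diagnosis of S3 has already passed (the source does not describe the other branch). The function `diagnose_and_isolate`, the `bus_test` callback, and the `sidelined` list of units taken out of service during fault handling are hypothetical names introduced for illustration.

```python
def diagnose_and_isolate(targets, system_before_fault, bus_test, sidelined):
    """Diagnose each target in turn and isolate only what fails.

    targets: processors chosen in S2.
    system_before_fault: {processor: system in use before the fault}.
    bus_test: callable (processor, system) -> True if the test passes.
    sidelined: units taken out of service during fault handling.
    Returns (isolated units, units restored to duplexed operation).
    """
    isolated = []
    for pid in targets:                       # S4: sequential diagnosis
        system = system_before_fault[pid]
        if not bus_test(pid, system):
            isolated.append((pid, system))
    # S5: everything except the isolated units returns to service.
    restored = [unit for unit in sidelined if unit not in isolated]
    return isolated, restored
```

In the example of the embodiment, only `(5, 0)` fails its bus test, so system 0 of processor 5 is isolated while system 0 of processor 3 and communication bus 9 are restored.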

[0015] Because diagnosis is limited in this way to a narrowed set of devices, the reduction in diagnosis time becomes greater as the number of processors in the system increases.

[0016]

[Effects of the invention]

As described above, the present invention shortens diagnosis time by narrowing down the target processors. In addition, when a fault on the interprocessor communication bus is caused by a specific processor, the communication bus is not disconnected; only the faulty system of that processor is automatically isolated and system operation is resumed. The cause of the fault is thus found without the intervention of maintenance personnel, which improves the stability and maintainability of the system.

[Brief description of the drawings]

FIG. 1 is a diagram showing the flow of the program in fault handling according to one embodiment of the present invention.

FIG. 2 is a diagram showing an example configuration of a switching system used to explain one embodiment of the present invention.

FIG. 3 is a diagram showing details of the example system configuration of FIG. 2.

FIG. 4 is a system configuration diagram during fault handling according to one embodiment of the present invention.

FIG. 5 is a system configuration diagram during fault handling according to one embodiment of the present invention.

FIG. 6 is a system configuration diagram during fault handling according to one embodiment of the present invention.

[Explanation of symbols]

1 to 8: processors
9, 10: communication buses
11, 12: bus control devices

Continuation of front page

(72) Inventor: Kumi Mizukami, c/o NEC Corporation, 5-7-1 Shiba, Minato-ku, Tokyo

(56) References cited: JP-A-59-219052; JP-A-61-125245; JP-A-64-24650; JP-A-63-227234; JP-A-2-288527; JP-A-59-72255; JP-A-60-112155; JP-A-2-306362; JP-A-64-88677; JP-U-58-111552

Claims

(57) [Claims]

1. A fault handling method for an interprocessor communication bus in a multiprocessor system having a plurality of processors connected in a bus configuration, characterized in that, when a fault occurs on the communication bus connecting the processors, the processors that were transferring information to each other at the time of the fault and the processors connected at both ends of the bus are set as diagnosis target processors, each of these processors is automatically diagnosed in turn to search for the cause, and only a processor judged to be faulty among the diagnosis target processors is automatically disconnected from the system configuration.