JPH09311841A - Multiprocessor system

Info

Publication number: JPH09311841A
Application number: JP8129667A
Authority: JP
Inventors: Masako Takagi; 政子高木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-05-24
Filing date: 1996-05-24
Publication date: 1997-12-02

Abstract

PROBLEM TO BE SOLVED: To prevent an entire multiprocessor system from stopping by cutting the electrical connection between the system bus and the faulty local bus to which a faulty slave processor is connected when one of a plurality of slave processors is in a faulty state. SOLUTION: Failure information is stored in the reset circuit 40-1 to which the faulty slave processor 20-1 is connected. As a result, the local buses 70-1 and 80-1 connected to the reset circuit 40-1, and the slave processors 20-1 and 20-2 connected to those buses, are electrically disconnected from the system buses 50 and 60. The faulty processor 20-1 is thus separated from the multiprocessor system, and the system as a whole is kept from stopping. When the processor 20-1 recovers from the fault, the failure information in the reset circuit 40-1 is erased and the processor 20-1 is brought back into the system.

Description

Detailed Description of the Invention

[Technical Field of the Invention] The present invention relates to a multiprocessor system, and more particularly to a multiprocessor system that prevents the entire system from stopping when a slave processor fails.

[0001]

[Prior Art] In a conventional multiprocessor system of this type, to keep the entire system from stopping when a slave processor failed, the system was restarted with access from the master processor inhibited only for the failed slave processor.

[0002] For example, Japanese Patent Laid-Open No. 3-269759 describes a configuration in which, during startup diagnosis, the master processor requests a failed slave processor to send either an error detail notification or an error detail clear notification. If neither notification arrives, access from the master processor to that failed slave processor is inhibited at restart.

[0003]

[Problems to Be Solved by the Invention] In the conventional multiprocessor system described above, the cause of the failure may remain on the bus to which the failed slave processor is connected, and in that case the entire system stops. For example, while a normally operating slave processor is transferring data to or from memory over the bus, the failed slave processor may drive invalid data onto the bus. This happens because a failed slave processor may be unable to recognize correctly that another slave processor has acquired the right to use the bus.

[0004] Furthermore, the master processor recognizes a slave processor as being in a failed state when no notification arrives from it. A failure of the bus itself, however, leads to the same conclusion, so a slave processor may be judged to have failed even though it is operating normally.

[0005] An object of the present invention is to eliminate the above drawbacks and to provide a multiprocessor system that can prevent the entire system from stopping.

[0006]

[Means for Solving the Problems] To solve the above problems, a multiprocessor system according to the present invention includes: a master processor; a system bus to which the master processor is connected; a plurality of slave processors; a plurality of local buses, each of which has at least one of the slave processors connected to it and each of which is connected to the system bus; and connection means that provides the electrical connection between the system bus and the local buses and that, when at least one of the slave processors connected to the local buses is in a faulty state, cuts the electrical connection between the system bus and the faulty local bus to which that faulty slave processor is connected.
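
The arrangement in this paragraph can be pictured with a small software model. The following Python sketch is only an illustration under stated assumptions, not the claimed hardware: the class names LocalBus and ConnectionMeans and the method names are hypothetical, and the bus and processor labels are borrowed from the first embodiment described later.

```python
from dataclasses import dataclass, field


@dataclass
class LocalBus:
    name: str
    slaves: list[str]
    connected: bool = True  # electrical connection to the system bus


@dataclass
class ConnectionMeans:
    """Hypothetical model of the connection means between system bus and local buses."""
    local_buses: list[LocalBus] = field(default_factory=list)

    def isolate_faulty(self, faulty_slaves: set[str]) -> list[str]:
        """Cut every local bus that carries at least one faulty slave; return the cut bus names."""
        cut = []
        for bus in self.local_buses:
            if bus.connected and faulty_slaves.intersection(bus.slaves):
                bus.connected = False
                cut.append(bus.name)
        return cut

    def reachable_slaves(self) -> list[str]:
        """Slaves still reachable from the system bus."""
        return [s for bus in self.local_buses if bus.connected for s in bus.slaves]


means = ConnectionMeans([
    LocalBus("70-1", ["20-1", "20-2"]),
    LocalBus("70-2", ["20-3", "20-4"]),
])
means.isolate_faulty({"20-1"})
print(means.reachable_slaves())  # slaves on the healthy local bus only
```

Note that the whole local bus is cut, so a healthy slave sharing the bus with a faulty one is removed as well; that is the trade-off the paragraph accepts in exchange for keeping the system bus clean.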

[0007] In another multiprocessor system of the present invention, the connection means includes failure information storage means for storing failure information sent from the faulty slave processor, and electrically disconnects the faulty local bus, to which the faulty slave processor is connected, from the system bus only while the failure information is stored in the failure information storage means.

[0008] Another multiprocessor system of the present invention includes: a system bus; a plurality of local buses electrically connected to the system bus; a master processor connected to one of the local buses; a plurality of slave processors connected to the local buses; disconnecting means for electrically disconnecting from the system bus the faulty local bus, among the local buses, to which a faulty slave processor is connected; and selecting means for selecting a new master processor from the normal slave processors when the disconnecting means electrically disconnects the master processor from the system bus.

[0009] Another multiprocessor system of the present invention further includes local bus acquisition means for identifying the faulty local bus to which the faulty slave processor is connected. The selecting means includes determining means for determining whether the master processor is connected to the faulty local bus identified by the local bus acquisition means, and selects the master processor based on the result of that determination.

[0010] In another multiprocessor system of the present invention, each of the slave processors includes failure processing means that detects a failure of its own processor and sends failure information when a failure is detected. The disconnecting means is provided for each of the local buses and, on receiving the failure information, electrically disconnects all the processors connected to it.

[0011] In another multiprocessor system of the present invention, each of the slave processors has an identifier, and the selecting means selects the new master processor from among the slave processors based on the identifiers.

[0012]

[Embodiments of the Invention] Next, the present invention will be described in detail with reference to the drawings.

[0013] Referring to FIG. 1, a multiprocessor system according to a first embodiment of the present invention comprises a master processor 10, slave processors 20-1 through 20-4, reset circuits 40-1 and 40-2, and a main storage device 30. The master processor 10 and the main storage device 30 are connected to system buses 50 and 60.

[0014] The slave processors 20-1 and 20-2 are connected to local buses 70-1 and 80-1, which are connected to the system buses 50 and 60 through the reset circuit 40-1.

[0015] Likewise, the slave processors 20-3 and 20-4 are connected to local buses 70-2 and 80-2, which are connected to the system buses 50 and 60 through the reset circuit 40-2.

[0016] The system bus 50 is connected to the master processor 10, the slave processors 20-1 through 20-4, and the main storage device 30, and serves as the data transmission path among them.

[0017] The reset circuit 40-1 includes a failure information holding circuit 401-1, which stores failure information when a failure occurs in the slave processor 20-1 or 20-2 connected to the reset circuit. The failure information is first sent from the slave processor 20-1 or 20-2 to the master processor 10, and the master processor 10 then sends it to the reset circuit 40-1. The failure information includes items such as the type of failure and the location where the failure was detected. When the failure information is stored, the reset circuit 40-1 sends a reset signal to the slave processors 20-1 and 20-2 connected to it in order to disconnect them from the system bus 50.

[0018] The reset circuit 40-2 includes a failure information holding circuit 401-2, which stores failure information when a failure occurs in the slave processor 20-3 or 20-4 connected to the reset circuit. The failure information is first sent from the slave processor 20-3 or 20-4 to the master processor 10, and the master processor 10 then sends it to the reset circuit 40-2. The failure information includes items such as the type of failure and which location was being diagnosed when the failure was detected. When the failure information is stored, the reset circuit 40-2 sends a reset signal to the slave processors 20-3 and 20-4 connected to it in order to disconnect them from the system bus 50.
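
The behavior of the reset circuits in paragraphs [0017] and [0018] can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the actual circuit: the names FailureInfo, ResetCircuit, store and reset_targets are hypothetical, and the two fields of FailureInfo simply mirror the two items the text says the failure information contains.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FailureInfo:
    kind: str      # type of failure (hypothetical field name)
    location: str  # location where the failure was detected (hypothetical field name)


class ResetCircuit:
    """Illustrative model of a reset circuit such as 40-1 with its holding circuit 401-1."""

    def __init__(self, name: str, slaves: list[str]):
        self.name = name
        self.slaves = list(slaves)
        self.held: Optional[FailureInfo] = None  # failure information holding circuit

    def store(self, info: FailureInfo) -> list[str]:
        """Store failure information sent by the master; return the slaves that get a reset signal."""
        self.held = info
        return self.reset_targets()

    def reset_targets(self) -> list[str]:
        # Every slave behind this circuit is reset, not only the faulty one.
        return list(self.slaves) if self.held is not None else []


rc = ResetCircuit("40-1", ["20-1", "20-2"])
print(rc.reset_targets())                                 # nothing stored yet, so no reset targets
print(rc.store(FailureInfo("diagnostic error", "20-1")))  # both attached slaves are reset
```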

[0019] The slave processors 20-1 through 20-4 have diagnostic circuits 201-1 through 201-4, respectively.

[0020] During normal operation, the reset circuits 40-1 and 40-2 operate as the connection circuits between the system bus 50 and the local buses 70-1 and 70-2.

[0021] The system bus 60 is a transmission path used to send reset signals from the reset circuits 40-1 and 40-2 to the main storage device 30 (the source text reads "main storage device 5").

[0022] The local bus 70-1 is used for data transmission between either of the slave processors 20-1 and 20-2 connected to it and either the master processor 10 or the main storage device 30. The local bus 70-1 is also used for data transmission between the slave processors 20-1 and 20-2. The local bus 70-2 is used for data transmission between either of the slave processors 20-3 and 20-4 connected to it and either the master processor 10 or the main storage device 30. The local bus 70-2 is also used for data transmission between the slave processors 20-3 and 20-4.

[0023] The local bus 80-1 carries the reset signal sent from the reset circuit 40-1 to the slave processors 20-1 and 20-2. The local bus 80-2 carries the reset signal sent from the reset circuit 40-2 to the slave processors 20-3 and 20-4.

[0024] Next, the operation of this embodiment will be described.

[0025] Referring to FIGS. 1 and 2, at system startup the master processor 10 instructs the slave processors 20-1 through 20-4 to execute startup diagnosis (step 21 in FIG. 2). On receiving this instruction, each of the slave processors 20-1 through 20-4 executes the startup processing described below. Only the slave processor 20-1 is described here; the description of the other slave processors is omitted.

[0026] Referring to FIGS. 1 and 3, on receiving the instruction to execute startup diagnosis from the master processor 10, the slave processor 20-1 executes the startup diagnosis (step 31 in FIG. 3). The startup diagnosis is performed by the diagnostic circuit 201-1 of the slave processor 20-1. It is then determined whether a failure was detected while the diagnostic circuit 201-1 was executing the startup diagnosis (step 32 in FIG. 3). If a failure was detected, the slave processor notifies the master processor 10 of the failure information (step 33 in FIG. 3), using interprocessor communication. If no failure was detected in step 32, the slave processor sends the master processor 10 a notification that it is normal (step 37 in FIG. 3), then ends the startup processing and shifts to normal processing.

[0027] Referring to FIGS. 1 and 2, in step 22 the master processor 10 determines whether a notification has arrived from the slave processor 20-1. If one has, the master processor determines whether it indicates that a failure was detected in the notifying slave processor (step 23 in FIG. 2), by checking whether the notification contains failure information. If it does, then in this embodiment the master processor instructs the slave processor 20-1 to clear the failure, based on the content of the failure information (step 24 in FIG. 2). The master processor 10 then enters a wait state in which it waits for a clear completion notification from the slave processor 20-1 for a predetermined period. During this wait state, it checks whether the predetermined period is still running (step 25 in FIG. 2) and, if so, whether a clear completion notification has arrived from the slave processor 20-1 (step 26 in FIG. 2).

[0028] Referring to FIGS. 1 and 3, in step 34 the slave processor 20-1 determines whether a clear instruction has arrived from the master processor 10. If so, it clears the cause of the failure (step 35 in FIG. 3). If the cause of the failure could be cleared, it notifies the master processor 10 that clearing is complete (step 36 in FIG. 3). It then ends the startup processing and shifts to normal operation.
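
The slave-side sequence of FIG. 3 (steps 31 to 37) can be summarized as a short Python sketch. It is an illustration only: the function name slave_startup, the three callback parameters and the notification strings are hypothetical, and the sketch assumes that a slave which receives no clear instruction, or cannot clear the fault, simply sends nothing further (the text does not say what the slave does in that case; the master detects it by timeout).

```python
from typing import Callable, Optional


def slave_startup(
    run_diagnosis: Callable[[], Optional[str]],
    wait_for_clear_instruction: Callable[[], bool],
    clear_fault: Callable[[], bool],
) -> list[str]:
    """Return the notifications the slave sends to the master, in order."""
    sent = []
    fault = run_diagnosis()                # step 31: diagnostic circuit runs the startup diagnosis
    if fault is None:                      # step 32: no failure detected
        sent.append("normal")              # step 37: report normal status
        return sent
    sent.append(f"failure:{fault}")        # step 33: report failure information to the master
    if wait_for_clear_instruction():       # step 34: clear instruction received?
        if clear_fault():                  # step 35: try to clear the cause of the failure
            sent.append("clear-complete")  # step 36: report that clearing is complete
    return sent


# A slave whose diagnosis finds a fault that can be cleared on instruction.
print(slave_startup(lambda: "parity", lambda: True, lambda: True))
```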

[0029] Referring to FIGS. 1 and 2, if in step 26 the master processor 10 has received the clear completion notification from the slave processor 20-1, it degrades the slave processor 20-1 in which the failure occurred and executes the restart operation (step 27 in FIG. 2). If, on the other hand, step 25 finds that the clear completion notification has not been received from the slave processor 20-1 within the predetermined period, the master processor sends the failure information to the reset circuit 40-1 and ends the startup processing (step 28 in FIG. 2).

[0030] If no notification has arrived from the slave processor 20-1 in step 22, the master processor 10 performs the operation of step 28.
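
The master-side branches of FIG. 2 (steps 22 to 28) reduce to a small decision function. The Python sketch below is an illustration under stated assumptions: the names Outcome and master_decision are hypothetical, the waiting loop of steps 25 and 26 is replaced by a comparison between an arrival time and a timeout, and the NORMAL outcome for a slave that reports no failure is an assumption, since the text does not spell out the master's action in that case.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    NORMAL = "slave reported normal status"
    RESTART = "step 27: degrade the slave and restart"
    ISOLATE = "step 28: send failure information to the reset circuit"


def master_decision(
    notification: Optional[str],
    clear_wait_s: Optional[float],
    timeout_s: float,
) -> Outcome:
    """notification is None (nothing received), "normal" or "failure".

    clear_wait_s is the time after which the clear completion notification
    arrived, or None if it never arrived."""
    if notification is None:        # step 22: no notification from the slave
        return Outcome.ISOLATE      # step 28
    if notification == "normal":    # step 23: notification carries no failure information
        return Outcome.NORMAL
    # step 24: instruct the slave to clear the failure, then wait (steps 25 and 26)
    if clear_wait_s is not None and clear_wait_s <= timeout_s:
        return Outcome.RESTART      # step 27
    return Outcome.ISOLATE          # step 28: predetermined period expired


print(master_decision("failure", None, 2.0).value)
```

Both the silent slave and the slave that fails to clear its fault in time end at step 28, which matches paragraph [0004]: the master cannot tell a dead slave from a dead bus, so it isolates the whole local bus in either case.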

[0031] Referring to FIG. 1, the failure information is sent to the reset circuit connected to the slave processor in which the failure was detected. The reset circuit 40-1 stores the failure information and resets all the slave processors connected to it. While the failure information remains stored, the reset circuit 40-1 keeps all the connected slave processors in the reset state.

[0032] When power is turned on during the restart operation, the reset circuit 40-1 still holds the failure information, so the reset signal continues to be sent to all the slave processors 20-1 and 20-2 connected to the reset circuit 40-1 through the local buses 70-1 and 80-1. The slave processors 20-1 and 20-2 therefore remain disconnected from the multiprocessor system.

[0033] Because the slave processors 20-1 and 20-2 are disconnected, the multiprocessor system is made up of the master processor 10 and the slave processors 20-3 and 20-4.

[0034] When the slave processor 20-1 has recovered from the failure, erasing the failure information in the reset circuit 40-1 brings all the slave processors 20-1 and 20-2, connected to the reset circuit 40-1 through the local buses 70-1 and 80-1, back into the system.
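
The restart and recovery behavior of paragraphs [0032] to [0034] amounts to recomputing the system membership from whatever failure information the reset circuits still hold. A minimal Python sketch, assuming a hypothetical function system_members and plain dictionaries in place of the holding circuits:

```python
def system_members(bus_groups: dict[str, list[str]], held_info: dict[str, str]) -> list[str]:
    """Return the slaves that take part in the system after a (re)start.

    bus_groups maps a reset circuit name to the slaves behind it;
    held_info maps a reset circuit name to the failure information it still holds."""
    members = []
    for circuit, slaves in bus_groups.items():
        if circuit in held_info:
            continue  # reset signal is still asserted, so the whole group stays cut off
        members.extend(slaves)
    return members


groups = {"40-1": ["20-1", "20-2"], "40-2": ["20-3", "20-4"]}
held = {"40-1": "failure detected in 20-1"}
print(system_members(groups, held))  # restart while 40-1 still holds the information
del held["40-1"]                     # fault repaired: the failure information is erased
print(system_members(groups, held))  # the slaves behind 40-1 take part again
```

Because membership depends only on the stored failure information, no separate configuration step is needed to bring the repaired processors back; erasing the information is enough.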

[0035] As described above, in this embodiment the failure information is stored in the reset circuit 40-1 to which the faulty slave processor 20-1 is connected. As a result, the local buses 70-1 and 80-1 connected to the reset circuit 40-1, and all the slave processors 20-1 and 20-2 connected to those local buses, can be electrically disconnected from the system buses 50 and 60. The failed slave processor is thus separated from the multiprocessor system, and the entire system is kept from stopping. In addition, abnormal data that the failed slave processor may have driven onto the local bus 70-1 is prevented from propagating to the rest of the multiprocessor system. Moreover, if the bus itself is faulty, its effect on the multiprocessor system is eliminated.

[0036] Next, a second embodiment of the multiprocessor system of the present invention will be described in detail with reference to the drawings. The second embodiment differs from the first in that the master processor is connected to a local bus. The rest of the configuration is the same as in the first embodiment.

[0037] Referring to FIG. 4, which shows the second embodiment of the present invention, the master processor 10 is connected to the local buses 70-2 and 80-2.

[0038] The operation of the multiprocessor system of this embodiment is the same as in the first embodiment up to the point where the failed slave processor is made to clear the cause of its failure on the instruction of the master processor 10. If this clear operation cannot clear the cause of the failure, the subsequent operation depends on whether the failed slave processor is connected to the local buses 70-2 and 80-2, to which the master processor 10 is connected, or to a different local bus. In other words, the operation after the clear processing depends on whether the failed slave processor and the master processor 10 share the same local bus.

[0039] When the failed slave processor and the master processor 10 are connected to the same local bus, the following processing is performed. The master processor 10 sends the failure information to the reset circuit 40-2. As a result, the failed slave processor 20-3 and the master processor 10, which are connected to the reset circuit 40-2 through the local buses 70-2 and 80-2, are electrically disconnected from the multiprocessor system. Because this removes the master processor 10 from the multiprocessor system, a new master processor is selected from the slave processors that are operating normally. The selection is made by a service processor (not shown). The multiprocessor system then operates under the newly selected master processor.

[0040] When the failed slave processor and the master processor 10 are connected to different local buses, that is, when the slave processor 20-1 has failed, the following processing is performed. The master processor 10 sends the failure information to the reset circuit 40-1, to which the slave processor 20-1 is connected. As a result, the failed slave processor 20-1 and the slave processor 20-2, which are connected to the reset circuit 40-1 through the local buses 70-1 and 80-1, are electrically disconnected from the multiprocessor system.
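
The two cases of paragraphs [0039] and [0040] differ only in whether the master processor sits behind the reset circuit that receives the failure information. The Python sketch below illustrates that decision; the function name handle_unclearable_fault and the dictionary layout are hypothetical, and placing the slave processor 20-4 on the same local bus as the master is an assumption carried over from the first embodiment, since the text of the second embodiment names only 20-3 and the master processor 10.

```python
def handle_unclearable_fault(faulty: str, master: str, circuit_of: dict[str, str]) -> dict:
    """circuit_of maps each processor to the reset circuit of its local bus."""
    circuit = circuit_of[faulty]  # reset circuit that receives the failure information
    cut_off = sorted(p for p, c in circuit_of.items() if c == circuit)
    return {
        "reset_circuit": circuit,
        "cut_off": cut_off,                     # every processor behind that circuit
        "needs_new_master": master in cut_off,  # master shares the faulty local bus
    }


# Assumed layout for FIG. 4: master 10 shares local buses 70-2 and 80-2 with 20-3 and 20-4.
layout = {"10": "40-2", "20-1": "40-1", "20-2": "40-1", "20-3": "40-2", "20-4": "40-2"}
print(handle_unclearable_fault("20-3", "10", layout))  # master is cut off too
print(handle_unclearable_fault("20-1", "10", layout))  # master stays in the system
```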

[0041] As described above, according to this embodiment, in a multiprocessor system in which the master processor 10 is connected to a local bus, when the failed slave processor and the master processor 10 are connected to the same local bus, that local bus is disconnected from the system. When they are connected to different local buses, the local bus to which the failed slave processor is connected is disconnected from the system. A failure of a slave processor can therefore be kept from stopping the entire system even in a multiprocessor system in which the master processor is not fixed.

[0042] In the second embodiment described above, a service processor is used as the means for selecting the new master processor, but the selecting means is not limited to this and various means can be applied. For example, the method described in Japanese Patent Application No. 8-026679 can be applied. In that method, a predetermined algorithm executed on one of the slave processors selects one slave processor from among the normal slave processors. One such algorithm assigns a number to each of the slave processors in advance and selects the normal slave processor with the smallest number.
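
The smallest-number rule mentioned at the end of this paragraph is simple enough to sketch directly. The following Python function is an illustration only; the name select_new_master, the dictionary of assigned numbers and the None return value for the case with no healthy slave are all assumptions, not details taken from the cited application.

```python
from typing import Optional


def select_new_master(slave_numbers: dict[str, int], healthy: set[str]) -> Optional[str]:
    """Pick the healthy slave with the smallest assigned number, or None if none is healthy."""
    candidates = [s for s in slave_numbers if s in healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: slave_numbers[s])


numbers = {"20-1": 1, "20-2": 2, "20-3": 3, "20-4": 4}
# Slaves 20-3 and 20-4 were cut off together with the old master.
print(select_new_master(numbers, {"20-1", "20-2"}))
```

A fixed numbering makes the choice deterministic, so whichever slave runs the algorithm arrives at the same new master.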

[0043]

[Effects of the Invention] As is clear from the above description, the present invention is configured to disconnect the local bus to which a failed slave processor is connected from the multiprocessor system. This eliminates the effect of erroneous data that the failed slave processor may drive onto the local bus, and thereby prevents the entire system from stopping.

[Brief description of drawings]

FIG. 1 is a block diagram showing the first embodiment of the multiprocessor system of the present invention.

FIG. 2 is a flowchart showing the operation of the master processor 10 of the present invention.

FIG. 3 is a flowchart showing the operation of the slave processors 20-1 through 20-4 of the present invention.

FIG. 4 is a block diagram showing the second embodiment of the multiprocessor system of the present invention.

[Explanation of symbols]

10: Master processor
20-1, ..., 20-4: Slave processors
30: Main storage device
40-1, 40-2: Reset circuits
50: System bus
60: System bus
70-1, 70-2: Local buses
80-1, 80-2: Local buses
401-1, 401-2: Failure information holding circuits

Claims

[Claims]

1. A multiprocessor system comprising: a master processor; a system bus to which the master processor is connected; a plurality of slave processors; a plurality of local buses, each of which has at least one of the plurality of slave processors connected to it and each of which is connected to the system bus; and connection means that provides the electrical connection between the system bus and the plurality of local buses and that, when at least one of the plurality of slave processors connected to the plurality of local buses is in a faulty state, cuts the electrical connection between the system bus and the faulty local bus to which the faulty slave processor is connected.

2. The multiprocessor system according to claim 1, wherein the connection means includes failure information storage means for storing failure information sent from the faulty slave processor, and electrically disconnects the faulty local bus, to which the faulty slave processor is connected, from the system bus only while the failure information is stored in the failure information storage means.

3. A multiprocessor system comprising: a system bus; a plurality of local buses electrically connected to the system bus; a master processor connected to one of the plurality of local buses; a plurality of slave processors connected to the plurality of local buses; disconnecting means for electrically disconnecting from the system bus the faulty local bus, among the plurality of local buses, to which a faulty slave processor is connected; and selecting means for selecting a new master processor from the normal ones of the plurality of slave processors when the disconnecting means electrically disconnects the master processor from the system bus.

4. The multiprocessor system further comprising local bus acquisition means for identifying the faulty local bus to which the faulty slave processor is connected, wherein the selecting means includes determining means for determining whether the master processor is connected to the faulty local bus identified by the local bus acquisition means, and selects the master processor based on the result of the determination by the determining means.

5. The multiprocessor system according to claim 3, wherein each of the plurality of slave processors includes failure processing means that detects a failure of its own processor and sends failure information when a failure is detected, and the disconnecting means is provided for each of the plurality of local buses and, on receiving the failure information, electrically disconnects all the processors connected to it.

6. The multiprocessor system wherein each of the plurality of slave processors has an identifier, and the selecting means selects the new master processor from among the plurality of slave processors based on the identifier.