JPS6112580B2

Info

Publication number: JPS6112580B2
Application number: JP52137901A
Authority: JP
Inventors: Koichiro Yamaguchi; Kazuo Nishimura
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1977-11-18
Filing date: 1977-11-18
Publication date: 1986-04-09
Also published as: JPS5471537A

Description

[Detailed Description of the Invention]

The present invention relates to a failure handling method for a multiprocessor that distributes load or distributes functions in an information processing apparatus.

An example of the prior art considered closest to the present invention is described with reference to FIG. 1. FIG. 1 shows a typical multiprocessor in which n processors (PU1 to PUn) 1_1 to 1_n are organically coupled by a common bus 3, to which a common memory (CM) 5, a data channel (DCH) 6, and a dedicated input/output control unit (PIOC) 7 are also connected. Each of the processors 1_1 to 1_n is connected to a corresponding individual memory (LM) 2_1 to 2_n and operates according to the individual program stored in that memory. In a load-distributed multiprocessor, therefore, a plurality of processors share the same kind of processing, and the same kind of program is stored in the corresponding LMs 2_1 to 2_n.

In a function-distributed multiprocessor, on the other hand, different programs corresponding to the respective processing functions are stored in the LMs 2_1 to 2_n. The basic configuration shown in FIG. 1 is therefore generally adopted for both the load-distributed type and the function-distributed type.

The key points in configuring such a multiprocessor are, first, which processor handles a processing request from an information processing request source such as the DCH 6 or the PIOC 7 and, second, how a processor failure is detected and how its processing is handed over to a normal processor.

Regarding the first point, the conventional system also assigns processing to each processor by, for example, priority processing or cyclic processing; in hardware terms, the common bus control unit (CBC) 4 was responsible for this function.
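As an illustration of cyclic assignment, the following minimal Python sketch picks the next processor in rotation. The function name `next_processor` and the `available` flags are hypothetical, and skipping unavailable processors is an assumption; the text does not describe how the CBC implements the assignment in hardware.

```python
def next_processor(last: int, available: list[bool]) -> int:
    """Return the index of the next available processor after `last`, cyclically.

    `available[i]` is a hypothetical flag saying whether processor i may be
    given work. Raises RuntimeError if no processor is available.
    """
    n = len(available)
    # Try each processor once, starting with the one after `last`
    # and wrapping around to `last` itself as the final candidate.
    for step in range(1, n + 1):
        candidate = (last + step) % n
        if available[candidate]:
            return candidate
    raise RuntimeError("no processor available")
```

With three processors of which the second is unavailable, `next_processor(0, [True, False, True])` selects index 2.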

Regarding the second point, the general approach in the conventional system was to give each processor its own fault detection and stop functions.

However, giving a failed processor a self-control function in this way increases the hardware of each individual processor. In addition, a failed processor may cause interference; for example, its common memory access request signal may become stuck in the requesting state. In that case the other processors are prevented from using the common bus 3, which risks bringing down the entire multiprocessor system.

An object of the present invention is to eliminate the above drawbacks of the prior art and to provide an economical and highly reliable multiprocessor.

The present invention is characterized in that, in a multiprocessor whose processors are coupled by a common bus, a centralized monitoring control unit is provided that detects a failure of each processor at an early stage through a centralized monitoring control bus provided between it and the individual processors, electrically disconnects the failed processor from the common bus, and transfers the load to a normal processor, thereby constituting an economical and highly reliable multiprocessor.

The overall configuration of one embodiment of the present invention is described with reference to the functional block diagram of FIG. 2. FIG. 2 shows the multiprocessor of the present invention, in which a centralized monitoring control unit (CSC) 8, which detects a failure of each processor at an early stage and carries out failure handling, is added to the conventional multiprocessor. The CSC 8 is connected to each of the processors 1_1 to 1_n by a centralized monitoring control bus 9 and by the common bus 3. When one of the processors 1_1 to 1_n, or the individual memory 2_1 to 2_n connected to it, fails, the CSC 8 detects the failure via the centralized monitoring control bus 9 and, depending on the severity of the failure, issues a failure handling command to the failed processor via the centralized monitoring control bus 9. The CSC 8 also issues an interrupt request to another, normal processor through the common bus 3 so that the load of the failed processor is transferred.

Next, the specific configuration and operation of the CSC 8 are described with reference to FIGS. 3 and 4.

In FIG. 3, the i-th processor is shown on the left as a representative of the n processors 1_1 to 1_n, and the CSC is shown on the right. The processors 1_1 to 1_n are generally of the stored-program control type. Their overall configuration is omitted, and only the part that interfaces with the CSC 8, which is particularly relevant to the present invention, is shown in the figure.

The fault detection circuit (ED) 10 is the part with which the processor itself detects an abnormal state. It detects faults such as the following:

(a) interruption of the processor clock;
(b) program runaway, detected by a fault detection timer;
(c) a parity error in the individual memory.

When it detects such a fault, it reports this to the control circuit (CONT) 21 in the CSC 8 via the fault report signal line 11. The control circuit 21 analyzes the cause of the processor fault. In the case of a severe fault such as (a) or (b), in order to prevent the fault of the failed processor 1_i from affecting other processors via the common bus 3, it stops the failed processor via the processing unit stop signal line 15. In addition, via the processing unit disconnection signal line 16, it instructs the bus transmitter 19 and the bus receiver 20, which are directly connected to the common bus 3 and electrically control signal input and output for the processor 1_i, to electrically disconnect from the common bus 3.
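The severity decision made by the control circuit 21 can be sketched in Python as follows. This is only an illustrative model: the enum `Fault`, the function `control_actions`, and the signal names are hypothetical, and the handling of a less severe fault such as (c) at this step is not specified in the text, so the sketch asserts no signals for it.

```python
from enum import Enum


class Fault(Enum):
    CLOCK_STOP = "a"       # (a) processor clock interruption
    PROGRAM_RUNAWAY = "b"  # (b) runaway detected by the fault detection timer
    MEMORY_PARITY = "c"    # (c) parity error in the individual memory


# Faults (a) and (b) are the ones the text treats as severe.
SEVERE = {Fault.CLOCK_STOP, Fault.PROGRAM_RUNAWAY}


def control_actions(fault: Fault) -> list[str]:
    """Return the signals asserted for a reported fault (hypothetical names).

    "stop_processor" models the processing unit stop signal line 15 and
    "disconnect_bus" models the processing unit disconnection signal line 16.
    """
    if fault in SEVERE:
        return ["stop_processor", "disconnect_bus"]
    # Assumption: nothing is asserted here for less severe faults.
    return []
```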

Next, the control circuit 21 reads, via the data line 17 in the centralized monitoring control bus 9, the contents of the various registers that indicate the state of the failed processor at the time of the fault. Examples are the program counter (PC) 11, which indicates the execution address of the stored program, the interrupt register (ISF) 12, which indicates whether an external interrupt to the processor is present, and the status register (STR) 13, which indicates the cause of the fault. It accumulates the contents of these registers sequentially in the report queue register 22. The register to be read is selected using the register selection line 18.
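The register readout can be modeled with a short Python sketch. The function `read_fault_state`, the dictionary representation of the processor registers, and the readout order PC, ISF, STR are assumptions made for illustration; the text only states that the contents are accumulated sequentially.

```python
from collections import deque

# Assumed readout order; the text does not fix it.
REGISTERS = ("PC", "ISF", "STR")


def read_fault_state(processor_regs: dict[str, int], report_queue: deque) -> None:
    """Copy the failed processor's registers into the report queue, one by one."""
    for name in REGISTERS:
        # Choosing `name` models the register selection line 18;
        # the value fetched models what travels over the data line 17.
        report_queue.append((name, processor_regs[name]))
```

For example, calling `read_fault_state({"PC": 0x1234, "ISF": 1, "STR": 2}, queue)` on an empty `deque` leaves three entries in the queue, in the order PC, ISF, STR.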

When reading of the above fault information is complete, the control circuit 21 sets the bit with the corresponding number in the failed processing unit display register 23 to indicate that the corresponding processor has failed.

Next, the CSC 8 issues an interrupt request to the normal processors via the common bus 3 to report that a fault has occurred in the i-th processor 1_i. In response, the contents of the failed processing unit display register 23 and the report queue register 22 held in the CSC 8 are sent out from the bus transmitters 24 and 25 to the common bus under the control of the control circuit 21 through the common bus control line 26. Among the normal processors, a processor that can start the abnormality processing analysis program reads this information and executes the abnormality processing analysis.
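The display register and the choice of notification targets can be pictured as a bit mask, as in the Python sketch below. The zero-based numbering with bit i standing for processor i, and the function names `mark_failed` and `normal_processors`, are assumptions; the text says only that the bit with the corresponding number is set.

```python
def mark_failed(display_register: int, i: int) -> int:
    """Return the register value with the bit for processor i set."""
    return display_register | (1 << i)


def normal_processors(display_register: int, n: int) -> list[int]:
    """List the processors whose failure bit is clear, i.e. interrupt candidates."""
    return [i for i in range(n) if not (display_register >> i) & 1]
```

For example, with four processors and processor 2 marked as failed, `normal_processors(mark_failed(0, 2), 4)` yields the remaining indices 0, 1 and 3, any of which could receive the interrupt request if it can start the analysis program.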

The above series of operations is shown in the operation diagram of FIG. 4.

As described above, introducing the centralized monitoring control unit CSC into a multiprocessor eliminates the possibility, inherent in the conventional system, that the failure of a single processor brings down the entire multiprocessor system, and greatly increases the availability of the multiprocessor.

Furthermore, since the centralized monitoring control unit can be realized with far less hardware than a stored-program processor, it is considerably more economical than implementing the same functions with a processor.

[Brief Description of the Drawings]

FIG. 1 is a functional block diagram showing a typical configuration example of a conventional multiprocessor; FIG. 2 is a functional block diagram of a multiprocessor showing one embodiment of the present invention; FIG. 3 is a functional diagram showing a configuration example of the functions of the centralized monitoring control unit, which is the key point of the present invention; and FIG. 4 is an operation diagram showing part of the fault detection and fault handling.

1_1 to 1_n: processors; 2_1 to 2_n: individual memories; 3: common bus; 9: centralized monitoring control bus; 10: fault detection circuit; 11: program counter; 12: interrupt register; 13: status register; 19, 24, 25: bus transmitters; 20: bus receiver; 8: centralized monitoring control unit; 21: control circuit; 22: report queue register; 23: failed processing unit display register; 26: common bus control line.

Claims

[Claims]

1. A failure handling method for a multiprocessor in which a plurality of processors are organically coupled by a common bus, an input/output control unit, a common memory, and the like are connected to the common bus, and processing capability exceeding that of an individual processor is obtained, characterized in that a centralized monitoring control bus is provided in addition to the common bus, and system reliability is increased by providing a centralized monitoring control unit having functions of: detecting a failure of an individual processor at an early stage through the centralized monitoring control bus; in the case of a severe failure, stopping the operation of the failed processor and at the same time electrically disconnecting the failed processor from the common bus; reading the contents of the registers that indicate the state of the failed processor at the time the failure occurred; storing the number of the failed processor; and notifying other normal processors of the stored failure information so that abnormality processing analysis is performed.