JPH02188863A

JPH02188863A - Multiprocessor system

Info

Publication number: JPH02188863A
Application number: JP835389A
Authority: JP
Inventors: Shigeru Adachi; 茂足立; Masanobu Nakajima; 中島　正信
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1989-01-17
Filing date: 1989-01-17
Publication date: 1990-07-24

Abstract

PURPOSE: To recover quickly from a CPU fault by transferring the processing requests addressed to the faulty CPU to a non-defective CPU and executing them there. CONSTITUTION: A mailbox control table 26, a first queue pointer 24a, and a second queue pointer 24b are created in a shared memory 6 by a system controller (SCM) 100. When the SCM 100 detects a fault in a second CPU 102, it checks the contents of the first queue, which belongs to the CPU 102. If a first communication buffer 22a connected to the pointer 24a contains a message, the SCM 100 checks whether another CPU performs the same processing. If such a CPU exists, the processing request of the faulty CPU 102 is connected to the queue of a normal CPU, e.g., a third CPU 103. As a result, the processing request of the faulty CPU is taken over and executed by a normal CPU, so the CPU fault can be recovered from quickly.

Description

[Detailed description of the invention] [Industrial application field]

The present invention relates to a multiprocessor system in which a plurality of processors communicate with each other to execute predetermined processing.

[Conventional technology]

The purpose of a multiprocessor system is twofold: to achieve high-speed processing by dividing the system's functions among dedicated processors, and to achieve high reliability by providing multiple processors with the same function so that one can serve as a backup when another fails.

Communication between processors is essential in a multiprocessor system. A conventional system of this kind is described below.

FIG. 5 is a block diagram showing a conventional multiprocessor system disclosed in, for example, Japanese Patent Publication No. 62-39789, and FIG. 6 is an explanatory diagram outlining its inter-processor communication. In the figures, 1 is a high-speed bus, and 2 to 5 are processors, each with a dedicated function, connected to the high-speed bus 1: 2 is a job processor (hereinafter JOBP), 3 is a file control processor (hereinafter FCP), 4 is an input/output device control processor (hereinafter IOP), and 5 is a communication control processor (hereinafter CCP). 7 is a file connected to and controlled by the FCP 3, 8 is a ring bus connected to the IOP 4, and 9 denotes a plurality of input/output devices (hereinafter I/O) connected to the ring bus 8 and controlled by the IOP 4. 10 is a communication line controlled by the CCP 5, and 11 is another system connected via the communication line 10.

22 is a communication buffer that stores a processing request from a processor, 23 is a device control table that records the usage status and other information of the I/O 9, 24 is a queue pointer for inter-processor communication addressed to the IOP 4, and 25 is a queue pointer for inter-processor communication addressed to the JOBP 2. The communication buffers 22, the device control table 23, and the queue pointers 24 and 25 are all set in the shared memory 6.

Next, the operation is described. As an example, consider the inter-processor communication involved in input/output execution control between the JOBP 2 and the IOP 4, which drives each I/O 9 and handles its interrupts via the ring bus 8. In FIG. 6, solid arrows indicate the flow of data and broken arrows indicate the flow of control signals.

The JOBP 2, which processes user tasks, responds to an input/output request macro instruction issued by a user task by creating, in a communication buffer 22, the control information needed to execute the input/output. It then connects the buffer to the queue pointer 24 for the IOP 4 and raises an inter-processor communication interrupt to the IOP 4. The user task then waits for the input/output to complete, so that program processing and input/output execution stay synchronized, and control passes to another task.

On receiving the inter-processor communication interrupt from the JOBP 2, the IOP 4 takes the requested I/O device number from the communication buffer 22 connected to its own queue pointer 24 and consults the device control table (DVCB) 23 to see whether the I/O 9 is in use. If the I/O 9 is free, the IOP 4 creates the input/output operation command and data for the I/O 9 from the control information in the communication buffer 22 and sends a start command to the I/O 9.

When the input/output operation is complete, the I/O 9 returns an input/output completion report to the IOP 4. On receiving this report, the IOP 4 stores the input/output result in the communication buffer 22, connects the buffer to the queue pointer 25 for the JOBP 2, and raises an inter-processor communication interrupt to the JOBP 2.

On receiving the interrupt from the IOP 4, the JOBP 2 removes the communication buffer 22 connected to its own queue pointer 25 from the queue, returns the input/output result as a return code to the task that issued the input/output request macro instruction, and releases that task from its wait for input/output completion. This completes one sequence of I/O control. All inter-processor communication between the JOBP 2 and the dedicated processors can be carried out by the procedure shown in FIG. 6.

Such a multiprocessor system may be configured with a plurality of JOBPs 2 in order to improve processing capacity and reliability. In that case, to prevent the failure of a single JOBP from making the whole system inoperable, processor configuration control is indispensable: the faulty JOBP is disconnected from the system, the system is reconfigured with the remaining JOBPs, and processing continues. A processor failure is generally detected by timeout processing or the like, that is, by noticing that no completion report for a processing request has arrived after a fixed time has elapsed.

The faulty JOBP is disconnected as follows:

(1) Abort the tasks being executed on the faulty JOBP.
(2) Delete the communication buffers that store processing requests issued by other processors to the faulty JOBP.
(3) Delete the communication buffers that store processing requests issued by the faulty JOBP to other processors.
(4) Stop the faulty JOBP.

Items (2) and (3) can be carried out with the communication buffer 22 deletion method described above. Item (2) is performed by deleting the communication buffers 22 connected to the queue pointer 25 for inter-processor communication addressed to the faulty JOBP. Item (3) is performed by searching the communication buffers 22 set in the shared memory 6 via the queue pointer 25, which is also set in the shared memory 6, checking whether the processor number recorded in each buffer matches the processor number of the faulty JOBP to determine whether the request belongs to the faulty JOBP, and deleting the requests that the faulty JOBP issued.
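Disconnection items (2) and (3) can be pictured with a minimal Python sketch. It assumes each queue is modelled as a `deque` keyed by a processor name, and that each buffer records the processor that issued it; the names `CommBuffer`, `purge_failed_processor`, and the example processors `JOBP-A` and `JOBP-B` are hypothetical and do not appear in the publication.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CommBuffer:
    """Stand-in for a communication buffer 22 holding one processing request."""
    source: str    # processor that issued the request (assumed field)
    payload: str   # control information for the request (assumed field)


def purge_failed_processor(queues, failed):
    """Apply conventional disconnection items (2) and (3) to a failed processor.

    queues maps a processor name to the deque of buffers chained from that
    processor's queue pointer. Returns the number of deleted buffers.
    """
    # Item (2): drop every request that other processors addressed to it.
    deleted = len(queues[failed])
    queues[failed].clear()
    # Item (3): scan the remaining queues for requests it issued itself.
    for name in list(queues):
        if name == failed:
            continue
        kept = deque(buf for buf in queues[name] if buf.source != failed)
        deleted += len(queues[name]) - len(kept)
        queues[name] = kept
    return deleted


queues = {
    "JOBP-A": deque([CommBuffer("IOP", "I/O result for a waiting task")]),
    "JOBP-B": deque(),
    "IOP": deque([CommBuffer("JOBP-A", "read request"),
                  CommBuffer("JOBP-B", "write request")]),
}
deleted = purge_failed_processor(queues, "JOBP-A")
```

In this sketch every request touching the failed processor is discarded, which is the behaviour the invention sets out to avoid.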

[Problem to be solved by the invention]

Because the conventional multiprocessor system is configured as described above, when a processor fails, the communication buffers 22 that store processing requests related to the faulty processor must be deleted, and recovery from the failure takes a long time.

The present invention was made to solve this problem, and its object is to obtain a multiprocessor system that can recover quickly from a processor failure.

The multiprocessor system according to the present invention is provided with a system controller that judges the normality of each processor and, upon detecting a faulty processor, connects the processing requests related to that processor to the processing request queue of another normal processor that performs the same type of processing.
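The publication does not say how the system controller judges normality. One possibility is the timeout approach mentioned for the conventional system, sketched below in Python. The function name `find_faulty`, the timestamp bookkeeping, and the threshold value are illustrative assumptions only.

```python
def find_faulty(outstanding, now, timeout):
    """Return the processors whose oldest request has waited longer than timeout.

    outstanding maps a processor name to the time its oldest unanswered
    request was issued, or None when nothing is pending (assumed bookkeeping).
    """
    return sorted(
        name
        for name, issued in outstanding.items()
        if issued is not None and now - issued > timeout
    )


# CPU 102 has left a request unanswered for 10 time units; the limit is 5.
faulty = find_faulty(
    {"CPU101": 10.0, "CPU102": 2.0, "CPU103": None},
    now=12.0,
    timeout=5.0,
)
```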

[Operation]

In the multiprocessor system according to the present invention, the system controller that has detected a faulty processor connects the processing requests related to that processor to the processing request queue of another normal processor that performs the same type of processing, so that a normal processor takes over execution of the faulty processor's processing requests.
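The takeover can be illustrated with a small linked-queue sketch in Python, assuming each queue pointer holds head and tail links to a chain of buffers. The class and function names (`Buffer`, `QueuePointer`, `take_over`) are hypothetical, and relinking the buffers is only one way to realize "connecting" requests to another queue.

```python
class Buffer:
    """One communication buffer holding a processing request."""

    def __init__(self, message):
        self.message = message
        self.next = None  # link to the next buffer in the same queue


class QueuePointer:
    """Head and tail links of one processor's request queue."""

    def __init__(self):
        self.head = None
        self.tail = None

    def connect(self, buf):
        """Append a buffer to the end of this queue."""
        buf.next = None
        if self.tail is None:
            self.head = buf
        else:
            self.tail.next = buf
        self.tail = buf

    def messages(self):
        """Return the queued messages in order."""
        out = []
        node = self.head
        while node is not None:
            out.append(node.message)
            node = node.next
        return out


def take_over(faulty, normal):
    """Reconnect every buffer of the faulty queue to the normal queue."""
    moved = 0
    node = faulty.head
    while node is not None:
        following = node.next  # remember the link before connect() clears it
        normal.connect(node)
        moved += 1
        node = following
    faulty.head = faulty.tail = None  # the faulty queue is now empty
    return moved


faulty_queue = QueuePointer()
normal_queue = QueuePointer()
normal_queue.connect(Buffer("request-3"))
for text in ("request-1", "request-2"):
    faulty_queue.connect(Buffer(text))
moved = take_over(faulty_queue, normal_queue)
```

Appending the moved requests behind those already queued keeps the normal processor's existing work in order; the publication does not state an ordering policy, so this is a design choice of the sketch.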

[Embodiment]

An embodiment of the present invention is described below with reference to the drawings. In FIG. 1, 1 is a high-speed bus and 6 is a shared memory; they are identical or equivalent to the conventional parts bearing the same reference numerals in FIG. 5, so a detailed description is omitted. 101 to 104 are a plurality of processors (hereinafter CPUs) connected to the high-speed bus 1; to simplify the explanation, this embodiment uses four, a first CPU 101 through a fourth CPU 104. 100 is a system controller (hereinafter SCM) connected to the high-speed bus 1; it judges the normality of each of the CPUs 101 to 104 and, upon detecting a faulty CPU, connects the processing requests related to that CPU to the processing request queue of another normal CPU. 110 is an I/O controller connected to the high-speed bus 1, and 111 is an I/O controlled by the I/O controller 110.

FIG. 2 is an explanatory diagram showing how communication buffers are connected to queues in the shared memory 6. In the figure, 22a is a first communication buffer, 22b is a second communication buffer, 24a is a first queue pointer to which the first communication buffer 22a is connected, 24b is a second queue pointer to which the second communication buffer 22b is connected, and 26 is a mailbox management table for managing mailboxes.

Next, the operation is described, beginning with the functions of the SCM 100.

The SCM 100 assigns work to the CPUs 101 to 104. It does so by reading programs from the I/O 111 and loading them into the CPUs 101 to 104. Assume here that job A is given to the first CPU 101, job B is given to the second CPU 102 and the third CPU 103, and no job is given to the fourth CPU 104.

The SCM 100 creates processing request queues in the shared memory 6 in response to requests from the CPUs 101 to 104. The CPU that issues processing requests asks the SCM 100 to create a mailbox, and each CPU that receives them asks the SCM 100 to create a queue for that mailbox. If the first CPU 101 requests creation of a mailbox, and the second CPU 102 and the third CPU 103 each request creation of a queue for that mailbox, the SCM 100 creates the mailbox management table 26, the first queue pointer 24a, and the second queue pointer 24b in the shared memory 6.

The SCM 100 also handles the sending and receiving of processing requests (messages). When it receives a message send request from one of the CPUs 101 to 104, it stores the message in the communication buffer 22a or 22b in the shared memory 6. When it receives a receive request from one of the CPUs 101 to 104, it takes the contents out of the communication buffer 22a or 22b and passes them to the requesting CPU.

Next, actual inter-processor communication in this multiprocessor system is described.

When the first CPU 101 asks the SCM 100 to send a message, the SCM 100 stores the message either in the first communication buffer 22a connected to the first queue pointer 24a or in the second communication buffer 22b connected to the second queue pointer 24b. When the second CPU 102 then issues a message receive request to the SCM 100, the SCM 100 takes the message out of the first communication buffer 22a connected to the first queue pointer 24a and passes it to the second CPU 102. Likewise, when the third CPU 103 issues a message receive request to the SCM 100, the SCM 100 takes the message out of the second communication buffer 22b connected to the second queue pointer 24b and passes it to the third CPU 103. In this way, communication takes place from the first CPU 101 to the second CPU 102 or the third CPU 103.

Next, processing at the time of a failure is described. FIG. 3 is a flowchart showing the procedure.

Suppose, for example, that the second CPU 102 fails. The SCM 100 checks the normality of each of the CPUs 101 to 104 in turn (step ST1). On detecting the failure of the second CPU 102, it checks the contents of the first queue, which belongs to the second CPU 102. If the first communication buffer 22a connected to the first queue pointer 24a holds a message, the SCM 100 checks whether another CPU performs the same processing and, if so, connects the processing request (message) of the faulty second CPU 102 to the queue of a normal CPU, for example the third CPU 103 (step ST3). That is, it copies the message in the first communication buffer 22a to the second communication buffer 22b connected to the second queue pointer 24b, which belongs to the normal third CPU 103. It then deletes the queue of the faulty second CPU 102 (step ST4) and ends processing when it detects that all of the CPUs 101 to 104 have been checked (step ST5).

As a result, the processing requests of the faulty second CPU 102 are taken over and executed by the normal third CPU 103.

In the embodiment above, the processing requests of the faulty CPU are handed over to another CPU that is operating normally, but they may instead be handed over to a standby CPU. FIG. 4 is a flowchart showing that procedure.

As stated earlier, the fourth CPU 104 is a standby CPU and normally performs no processing. When the SCM 100 detects in step ST1 that the second CPU 102 has failed, it checks whether a standby CPU exists (step ST6). Because the fourth CPU 104 is on standby, the SCM 100 gives it the same job as the faulty second CPU 102 (step ST7). After the fourth CPU 104 has issued a queue creation request (step ST8), the SCM 100 connects the processing requests of the faulty second CPU 102 to the queue of the fourth CPU 104 that replaces it (step ST9). It then deletes the queue of the faulty second CPU 102 (step ST4), confirms that all CPUs have been checked, and ends processing (step ST5).

As a result, the processing requests of the faulty second CPU 102 are taken over and executed by the normal fourth CPU 104, which had been on standby.
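The two recovery flows of FIG. 3 and FIG. 4 can be combined in one toy model of the SCM, sketched in Python below. All class and method names are hypothetical. The rule for choosing between queues on send (shortest queue first) and the behaviour when no takeover target exists (leave the queue untouched) are assumptions, since the text does not specify them.

```python
from collections import deque


class SystemController:
    """Toy model of the SCM 100; every method name here is hypothetical."""

    def __init__(self, jobs):
        self.jobs = dict(jobs)    # CPU number -> assigned job (None = standby)
        self.queues = {}          # CPU number -> deque of pending messages
        self.healthy = {cpu: True for cpu in jobs}

    def create_queue(self, cpu):
        """Create a request queue for a receiving CPU."""
        self.queues.setdefault(cpu, deque())

    def send(self, job, message):
        """Store a message for a CPU doing `job` (assumed: shortest queue)."""
        targets = [c for c in self.queues
                   if self.jobs[c] == job and self.healthy[c]]
        cpu = min(targets, key=lambda c: len(self.queues[c]))
        self.queues[cpu].append(message)
        return cpu

    def receive(self, cpu):
        """Hand the oldest pending message to the requesting CPU."""
        queue = self.queues[cpu]
        return queue.popleft() if queue else None

    def recover(self, failed):
        """Move the failed CPU's requests to a peer (ST3) or a standby (ST6-ST9)."""
        self.healthy[failed] = False
        job = self.jobs[failed]
        # ST3: look for a normal CPU that performs the same job.
        heir = next((c for c in self.queues
                     if c != failed and self.jobs[c] == job and self.healthy[c]),
                    None)
        if heir is None:
            # ST6: otherwise look for a standby CPU with no job assigned.
            heir = next((c for c, j in self.jobs.items()
                         if j is None and self.healthy[c]), None)
            if heir is None:
                return None            # assumed: nothing to do without a target
            self.jobs[heir] = job      # ST7: give it the failed CPU's job
            self.create_queue(heir)    # ST8: queue creation request
        # ST3 / ST9: connect the pending requests to the heir's queue.
        self.queues[heir].extend(self.queues.get(failed, ()))
        self.queues.pop(failed, None)  # ST4: delete the failed CPU's queue
        return heir


scm = SystemController({101: "A", 102: "B", 103: "B", 104: None})
scm.create_queue(102)
scm.create_queue(103)
scm.send("B", "request-1")
scm.send("B", "request-2")
heir_after_102 = scm.recover(102)   # a peer with job B takes over
heir_after_103 = scm.recover(103)   # no peer is left, so the standby is used
next_message = scm.receive(heir_after_103)
```

The usage lines first fail CPU 102, whose request moves to CPU 103, and then fail CPU 103 as well, which forces the standby path of FIG. 4.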

[Effect of the invention]

As described above, according to the present invention, when an abnormality occurs in a CPU, the processing requests of the abnormal CPU are moved to another normal CPU and execution continues, which yields a multiprocessor system that can recover quickly from a failure.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing a multiprocessor system according to an embodiment of the present invention, FIG. 2 is an explanatory diagram showing the connection of its communication buffers to queues, FIG. 3 is a flowchart showing its processing procedure at the time of a failure, FIG. 4 is a flowchart showing the processing procedure at the time of a failure according to another embodiment of the present invention, FIG. 5 is a block diagram showing a conventional multiprocessor system, and FIG. 6 is an explanatory diagram outlining its inter-processor communication.

1 is a high-speed bus, 6 is a shared memory, 100 is an SCM, 101 to 104 are CPUs, 22a and 22b are communication buffers, and 24a and 24b are queue pointers. In the figures, the same reference numerals denote the same or equivalent parts.

Claims

[Claims]

A multiprocessor system comprising a plurality of processors and a shared memory whose contents can be commonly referenced and updated by each of the processors, in which a plurality of communication buffers that store the contents of processing requests transmitted from a request-source processor to a request-destination processor, and queue pointers that sequentially connect the plurality of communication buffers to form a queue structure, are set on the shared memory, and the processing requests are transmitted from the request-source processor to the request-destination processor via the communication buffers, the multiprocessor system being characterized by comprising a system controller that sequentially judges the normality of each of the processors and, upon detecting a faulty processor, transfers the processing requests stored in the communication buffer connected to the queue pointer for the faulty processor to the communication buffer connected to the queue pointer for another normal processor that performs the same type of processing.