JPH1185713A

JPH1185713A - Multi-computer system

Info

Publication number: JPH1185713A
Application number: JP9244143A
Authority: JP
Inventors: Yasuhiro Shimomura; 泰宏下村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-09-09
Filing date: 1997-09-09
Publication date: 1999-03-30

Abstract

PROBLEM TO BE SOLVED: To use resources such as processor elements effectively, to realize degraded operation, and to improve redundancy without increasing the number of processor elements. SOLUTION: Application programs divided into a plurality of tasks are stored in a storage medium 1, and the tasks are executed redundantly on a plurality of processor elements in a CPU 2. The processing results of each task are exchanged among the processor elements through an inter-processor-element interface 3 and decided by majority vote. A task whose processing result differs from the majority-vote result is stopped, and an identical task is executed on another processor element as an alternate task. In this way, the task becomes the unit of redundancy management.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a multi-computer system, and more particularly to a fault-tolerant multi-computer system.

[0002]

[Description of the Related Art] Conventionally, multi-computer systems of this type have been used in systems where a stop or an erroneous processing result is expected to cause serious damage to the system's users. Redundant management is applied so that such a system neither stops nor outputs an erroneous processing result.

[0003] In general, realizing fault tolerance in a multi-computer system having a plurality of processor elements requires fault detection, fault isolation, and system reconfiguration. Fault detection methods include deciding the result by majority vote over triplicated hardware, and detecting a timeout indicating that a job has not completed within a predetermined time, as disclosed in Japanese Patent Application Laid-Open No. 05-094428. Fault isolation methods include replacing a failed processor element with a normal spare processor element by means of a changeover switch, and logically disconnecting the failed processor element from the processor element interconnection network, as disclosed in Japanese Patent Application Laid-Open No. 06-250992. System reconfiguration may be performed without regard to the load on each processor element, or so as to make the loads of the processor elements as equal as possible, as disclosed in Japanese Patent Application Laid-Open No. 03-219360. Conventional multi-computer systems combine these methods to achieve fault tolerance with the processor element as the unit of redundancy management. Consequently, even when only a transient (temporary) error occurs in a processing result, or when an application program is divided into a plurality of application tasks and an abnormality occurs in only some of those tasks, the processor element on which the error or abnormality occurred was treated as a failed processor element and taken out of use.

[0004]

[Problem to Be Solved by the Invention] The conventional multi-computer system described above uses the processor element as the unit of redundancy management. Even when the system's processing result suffers only a transient (temporary) error, or when an abnormality occurs in only some application tasks, the processor element on which the error or abnormality occurred is taken out of use as a failed processor element. This makes it difficult to realize so-called degraded operation, in which processing essential to system operation continues at reduced functionality, and resources such as processor elements are not used effectively. Moreover, because the unit of redundancy management is the processor element, raising redundancy to improve system reliability requires adding processor elements, which increases cost and enlarges the system.

[0005] An object of the present invention is to eliminate these conventional drawbacks by providing a multi-computer system in which degraded operation is readily realized, resources such as processor elements are used effectively, and redundancy can be raised to improve system reliability without increasing the number of processor elements.

[0006]

[Means for Solving the Problem] In the multi-computer system of the present invention, which has a plurality of processor elements, an application program is divided into a plurality of tasks that are processed by the plurality of processor elements, and fault tolerance is achieved by making the task the unit of redundancy management.

[0007] In the multi-computer system of the present invention, one of the plurality of tasks is also processed redundantly, by each of a plurality of processor elements.

[0008] In the multi-computer system of the present invention, a single processor element also processes a plurality of the tasks.

[0009] Further, in the multi-computer system of the present invention, the plurality of processor elements are interconnected by an inter-processor-element interface, through which information is transmitted and received among the plurality of processor elements.

[0010] In the multi-computer system of the present invention, one of the plurality of tasks is processed redundantly by a plurality of processor elements, each processor element receives the resulting plurality of processing results, and when those results do not all match, the processing result of that task is decided by majority vote.

[0011] Further, in the multi-computer system of the present invention, the processor element that processed the task and produced a processing result disagreeing with the result decided by majority vote is made to stop that task, and a processor element that has not been processing the task is made to process it.

[0012] In the multi-computer system of the present invention, the processor element that produced a processing result disagreeing with the result decided by majority vote is also made to process the next cycle of that task. Its result is compared with the results of the plurality of processor elements that executed the same next cycle, and if the results differ, that processor element is made to stop the task.

[0013] Further, the multi-computer system of the present invention comprises a storage medium that stores the application program, the plurality of processor elements that process the plurality of tasks constituting the application program, and the inter-processor-element interface that interconnects the plurality of processor elements.

[0014]

[Embodiments of the Invention] Next, embodiments of the present invention will be described with reference to the drawings.

[0015] FIG. 1 is a block diagram showing one embodiment of the multi-computer system according to the present invention.

[0016] The embodiment shown in FIG. 1 comprises a storage medium 1, such as a magnetic disk or semiconductor memory, that stores an application program; a CPU 2 having a plurality of processor elements that process the plurality of tasks constituting the application program; and an inter-processor-element interface 3 that interconnects the plurality of (for example, six) processor elements.

[0017] Next, the operation of the multi-computer system of this embodiment will be described in detail with reference to FIGS. 2 to 5.

[0018] FIG. 2 shows an example of how tasks are assigned to processor elements. It shows the tasks obtained by dividing the application program into a plurality of tasks (for example, task #1 to task #5), together with a control task placed on each of the plurality of processor elements (for example, processor element #1 to processor element #6) to control that processor element. Each processor element hosts a plurality of tasks: tasks #1 to #3 are placed redundantly on each of processor elements #1 to #3, and tasks #4 and #5 are placed redundantly on each of processor elements #4 to #6. Each task is assumed here to be, for example, a periodic task. In every cycle it notifies the control task of the processor element on which it is placed of its processing result, and that control task periodically forwards the result to the control tasks on the other processor elements via the inter-processor-element interface 3. In other words, the control tasks exchange these processing results with one another.
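The placement in FIG. 2 can be pictured as a mapping from each task to the set of processor elements that run a copy of it. Below is a minimal Python sketch under that assumption; `FIG2_PLACEMENT`, `tasks_by_pe` and `pe_loads` are hypothetical names, not part of the patent, and control tasks are left out of the counts.

```python
from collections import defaultdict

# Hypothetical model of FIG. 2: task number -> numbers of the processor
# elements (PEs) that each run an identical copy of that task.
FIG2_PLACEMENT: dict[int, set[int]] = {
    1: {1, 2, 3},
    2: {1, 2, 3},
    3: {1, 2, 3},
    4: {4, 5, 6},
    5: {4, 5, 6},
}


def tasks_by_pe(placement: dict[int, set[int]]) -> dict[int, set[int]]:
    """Invert the placement: PE number -> tasks hosted on that PE."""
    hosted = defaultdict(set)
    for task, pes in placement.items():
        for pe in pes:
            hosted[pe].add(task)
    return dict(hosted)


def pe_loads(placement: dict[int, set[int]]) -> dict[int, int]:
    """Number of application-task copies on each PE, ordered by PE number."""
    return {pe: len(ts) for pe, ts in sorted(tasks_by_pe(placement).items())}


print(pe_loads(FIG2_PLACEMENT))  # {1: 3, 2: 3, 3: 3, 4: 2, 5: 2, 6: 2}
```

Keying the model by task rather than by processor element is what allows a single faulty copy to be isolated later without retiring the whole processor element.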

[0019] FIG. 3 is a flowchart outlining the operation of a task. Assuming, for example, that the task is a periodic task, it performs predetermined processing (S31), notifies the control task of the processor element on which it is placed of the processing result (S32), and repeats this processing as instructed by the control task.
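A minimal Python sketch of the FIG. 3 loop follows. It assumes that a `queue.Queue` stands in for the notification path to the local control task and that a fixed `cycles` count stands in for the control task's instruction to repeat; `run_periodic_task` and the `(task_id, cycle, result)` message shape are hypothetical.

```python
import queue
from typing import Callable


def run_periodic_task(
    task_id: int,
    process: Callable[[int], int],
    to_control_task: queue.Queue,
    cycles: int,
) -> None:
    """Repeat S31 and S32 once per cycle for a periodic task."""
    for cycle in range(cycles):
        result = process(cycle)  # S31: predetermined processing for this cycle
        # S32: report the result to the control task on the same PE
        to_control_task.put((task_id, cycle, result))


inbox: queue.Queue = queue.Queue()
run_periodic_task(1, lambda c: c * c, inbox, cycles=3)
print([inbox.get() for _ in range(3)])  # [(1, 0, 0), (1, 1, 1), (1, 2, 4)]
```

The task itself never votes or decides anything; it only reports, which keeps every fault-handling decision inside the control tasks.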

[0020] FIG. 4 is a flowchart outlining the operation of the control task.

[0021] FIG. 5 is a flowchart showing fault isolation and system reconfiguration performed by the control task.

[0022] Referring to FIG. 1, the application program to be executed is first read from the storage medium 1, divided into five predetermined tasks, and placed on processor elements #1 to #6 as shown, for example, in FIG. 2; the tasks and the control tasks then begin operating. Tasks #1 to #5 on each processor element perform the operation shown in FIG. 3 continuously while the system is running. As shown in FIG. 4, when the control task is notified of processing results by the tasks, it takes a majority vote over the processing results of each identical task (S41). It checks whether all processing results of each task match (S42); if they do, it judges that no abnormality has occurred in any task and proceeds to step S44. If the check in step S42 finds a task whose results do not match, the control task judges that an abnormality has occurred in the task that produced a result different from the majority-vote result of step S41, and performs fault isolation and system reconfiguration as shown in FIG. 5 (S43).
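Steps S41 and S42 amount to a vote over the results that the copies of one task report for the same cycle. The sketch below is an illustration under stated assumptions, not the patent's implementation: it requires a strict majority and, because the patent does not say what happens when no value reaches one, simply returns `None` in that case. `majority_vote` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Optional


def majority_vote(results: dict[int, Hashable]) -> tuple[Optional[Hashable], set[int]]:
    """Vote over one task's results for one cycle.

    `results` maps PE number -> result reported by that PE's copy (non-empty).
    Returns (majority result, PEs whose copy disagreed with it).
    """
    value, votes = Counter(results.values()).most_common(1)[0]  # S41
    if votes * 2 <= len(results):
        return None, set()  # no strict majority (case not covered by the patent)
    dissenters = {pe for pe, r in results.items() if r != value}
    return value, dissenters  # S42: an empty set means all copies agree


print(majority_vote({1: 42, 2: 42, 3: 41}))  # (42, {3})
```

An empty dissenter set corresponds to proceeding to S44; a non-empty one triggers the FIG. 5 processing (S43) for the listed processor elements.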

[0023] Specifically, each control task first checks whether the task judged abnormal was running on its own processor element (S51). If it was, the control task determines whether the abnormality is transient by having the task judged abnormal perform the next cycle of processing, as shown in FIG. 3, and monitoring the result. For example, if an abnormality occurs in task #1 on processor element #1, the control task of processor element #1 has the faulty task #1 perform the next cycle, and compares its result with the majority-vote result of the next-cycle results of the normal copies of task #1 on the other processor elements (S52). It then checks whether the monitored result is abnormal, that is, different from the majority-vote result (S53). If it is, the control task stops the task judged abnormal, sends a task-stop-complete notification, indicating that the task has been stopped, to the other control tasks via the inter-processor-element interface 3 (S54), and proceeds to step S59. If the monitored result is not abnormal, the control task sends an alternate-task stop request to the other control tasks via the inter-processor-element interface 3 (S55), asking them to stop the alternate task (an identical copy of the task, executed in place of the task judged abnormal), and proceeds to step S59.
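The transient-error check of S52 to S55, as run by the control task on the processor element hosting the suspect copy, might look like the following sketch. It assumes the other copies' next-cycle results are already available as a list and that those copies agree with each other (with two disagreeing copies there is no true majority, a case the sketch does not handle). `recheck_suspect` and the `Notice` enum are hypothetical names for the two messages the text describes.

```python
from collections import Counter
from enum import Enum


class Notice(Enum):
    TASK_STOP_COMPLETE = "task stop complete"  # S54: fault persisted, copy stopped
    ALTERNATE_TASK_STOP_REQUEST = "alternate task stop request"  # S55: fault was transient


def recheck_suspect(suspect_next: int, others_next: list[int]) -> Notice:
    """Compare the suspect copy's next-cycle result with the majority of the
    next-cycle results of the other copies of the same task (S52, S53)."""
    majority, _ = Counter(others_next).most_common(1)[0]
    if suspect_next != majority:
        # S54: the real system would stop the suspect copy here, then broadcast.
        return Notice.TASK_STOP_COMPLETE
    # S55: the suspect copy keeps running; the alternate started elsewhere is not needed.
    return Notice.ALTERNATE_TASK_STOP_REQUEST


print(recheck_suspect(41, [42, 42]).name)  # TASK_STOP_COMPLETE
print(recheck_suspect(42, [42, 42]).name)  # ALTERNATE_TASK_STOP_REQUEST
```

Giving the suspect copy one more cycle is what distinguishes a transient error from a persistent one, so a single upset does not permanently cost a task copy.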

[0024] On the other hand, if step S51 finds that the task judged abnormal was running on a processor element other than its own, the control task checks whether a task identical to the abnormal task is running on its own processor element (S56). If so, it proceeds to step S59. For example, if an abnormality occurs in task #1 on processor element #1, task #1 also runs on processor elements #2 and #3, so the control tasks of processor elements #2 and #3 proceed to step S59 and these processor elements are excluded from the candidates for starting the alternate task.
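Each control task's branching at S51 and S56 depends only on its own processor element number and the shared placement information. The minimal sketch below assumes the FIG. 2 placement modelled as a task-to-processor-element mapping; `route_fault` and its string labels are hypothetical.

```python
# Hypothetical model of FIG. 2: task number -> PEs running a copy of it.
FIG2_PLACEMENT: dict[int, set[int]] = {
    1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 3}, 4: {4, 5, 6}, 5: {4, 5, 6},
}


def route_fault(own_pe: int, faulty_pe: int, faulty_task: int,
                placement: dict[int, set[int]]) -> str:
    """Branch of FIG. 5 taken by the control task on `own_pe`."""
    if own_pe == faulty_pe:
        return "recheck"  # S51 yes: rerun the suspect copy next cycle (S52)
    if own_pe in placement[faulty_task]:
        return "wait"     # S56 yes: already runs a copy, go straight to S59
    return "compete"      # S56 no: compare load and PE number (S57)


print({pe: route_fault(pe, 1, 1, FIG2_PLACEMENT) for pe in range(1, 7)})
# {1: 'recheck', 2: 'wait', 3: 'wait', 4: 'compete', 5: 'compete', 6: 'compete'}
```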

[0025] If the check in step S56 finds no task identical to the abnormal task on its own processor element, the control task compares its own processor element with the other processor elements that do not host a copy of the abnormal task, and determines whether its own processor element has the smallest load (for example, the number of tasks placed on it) and the smallest processor element number (S57). The comparison uses information obtained in advance when the tasks were placed, for example the number of each processor element and the number of tasks placed on the processor element with that number. If its own processor element has the smallest load and the smallest processor element number, the control task starts the alternate task on its own processor element (S58) and proceeds to step S59. If the check in step S57 finds that its load is not the smallest, or that its load is the smallest but its processor element number is not the smallest among the processor elements with the smallest load, its processor element is excluded from the alternate-task candidates and the control task proceeds to step S59. For example, if an abnormality occurs in task #1 on processor element #1, the alternate task is started on processor element #4, out of processor elements #4 to #6.
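The S57 comparison can be sketched as a deterministic test that every candidate control task evaluates on the same shared information, so that exactly one processor element starts the alternate task. The sketch assumes that load is the number of task copies on a processor element (the example the text gives) and uses the FIG. 2 placement; `should_start_alternate` is a hypothetical name.

```python
# Hypothetical model of FIG. 2: task number -> PEs running a copy of it.
FIG2_PLACEMENT: dict[int, set[int]] = {
    1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 3}, 4: {4, 5, 6}, 5: {4, 5, 6},
}
ALL_PES = {1, 2, 3, 4, 5, 6}


def should_start_alternate(own_pe: int, faulty_task: int,
                           placement: dict[int, set[int]], all_pes: set[int]) -> bool:
    """True if `own_pe` should start the alternate task (S57 -> S58)."""
    load = {pe: 0 for pe in all_pes}
    for pes in placement.values():
        for pe in pes:
            load[pe] += 1
    # Only PEs without a copy of the faulty task take part in the comparison.
    candidates = all_pes - placement[faulty_task]
    if own_pe not in candidates:
        return False
    # Smallest load wins; among equal loads, the smallest PE number wins.
    return min(candidates, key=lambda pe: (load[pe], pe)) == own_pe


print([pe for pe in sorted(ALL_PES) if should_start_alternate(pe, 1, FIG2_PLACEMENT, ALL_PES)])
# [4]
```

Because the PE-number tie-break is total, exactly one candidate returns `True` whenever the candidate set is non-empty, which matches the text's example of processor element #4 taking over task #1 without any extra negotiation between control tasks.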

[0026] Next, in step S59, again taking the case where an abnormality occurs in task #1 on processor element #1, the control tasks of processor elements #2 to #6 wait for a task-stop-complete notification or an alternate-task stop request from processor element #1 and check which of the two has arrived (S59). If an alternate-task stop request has arrived, the alternate task is stopped and deleted (S60), and the fault isolation and system reconfiguration processing ends. If the check in step S59 shows that a task-stop-complete notification has arrived, each control task checks the load on its own processor element, including the alternate task, and determines whether there is a significant imbalance, for example a difference of three or more in the number of tasks (S61). If there is, tasks are reallocated so that the difference in the number of tasks is less than three (S62). If there is no significant imbalance, the processing ends as is.
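The patent leaves the reallocation method of S62 open. One simple possibility, offered only as an assumed sketch, is to move one task copy at a time from the most loaded to the least loaded processor element until the spread drops below the threshold of three, never placing two copies of the same task on one processor element. `pe_loads` and `rebalance` are hypothetical names, and the skewed placement in the usage lines is invented for illustration.

```python
def pe_loads(placement: dict[int, set[int]], all_pes: set[int]) -> dict[int, int]:
    """Number of task copies on each PE (PEs with no tasks count as 0)."""
    load = {pe: 0 for pe in sorted(all_pes)}
    for pes in placement.values():
        for pe in pes:
            load[pe] += 1
    return load


def rebalance(placement: dict[int, set[int]], all_pes: set[int],
              threshold: int = 3) -> dict[int, set[int]]:
    """S61-S62: return a new placement whose max-min load difference < threshold."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2 for the loop to terminate")
    placement = {task: set(pes) for task, pes in placement.items()}  # work on a copy
    while True:
        load = pe_loads(placement, all_pes)
        busiest = max(sorted(all_pes), key=lambda pe: load[pe])
        idlest = min(sorted(all_pes), key=lambda pe: load[pe])
        if load[busiest] - load[idlest] < threshold:  # S61: no significant imbalance
            return placement
        # The busiest PE hosts more tasks than the idlest, so at least one of
        # its tasks has no copy on the idlest PE yet.
        task = min(t for t, pes in placement.items() if busiest in pes and idlest not in pes)
        placement[task].remove(busiest)  # S62: move one copy of that task
        placement[task].add(idlest)


skewed = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 3}, 4: {1, 2, 3}, 5: {4, 5, 6}}
balanced = rebalance(skewed, {1, 2, 3, 4, 5, 6})
print(pe_loads(balanced, {1, 2, 3, 4, 5, 6}))  # {1: 3, 2: 3, 3: 3, 4: 2, 5: 2, 6: 2}
```

Each move narrows the gap between the two extreme processor elements by two, so the loop always terminates, and the number of copies of every task is preserved.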

[0027] In this way, the control tasks autonomously control the starting, stopping, and deletion of alternate tasks and the reconfiguration of the system.

[0028] After fault isolation and system reconfiguration have been performed as described above and shown in FIG. 5, in step S44 of FIG. 4 the control task notifies the control tasks on the other processor elements, via the inter-processor-element interface 3, either that the processing results of the tasks matched, or of the majority-vote result and the result of fault isolation and system reconfiguration (S44). This operation continues while the system is running.

[0029]

[Effects of the Invention] As described above, the multi-computer system of the present invention divides an application program into a plurality of tasks, executes these tasks redundantly on a plurality of processor elements, decides each task's processing result by majority vote, stops any task whose processing result differs from the majority-vote result, and executes an identical copy of that task on another processor element as an alternate task. Making the task the unit of redundancy management shrinks the unit of fault detection, isolation, and reconfiguration, so an abnormality in a task does not require disconnecting the processor element that executed it. Resources such as processor elements can therefore be used effectively, and degraded operation can be realized. In addition, because redundancy can be raised by increasing the degree to which each task is duplicated across processor elements, raising redundancy does not require more processor elements, which reduces cost and system size. Furthermore, because abnormalities are detected by software majority voting on a plurality of processor elements, the risk of a system failure caused by a single point of failure is smaller than with conventional hardware majority voting.
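The statement that redundancy can be raised without adding processor elements can be made concrete with two standard facts about majority voting: with n copies of a task, a strict majority outvotes up to (n - 1) // 2 wrong results, and spreading those copies over a fixed set of processor elements only raises the number of task copies each one carries. The figures below assume the five tasks and six processor elements of FIG. 2 and an even spread; `masked_faults` and `max_copies_per_pe` are hypothetical helpers.

```python
import math


def masked_faults(copies: int) -> int:
    """Wrong results that a strict majority vote over `copies` copies can outvote."""
    return (copies - 1) // 2


def max_copies_per_pe(num_tasks: int, copies: int, num_pes: int) -> int:
    """Smallest achievable maximum number of task copies on any one PE,
    with every copy of a task on a different PE."""
    if copies > num_pes:
        raise ValueError("each copy of a task needs its own processor element")
    return math.ceil(num_tasks * copies / num_pes)


for copies in (3, 5):
    print(copies, masked_faults(copies), max_copies_per_pe(5, copies, 6))
# 3 1 3
# 5 2 5
```

Raising duplication from three to five copies lets each vote absorb two bad results instead of one while the processor element count stays at six; the cost appears as per-processor-element load instead.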

[Brief description of the drawings]

FIG. 1 is a block diagram showing one embodiment of the multi-computer system according to the present invention.

FIG. 2 is a diagram showing an example of the assignment of tasks to processor elements.

FIG. 3 is a flowchart outlining the operation of a task.

FIG. 4 is a flowchart outlining the operation of a control task.

FIG. 5 is a flowchart showing fault isolation and system reconfiguration performed by a control task.

[Explanation of symbols]

1 Storage medium
2 CPU
3 Inter-processor-element interface

Claims


1. A multi-computer system having a plurality of processor elements, wherein, when an application program is divided into a plurality of tasks and the plurality of tasks are processed by the plurality of processor elements, fault tolerance is achieved by making the task the unit of redundancy management.

2. The multi-computer system according to claim 1, wherein one of the plurality of tasks is processed redundantly by a plurality of processor elements.

3. The multi-computer system according to claim 1, wherein a plurality of said tasks are processed by one processor element.

4. The multi-computer system according to claim 1, 2 or 3, wherein the plurality of processor elements are interconnected by an inter-processor-element interface, and information is transmitted and received among the plurality of processor elements via the inter-processor-element interface.

5. The multi-computer system according to claim 1, wherein one of the plurality of tasks is processed redundantly by a plurality of processor elements, each processor element receives the resulting plurality of processing results, and, when the plurality of processing results do not match, the processing result of the one task is decided by majority vote.

6. The multi-computer system according to claim 5, wherein the processor element that processed the one task and produced a processing result that does not match the processing result decided by the majority vote is made to stop that task, and a processor element that has not been processing the task is made to process it.

7. The multi-computer system according to claim 5, wherein the processor element that processed the one task and produced a processing result that does not match the processing result decided by the majority vote is made to process the next cycle of that task, its result is compared with the processing results of the plurality of processor elements that executed the same next cycle, and, when the results differ, that processor element is made to stop the one task.

8. The multi-computer system according to claim 1, 2, 3, 4, 5, 6 or 7, comprising: a storage medium that stores the application program; the plurality of processor elements that process the plurality of tasks constituting the application program; and the inter-processor-element interface that interconnects the plurality of processor elements.