JP3879436B2 - Distributed processing system, distributed processing method, and distributed processing control program

Info

Publication number: JP3879436B2
Application number: JP2001145733A
Authority: JP
Inventors: 明浜谷
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-05-16
Filing date: 2001-05-16
Publication date: 2007-02-14
Anticipated expiration: 2021-05-16
Also published as: JP2002342300A

Description

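[0001]
[Technical field of the invention]
The present invention relates to a distributed processing system, a distributed processing method, and a distributed processing control program, and in particular to a distributed processing system, distributed processing method, and distributed processing control program having fault-tolerant capability.
[0002]
[Prior art]
FIG. 14 is a configuration diagram of an example of a conventional distributed processing system. In FIG. 14, processor element A; 401a is connected to device A; 403a through input/output unit A; 402a. From input data A; 405a supplied by device A; 403a, task A; 408a in processor element A; 401a calculates output data A; 404a, which is output to device A; 403a.
[0003]
Processing in processor element B; 401b and processor element C; 401c is likewise executed by task B; 408b and task C; 408c, and the results are output to device B; 403b and device C; 403c. In addition, the data exchange needed to execute task A; 408a, task B; 408b, and task C; 408c is carried out via network 406.
[0004]
Such a system inherently has no fault-tolerant capability at all, but by providing each processor element with a self-fault-diagnosis unit A; 409a to C; 409c, a failed processor element can be disconnected from the system. Self-diagnosis, however, cannot detect and isolate 100% of failures. Moreover, when a failed processor element is isolated, the device connected to it is isolated as well, so functions that were available before the failure are lost. Preventing this requires installing spare processor elements for standby redundancy, which increases the hardware accordingly.
[0005]
As described above, the conventional distributed processing system has the drawback that its fault-tolerant capability against failures is insufficient.
[0006]
To overcome this drawback, it is easy to conceive, as an extension of conventional technology, of making each of processor element A; 401a, processor element B; 401b, and processor element C; 401c in FIG. 14 triple redundant and obtaining reliable fault-tolerant capability by majority voting. In that case, however, triple redundancy increases the amount of hardware at least threefold, so the hardware becomes far larger.
[0007]
In short, conventional technology basically treated distributed processing and fault tolerance as independent technologies, so any attempt to cover the weakness of one with the other was inefficient.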
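[0008]
Techniques that efficiently combine distributed processing with fault-tolerant (redundant processing) technology, so that the system as a whole achieves a good balance between improved processing performance and improved reliability, are extremely rare. An existing invention of this kind does exist, however, and an example of its embodiment is described below.
[0009]
As an invention that makes effective use of redundant resources to improve both processing capability and reliability, Japanese Patent Application Laid-Open No. 7-114520 (title: Redundant resource management method and distributed fault-tolerant computer system using the same) (hereinafter referred to as Prior Art Document 1) has been disclosed. In the technology of Prior Art Document 1, each computer module constituting the redundant system collects fault occurrence information for each task, sets an evaluation function based on that information to estimate the reliability of each task, decides which task's redundant configuration the module itself should join, and executes the task it should join.
[0010]
To make the difference from the present invention clear, the following describes how the system of Prior Art Document 1 would operate when applied to the same functions as the configuration of FIG. 1, which is an embodiment of the present invention. That is, the example assumes three devices with which the computer modules exchange data as control targets, a separate control task for each device, and triple redundancy for each task with fault tolerance by triple majority voting.
[0011]
FIG. 15 is a configuration diagram of the system described in Prior Art Document 1. It shows an operation example when a failure occurs in one of its constituent computer modules 1; 501 to 9; 509. The figure is simplified so that the differences stand out in a comparison with FIG. 1, which is an embodiment of the present invention.
[0012]
In FIG. 15(A), computer modules 1; 501 to 9; 509 each correspond to a processor element of the present invention and perform data input/output in order to control devices A; 510 to C; 512. Computer modules 1 to 9 can transfer data to one another over data bus 513. Task A; 514, which controls device A, runs as a triple-redundant task on the three computer modules 1; 501 to 3; 503, performs fault detection/isolation/reconfiguration by triple majority voting, and transfers the resulting output data A; 517 to device A via data bus 513. Likewise, task B; 515, which controls device B, runs as a triple-redundant task on the three computer modules 4; 504 to 6; 506, performs fault detection/isolation/reconfiguration by triple majority voting, and transfers the resulting output data B; 518 to device B. Task C; 516, which controls device C, runs as a triple-redundant task on the three computer modules 7; 507 to 9; 509, performs fault detection/isolation/reconfiguration by triple majority voting, and transfers the resulting output data 519 to device C.
[0013]
Suppose that task A; 514 has high importance and reliability requirements and always needs triple redundancy, while task B; 515 has lower importance and reliability requirements than task A; 514. If computer module 2; 502 fails, operation changes as shown in FIG. 15(B). The information that computer module 2; 502 has failed is reflected in the evaluation function, and to maintain triple redundancy for task A, computer module 4; 504 stops executing task B; 515 and starts executing task A; 514. In other words, when the usable computer module resources decrease because of a failure, each computer module always selects and executes the task it is most appropriate to run, so that the reliability of tasks with high importance and reliability requirements, such as task A; 514, does not fall.
[0014]
As described above, Prior Art Document 1 describes means for controlling task execution optimally for its purpose, but relies on general means for fault detection/isolation. Other examples of this type of system are described in JP-A-8-329025 (hereinafter referred to as Prior Art Document 2), JP-A-9-16535 (hereinafter referred to as Prior Art Document 3), JP-A-8-221285 (hereinafter referred to as Prior Art Document 4), and JP-A-8-95935 (hereinafter referred to as Prior Art Document 5).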
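[0015]
[Problems to be solved by the invention]
However, in the system described in Prior Art Document 1, performing triple majority voting to achieve 100% fault detection/isolation requires three computer modules for each task. Therefore, when there are three types of devices to be controlled, nine computer modules are needed, as shown in FIG. 15. Prior Art Documents 2 to 5 do not describe any means of solving this problem either.
[0016]
Accordingly, an object of the present invention is to provide a distributed processing system, a distributed processing method, and a distributed processing control program that can reduce the amount of hardware, such as computer modules, compared with the prior art.
[0017]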
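[Means for solving the problems]
To solve the above problems, a first invention according to the present invention is a distributed processing system that includes a plurality of processors, a plurality of devices each provided for a corresponding one of the processors, and input/output means for controlling data input/output between the processors and the devices, and in which the plurality of processors perform, in a distributed manner, data input from the plurality of devices and data output to the plurality of devices. The system includes fault-tolerance processing means for causing the plurality of processors and the plurality of input/output means, which exist for the distributed processing, to also perform fault-tolerance processing. The fault-tolerance processing means includes: data sharing means, provided in each of the processors, for performing data input from all of the devices and data output to all of the devices; redundancy management means for comparing the output data for a given device calculated by the own processor with the output data for that device calculated by another processor, and notifying the other processor of the comparison result; and fault determination means, provided in the input/output means, for determining whether the own processor has failed based on the comparison results notified by other processors. The system further includes fault isolation means for isolating the own processor when the fault determination means determines that it has failed, and data transfer means for causing the data output to the device corresponding to the own processor to be output from another processor when the own processor has been isolated by the fault isolation means. The redundancy management means sends a fault notification to the other processor that calculated the output data when the output data calculated by the two processors do not match, and sends a normal notification when they match.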
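[0018]
A second invention according to the present invention is a distributed processing method for a configuration that includes a plurality of processors, a plurality of devices each provided for a corresponding one of the processors, and input/output means for controlling data input/output between the processors and the devices, in which the plurality of processors perform, in a distributed manner, data input from the plurality of devices and data output to the plurality of devices. The method includes a fault-tolerance processing step of causing the plurality of processors and the plurality of input/output means, which exist for the distributed processing, to also perform fault-tolerance processing. The fault-tolerance processing step includes: a data sharing step, provided in each of the processors, of performing data input from all of the devices and data output to all of the devices; a redundancy management step of comparing the output data for a given device calculated by the own processor with the output data for that device calculated by another processor, and notifying the other processor of the comparison result; and a fault determination step, provided in the input/output means, of determining whether the own processor has failed based on the comparison results notified by other processors. The method further includes a fault isolation step of isolating the own processor when the fault determination step determines that it has failed, and a data transfer step of causing the data output to the device corresponding to the own processor to be output from another processor when the own processor has been isolated by the fault isolation step. The redundancy management step sends a fault notification to the other processor that calculated the output data when the output data calculated by the two processors do not match, and sends a normal notification when they match.
[0019]
A third invention according to the present invention is a distributed processing control program for a configuration that includes a plurality of processors, a plurality of devices each provided for a corresponding one of the processors, and input/output means for controlling data input/output between the processors and the devices, in which the plurality of processors perform, in a distributed manner, data input from the plurality of devices and data output to the plurality of devices. The program includes a fault-tolerance processing step of causing the plurality of processors and the plurality of input/output means, which exist for the distributed processing, to also perform fault-tolerance processing. The fault-tolerance processing step includes: a data sharing step, provided in each of the processors, of performing data input from all of the devices and data output to all of the devices; a redundancy management step of comparing the output data for a given device calculated by the own processor with the output data for that device calculated by another processor, and notifying the other processor of the comparison result; and a fault determination step, provided in the input/output means, of determining whether the own processor has failed based on the comparison results notified by other processors. The program further includes a fault isolation step of isolating the own processor when the fault determination step determines that it has failed, and a data transfer step of causing the data output to the device corresponding to the own processor to be output from another processor when the own processor has been isolated by the fault isolation step. The redundancy management step sends a fault notification to the other processor that calculated the output data when the output data calculated by the two processors do not match, and sends a normal notification when they match.
[0020]
According to the first to third inventions, in a multiprocessor system that performs distributed processing, the multiple processor resources that exist for the purpose of distributed processing are also put to use for fault-tolerant processing, detecting and isolating failures when they occur. The amount of hardware, such as computer modules, can therefore be reduced compared with the prior art.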
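[0021]
[Embodiments of the invention]
First, an overview of the present invention is given. FIG. 9 is a system configuration diagram showing the overview. In FIG. 9(A), there are three devices to be controlled, device A; 604 to C; 606, and three processor elements (corresponding to the computer modules of FIG. 15), processor element A; 601 to C; 603, that execute their control tasks. Task A; 608, task B; 609, and task C; 610 are executed in each of processor elements A; 601 to C; 603, and together they perform redundant processing by triple majority voting.
[0022]
If processor element B 602 fails, then, as shown in FIG. 9(B), output data B 612, which in normal operation was transferred to device B 605 from task B 609 on processor element B 602, is instead transferred from task B 609 on processor element A 601. Fault detection/isolation is thus performed while the remaining normal processor elements continue normal processing.
[0023]
That is, to achieve 100% fault detection/isolation for failures of a processor element (a computer module in FIG. 15) under the same conditions, the system of Prior Art Document 1 (FIG. 15) needs nine computer modules, whereas the present invention (FIG. 9) needs only three processor elements.
[0024]
The present invention can also maintain triple majority voting after fault isolation. This requires adding one spare processor element, but even then the number of processor elements (redundant resources) is less than half (4/9) of that needed by the system of Prior Art Document 1.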
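The module counts quoted above (9, 3, and 4 with a spare) follow from simple arithmetic. The sketch below only restates that arithmetic; the function names are hypothetical and are not taken from the patent.

```python
def modules_with_per_task_redundancy(num_devices: int, redundancy: int = 3) -> int:
    """Prior-art style count: every task gets its own group of redundant modules."""
    return num_devices * redundancy


def modules_with_shared_redundancy(num_devices: int, spares: int = 0) -> int:
    """Count for the scheme described here: one processor element per device,
    plus optional spares to keep triple voting after an isolation."""
    return num_devices + spares


prior_art = modules_with_per_task_redundancy(3)          # three devices, triple redundancy
shared = modules_with_shared_redundancy(3)               # three devices, no spare
shared_with_spare = modules_with_shared_redundancy(3, spares=1)
ratio_with_spare = shared_with_spare / prior_art          # the "4/9" in the text
```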
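[0025]
FIG. 20 of Prior Art Document 1 contains a table relating the number of computer modules to the number of executable tasks. The table shows that with three computer modules, only one dual-redundant task and one non-redundant task can be executed. In other words, it shows that with only three redundant resources it is impossible to execute three triple-redundant tasks, which the present invention achieves. This makes the difference from the present invention clear.
[0026]
The present invention achieves this because, as shown in the embodiment of FIG. 1, the redundancy management units 3a to 3c, fault determination units 10a to 10c, network control units 12a to 12c, fault isolation units 9a to 9c, and data transfer control units 11a to 11c enable fault detection/isolation by triple-majority redundant processing using only the three processor elements that exist for distributed processing.
[0027]
The differences between the present invention and the system of Prior Art Document 1 can be summarized as follows. First, the system of Prior Art Document 1 is described.
[0028]
(1) With three control targets, nine processor elements are needed to implement redundancy management by triple majority voting.
[0029]
(2) The problem to be solved (the most significant one) is that, when distributed processing and redundant processing are combined using redundant computer module (corresponding to processor element) resources to improve processing performance and reliability at the same time, a computer module failure could lower the reliability of the most important task. That is, when a failure occurs it is desirable to stop less important tasks first and maintain the reliability of more important tasks, but the only way to do so was simply to increase redundancy, and there was no efficient means of degradation.
[0030]
(3) The greatest effect of the invention is that, when a failure occurs, stopping less important tasks first and degrading function/performance makes it possible to maintain the reliability and function/performance of important tasks and to allocate limited resources to the most important purposes. It is therefore highly effective when redundant resources are used to improve processing performance and reliability at the same time and the tasks differ in importance. It has no effect, however, unless a very large number of redundant resources such as computer modules are available. In addition, the distributed processing concerned is limited to distributed processing for improving processing performance.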
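[0031]
Next, the present invention is described.
[0032]
(1) With three control targets, three processor elements are enough to implement redundancy management by triple majority voting.
[0033]
(2) The problem to be solved (the most significant one) is that processor element resources existing for the purpose of distributed processing could not be used for redundant processing to detect and isolate faults. Consequently, raising reliability through redundant processing simply required making each distributed resource individually redundant, and the hardware added to improve reliability was enormous.
[0034]
(3) The greatest effect of the invention is that processor element resources existing for the purpose of distributed processing are used for redundant processing at the same time, enabling fault detection/isolation, so that distributed processing and improved reliability are achieved together with the minimum number of redundant resources. The invention is more powerful when applied to a distributed processing system that must be physically distributed because of its mission, such as coordinated control among the joints of a robot arm, than when applied to distributed processing for improving processing performance.
[0035]
That is, the present invention is superior to the system of Prior Art Document 1 in that it can be realized with about one third of the redundant-resource hardware.
[0036]
The present invention and the system of Prior Art Document 1 are similar in the broad category of efficiently realizing distributed processing and fault-tolerance processing, but, as described above, they differ greatly in the problem to be solved and in the effect obtained. The system of Prior Art Document 1 provides means for efficient degradation according to the failure situation (stopping less important tasks first to reduce the impact on more important tasks) when a very large number of redundant resources are used for distributed processing that improves performance and redundant processing that improves reliability. Because it actively removes unimportant functions, it can be applied only to systems that tolerate a loss of function.
[0037]
In contrast, the present invention provides means for making effective use, for fault tolerance, of redundant resources that already exist for the purpose of distributed processing, so that distributed processing and improved reliability are achieved together with the minimum number of redundant resources. The content of the invention therefore differs greatly, and it does not overlap with the system of Prior Art Document 1.
[0038]
As described above, in a multiprocessor system that performs distributed processing, the present invention makes effective use of the processor resources that exist for distributed processing for fault-tolerant processing, detects and isolates failures with the minimum necessary redundant resources (about one third of the prior art), and continues to provide the same functions as before the failure without interrupting the operation of the devices subject to distributed processing.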
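[0039]
Embodiments of the present invention are described below with reference to the accompanying drawings. In a multiprocessor system that performs distributed processing, the present invention manages the input data from each device subject to distributed processing as information shared among the processors, and uses that data to perform redundancy management such as majority voting among different processors. This gives the system as a whole fault-tolerant capability and raises reliability when a failure occurs.
[0040]
FIG. 1 is a configuration diagram of the best mode of the distributed processing system according to the present invention. Referring to the figure, the distributed processing system includes processor element A; 1a, input/output unit A; 8a connected to processor element A; 1a via processor element interface 7a, device A; 15a connected to input/output unit A; 8a via transmission line A, processor element B; 1b, input/output unit B; 8b connected to processor element B; 1b via processor element interface 7b, device B; 15b connected to input/output unit B; 8b via transmission line B, processor element C; 1c, input/output unit C; 8c connected to processor element C; 1c via processor element interface 7c, and device C; 15c connected to input/output unit C; 8c via transmission line C. Input/output units A; 8a, B; 8b, and C; 8c are each connected to network 16.
[0041]
Processor element A; 1a includes data sharing unit 2a, redundancy management unit 3a, and tasks A; 4a to C; 6a. Similarly, processor element B; 1b includes data sharing unit 2b, redundancy management unit 3b, and tasks A; 4b to C; 6b, and processor element C; 1c includes data sharing unit 2c, redundancy management unit 3c, and tasks A; 4c to C; 6c.
[0042]
Input/output unit A; 8a includes fault isolation unit 9a, fault determination unit 10a, data transfer control unit 11a, and network control unit 12a. Similarly, input/output unit B; 8b includes fault isolation unit 9b, fault determination unit 10b, data transfer control unit 11b, and network control unit 12b, and input/output unit C; 8c includes fault isolation unit 9c, fault determination unit 10c, data transfer control unit 11c, and network control unit 12c.
[0043]
In the figure, input data A; 14a to C; 14c from devices A; 15a to C; 15c, which are subject to distributed processing, are transferred over network 16 and managed as shared data in each of processor elements A; 1a to C; 1c. Using the shared data, each processor element executes task A; 4a to 4c, which outputs data to device A, task B; 5a to 5c, which outputs data to device B, and task C; 6a to 6c, which outputs data to device C.
[0044]
For these tasks, processor elements A; 1a to C; 1c perform fault determination based on the majority principle using redundancy management units 3a to 3c and fault determination units 10a to 10c. If there is a fault, fault isolation units 9a to 9c isolate the failed processor element. The device connected to the isolated processor element is then made able to transfer data to and from the other, normal processor elements by data transfer control units 11a to 11c, acting on the result from fault isolation units 9a to 9c. The device thus continues to perform the same functions as before the failure without interrupting its processing.
[0045]
In this way, in a multiprocessor system that performs distributed processing, the units described above allow the present invention to make effective use of the processor resources that exist for distributed processing for fault-tolerant processing, to detect and isolate failures when they occur, and to continue the same functions as before the failure without interrupting the operation of the devices subject to distributed processing.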
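[0046]
[Examples]
Examples of the present invention are described below, beginning with the first example, which is also explained with reference to FIG. 1. FIG. 1 shows an example of a system that performs distributed processing for three independent devices A to C. In the figure, processor element A; 1a is connected to device A; 15a. Basically, input data A; 14a from device A; 15a passes through transmission line A and input/output unit A; 8a to processor element A; 1a, where it is processed by task A; 4a, and the result is output to device A; 15a as output data A; 13a through input/output unit A; 8a and transmission line A. Input/output unit A; 8a, connected between device A; 15a and processor element A; 1a, performs this data transfer.
[0047]
Processor elements A; 1a to C; 1c have the functions of a general computer. As in an embedded computer, they may have only the most basic computer functions, such as data input/output, arithmetic, and data storage.
[0048]
The configuration of device B; 15b, processor element B; 1b, and input/output unit B; 8b, and that of device C; 15c, processor element C; 1c, and input/output unit C; 8c, are the same as for device A; 15a above, as shown in FIG. 1.
[0049]
These are connected to network 16 through input/output unit A; 8a, input/output unit B; 8b, and input/output unit C; 8c, and can transfer data to one another. Data transfer over network 16 is performed by network control units 12a to 12c, which need only have general network functions.
[0050]
In the basic distributed processing system configuration described so far, multiple processors are used for the purpose of distributed processing. In the present invention, to make effective use of these multiple processor resources, redundant processing based on triple majority voting is performed at the same time as distributed processing.
[0051]
In FIG. 1, input data A; 14a to C; 14c from devices A; 15a to C; 15c, which are subject to distributed processing, are transferred over network 16 and managed as shared data in each of processor elements A; 1a to C; 1c. Using the shared data, each processor executes task A; 4a to 4c, which outputs data to device A, task B; 5a to 5c, which outputs data to device B, and task C; 6a to 6c, which outputs data to device C. These data are managed within each processor element by data sharing units 2a to 2c.
[0052]
FIG. 2 shows an example of the shared data management table that data sharing units 2a to 2c use for data management. The table is provided, for example, in a storage section (not shown) within each of data sharing units 2a to 2c. Referring to the figure, the shared data management table stores the following values:
- for input data from device A, the output value of task A as processed by processor element A, by processor element B, and by processor element C;
- for input data from device B, the output value of task B as processed by processor element A, by processor element B, and by processor element C;
- for input data from device C, the output value of task C as processed by processor element A, by processor element B, and by processor element C.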
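The shared data management table of FIG. 2 can be pictured as a 3 x 3 grid indexed by task and by the processor element that produced the value. The following is a minimal sketch under that reading; the class name `SharedDataTable`, its methods, and the use of `None` for a value not yet received are all assumptions made for illustration, since the source does not specify a storage layout.

```python
from typing import Dict, Optional, Tuple

PROCESSORS = ("A", "B", "C")   # processor elements A to C
TASKS = ("A", "B", "C")        # task X produces the output data for device X


class SharedDataTable:
    """Hypothetical in-memory form of the FIG. 2 table: one cell per (task, processor)."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str], Optional[float]] = {
            (task, pe): None for task in TASKS for pe in PROCESSORS
        }

    def store(self, task: str, processor: str, value: float) -> None:
        # Record the output value of `task` as calculated by `processor`.
        self._cells[(task, processor)] = value

    def row(self, task: str) -> Dict[str, Optional[float]]:
        # All three candidate output values for one device, keyed by processor.
        return {pe: self._cells[(task, pe)] for pe in PROCESSORS}


table = SharedDataTable()
table.store("B", "A", 1.5)     # processor element A's result for task B
```

Each processor element would hold its own copy of such a table and fill the other processors' cells from data received over the network.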
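[0053]
Redundancy management units 3a to 3c use the shared data management table of FIG. 2 to perform redundancy management based on the general majority principle. For example, processor element A; 1a takes a majority vote on the output values of task A and, if the result is normal, outputs the value judged normal by the vote to device A as output data A; 13a.
[0054]
Likewise, processor element B; 1b takes a majority vote on the output values of task B and, if the result is normal, outputs the value judged normal by the vote to device B as output data B; 13b. Processor element C; 1c takes a majority vote on the output values of task C and, if the result is normal, outputs the value judged normal by the vote to device C as output data C; 13c.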
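A majority vote over one row of the table can be sketched as follows. The source only says that a "general majority principle" is used, so the details here are assumptions: values are compared for exact equality (a real controller may need a tolerance), and the helper name `majority` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Tuple


def majority(values: Dict[str, Hashable]) -> Tuple[Optional[Hashable], List[str]]:
    """Vote over the values reported by each processor element.

    Returns (majority value, processors that disagree with it),
    or (None, []) when no value is held by a strict majority.
    """
    value, count = Counter(values.values()).most_common(1)[0]
    if count * 2 <= len(values):
        return None, []          # no strict majority, nothing can be concluded
    dissenters = sorted(pe for pe, v in values.items() if v != value)
    return value, dissenters


voted_value, disagreeing = majority({"A": 10, "B": 99, "C": 10})
```

With three processor elements, a single deviating value is outvoted two to one and the deviating processor is identified.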
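[0055]
If the majority vote reveals a mismatch, redundancy management units 3a to 3c, following the general majority principle, use network 16 to send a fault notification to the processor element whose result differs from their own. For example, if processor element B; 1b in FIG. 1 fails, task A; 4b, task B; 5b, and task C; 6b, which it executes, become abnormal, and fault notifications are issued as shown in FIG. 3.
[0056]
FIG. 3 is an explanatory diagram showing an example of fault notification when processor element B; 1b has failed. Referring to the figure, fault determination unit 10b, connected to processor element B; 1b, receives two fault notifications: fault notification AB 17 (from processor element A to processor element B) and fault notification CB 20 (from processor element C to processor element B). Fault determination unit 10b can therefore determine, by the majority principle, that its own processor element has failed. In contrast, fault determination units 10a and 10c, connected to the other processor elements A and C, each receive only one fault notification (from processor element B) and can therefore determine that their own processor elements are normal.
[0057]
FIG. 4 shows the logic of fault determination in fault determination units 10a to 10c. Here the identifier (ID) of the own processor element is N, the ID of the processor element to the right of N in FIG. 1 is N+1, and the ID of the processor element to the right of N+1 is N+2. The processor element to the right of processor element C; 1c is taken to be processor element A; 1a.
[0058]
FIG. 4 shows that if processor element N is notified as normal by at least one of processor elements N+1 and N+2, processor element N determines that it is normal. In other words, processor element N determines that it has failed only when both processor elements N+1 and N+2 notify it of a fault.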
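The determination logic of FIG. 4 reduces to a two-input AND over the notifications received from N+1 and N+2. A minimal sketch follows; the string values `"fault"` and `"normal"` and the function name are assumptions chosen for readability, not an encoding given in the source.

```python
from itertools import product

FAULT, NORMAL = "fault", "normal"


def is_self_faulty(notice_from_n1: str, notice_from_n2: str) -> bool:
    """Processor element N judges itself faulty only if BOTH neighbours report a fault."""
    return notice_from_n1 == FAULT and notice_from_n2 == FAULT


# Enumerate the four input combinations, i.e. the full truth table.
truth_table = {
    (n1, n2): is_self_faulty(n1, n2)
    for n1, n2 in product((NORMAL, FAULT), repeat=2)
}
```

A single fault notification, such as the one a failed processor element B might send to A, is therefore never enough to isolate a healthy processor element.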
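[0059]
Following the logic of FIG. 4, fault determination units 10a to 10c notify fault isolation units 9a to 9c and data transfer control units 11a to 11c of their own fault determination results. The determination results of fault determination units 10a to 10c, and the resulting behavior of fault isolation units 9a to 9c and data transfer control units 11a to 11c, are summarized below.
[0060]
First, the case in which the fault determination unit judges its own processor element to be normal.
[0061]
(1) The fault isolation unit does nothing.
[0062]
(2a) In the data transfer control unit, the buffer for input data from the device, the buffer for output data to the device, and the port that receives fault notification data for the fault determination unit are all made recognizable as independent addresses on the network.
[0063]
(2b) Every processor element is able to read the input data from every device via the network.
[0064]
(2c) Every processor element is able to write fault notification data to the port that receives fault notification data for the fault determination unit.
[0065]
(2d) Data can be transferred to the buffer for output data to the device only from the unit's own processor element.
[0066]
Next, the case in which the fault determination unit judges its own processor element to have failed.
[0067]
(1) The fault isolation unit blocks the data sent from its own processor element and functionally separates that processor element from the input/output unit. It then resets its own processor element.
[0068]
(2b) to (2c) are the same as in the case above in which the fault determination unit judges its own processor element to be normal.
[0069]
(2d) Data can be transferred to the buffer for output data to the device from other processor elements via the data bus. In this state the unit's own processor element cannot transfer data to it.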
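Rule (2d) in the two cases above amounts to an access check on the device's output buffer that flips when the owning processor element is isolated. The sketch below expresses only that rule; the function and parameter names are hypothetical, and how the check is enforced (addressing, hardware gating, software) is left open in the source.

```python
def may_write_output_buffer(writer: str, owner: str, owner_isolated: bool) -> bool:
    """Decide whether `writer` may write the output buffer of the device attached to `owner`.

    Normal case:   only the owning processor element may write.
    Isolated case: only the other processor elements may write.
    """
    if owner_isolated:
        return writer != owner
    return writer == owner


# Device B's buffer before and after processor element B is isolated.
before = {pe: may_write_output_buffer(pe, "B", owner_isolated=False) for pe in "ABC"}
after = {pe: may_write_output_buffer(pe, "B", owner_isolated=True) for pe in "ABC"}
```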
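[0070]
That is, if any processor element has failed, fault isolation units 9a to 9c isolate the failed processor element. The device connected to the isolated processor element is then made able to transfer data to and from the other, normal processor elements by data transfer control units 11a to 11c, acting on the result from fault isolation units 9a to 9c.
[0071]
In this state, redundancy management units 3a to 3c of FIG. 1 transfer the output data for the device connected to the isolated processor element from a processor element judged normal by the method described so far, allowing that device to continue operating normally.
[0072]
As a concrete example, if processor element B; 1b is judged to have failed as shown in FIG. 3, processor element B; 1b of FIG. 1 is separated from input/output unit B; 8b, and output data B; 13b for device B; 15b is transferred via network 16 from either the result of task B; 5a on processor element A; 1a or the result of task B; 5c on processor element C; 1c.
[0073]
Whether the data of processor element A; 1a or that of processor element C; 1c is used is not an important issue; either will do. It suffices to fix a rule, for example to use the data of the processor element to the right of the isolated processor element, and this is easily realized by generally well-known methods.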
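One possible selection rule is the "right neighbour" example mentioned above, using the same ring order as FIG. 4 (the right neighbour of C is A). This is a sketch of that one example rule only; the names `RING`, `right_neighbor`, and `substitute_for` are hypothetical. Note that the worked example of FIG. 8 uses processor element A for a failed B, which the text explicitly allows as well.

```python
RING = ("A", "B", "C")   # left-to-right order in FIG. 1; C wraps around to A


def right_neighbor(pe: str) -> str:
    """Return the processor element to the right of `pe` in the ring."""
    return RING[(RING.index(pe) + 1) % len(RING)]


def substitute_for(isolated_pe: str) -> str:
    """Example rule: the right neighbour of the isolated element supplies its device's output."""
    return right_neighbor(isolated_pe)


substitutes = {pe: substitute_for(pe) for pe in RING}
```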
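[0074]
As shown above, even if a failure occurs in any of processor elements A; 1a to C; 1c, the devices can continue to perform the same functions as before the failure without interrupting their processing.
[0075]
Fault determination units 10a to 10c, fault isolation units 9a to 9c, and data transfer control units 11a to 11c in input/output units A; 8a to C; 8c may be implemented in software or in hardware, as long as they satisfy the logic of FIG. 4 and the behavior described above for the cases in which the fault determination unit judges its own processor element normal or failed. Either can easily be realized by generally well-known techniques.
[0076]
Next, the operation of this example, shown in FIG. 1, is described using flowcharts. FIG. 5 is a flowchart of the operation from the point where processor elements A; 1a to C; 1c receive input data A; 14a to C; 14c from devices A; 15a to C; 15c up to the point where redundancy management units 3a to 3c send fault notifications to fault determination units 10a to 10c.
[0077]
For generality, FIG. 5 shows the software operation of the processor element at position N. Here N+1 denotes the position to the right of N, and N+2 the position to the right of N+1. In FIG. 1, the processor element to the right of processor element C; 1c is processor element A; 1a. For example, if N is A, then N+1 is B and N+2 is C.
[0078]
In FIG. 5, the processor element at position N receives the input data of devices N, N+1, and N+2 and uses these data to calculate the output data for devices N, N+1, and N+2 (steps S101, S102, S103).
[0079]
To relate this to the configuration of FIG. 1 by example: if N is A in FIG. 1, this is the software operation of processor element A; 1a. The input data of device N is input data A; 14a, that of device N+1 is input data B; 14b, and that of device N+2 is input data C; 14c. Using these, task A; 4a calculates the output data for device A; 15a, task B; 5a calculates the output data for device B; 15b, and task C; 6a calculates the output data for device C; 15c.
[0080]
Next, in FIG. 5, the processor element at position N sends its calculated results to the processor elements at positions N+1 and N+2 and, at the same time, receives their calculated results (step S104). These operations are realized by data sharing units 2a to 2c of FIG. 1.
[0081]
Next, redundancy management units 3a to 3c of FIG. 1 perform redundancy management based on the majority principle. In FIG. 5, a majority vote is taken over the calculated results of N, N+1, and N+2, and a fault determination is made for each result (steps S105, S106, S109). If the result of N is judged normal, normal processing continues (step S107). If N+1 is judged to have failed, a fault notification is sent to N+1 (step S110), and if N+2 is judged to have failed, a fault notification is sent to N+2 (step S108). If N judges itself to have failed, it stops operating (step S111).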
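The comparison-and-notify part of FIG. 5 (after the exchange in step S104) might look like the sketch below. It follows the claim wording, in which each processor compares its own results with each other processor's and sends a fault notification on mismatch and a normal notification on match. The data layout `results[pe][task]`, the exact-equality comparison, and the name `redundancy_step` are assumptions; data exchange and synchronization are omitted, as they are in the source.

```python
from typing import Dict, Tuple

RING = ("A", "B", "C")


def redundancy_step(me: str, results: Dict[str, Dict[str, int]]) -> Tuple[Dict[str, str], bool]:
    """One pass of the redundancy management unit on processor element `me`.

    results[pe][task] is the output data `pe` calculated for the device of `task`.
    Returns the notices to send to the two other processor elements and
    whether `me` concludes that it is itself the faulty one (step S111).
    """
    others = [pe for pe in RING if pe != me]
    notices = {
        pe: "normal" if results[pe] == results[me] else "fault"
        for pe in others
    }
    first, second = others
    # `me` is outvoted when it disagrees with both others and they agree with each other.
    self_faulty = (
        notices[first] == "fault"
        and notices[second] == "fault"
        and results[first] == results[second]
    )
    return notices, self_faulty


good = {"A": 1, "B": 2, "C": 3}
bad = {"A": 1, "B": 99, "C": 3}          # processor element B miscalculates task B
exchanged = {"A": good, "B": bad, "C": dict(good)}
```

In this scenario the I/O unit of B receives two fault notices while those of A and C receive one each, which matches the situation of FIG. 3.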
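[0082]
In an actual system, the above operation also requires data exchange, synchronization, and so on for distributed processing, but these are outside the scope of the present invention and their description is omitted. Handling of multiple simultaneous failures would complicate the explanation, so this example assumes a single failure for simplicity.
[0083]
Next, the operation of input/output units A; 8a to C; 8c of FIG. 1 is described using a flowchart. FIG. 6 is a flowchart of the operation of fault isolation units 9a to 9c and data transfer control units 11a to 11c after fault determination units 10a to 10c receive the fault notifications described above. The definitions of N, N+1, and N+2 are the same as in FIG. 5.
[0084]
In FIG. 6, N performs fault determination based on the majority principle on the fault notifications from N+1 and N+2 (steps S201, S202). This is done by fault determination units 10a to 10c of FIG. 1. That is, if a fault notification is received from N+1 (Yes in step S201) and a fault notification is also received from N+2 (Yes in step S202), N is judged to have failed (step S203), the processor element of N is isolated (step S204), and N+1 and N+2 are notified of the isolation result (step S205). In addition, so that the device connected to N continues to operate while the processor element of N is isolated, data transfer is arranged so that output to the device connected to N can be supplied from N+1 or N+2 (step S206). These operations are performed by fault isolation units 9a to 9c and data transfer control units 11a to 11c of FIG. 1. If N is judged normal (No in either step S201 or S202), processing continues unchanged (step S207).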
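The FIG. 6 sequence for one input/output unit can be sketched as a small state holder that records which steps it performed. The class `IOUnit`, its `log` of step labels, and the string-valued notices are hypothetical; the source leaves open whether this is implemented in software or hardware.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IOUnit:
    """Hypothetical model of one input/output unit (fault determination, isolation, transfer control)."""
    name: str
    isolated: bool = False
    log: List[str] = field(default_factory=list)

    def on_notices(self, from_n1: str, from_n2: str) -> None:
        if from_n1 == "fault" and from_n2 == "fault":      # S201 Yes and S202 Yes
            self.log.append("S203 judge own processor faulty")
            self.isolated = True
            self.log.append("S204 isolate own processor")
            self.log.append("S205 notify N+1 and N+2 of isolation")
            self.log.append("S206 accept device output from N+1 or N+2")
        else:                                               # No at S201 or S202
            self.log.append("S207 continue unchanged")


unit_b = IOUnit("B")
unit_b.on_notices("fault", "fault")      # both neighbours report a fault

unit_a = IOUnit("A")
unit_a.on_notices("fault", "normal")     # only one fault notice is received
```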
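[0085]
Next, the operation of redundancy management units 3a to 3c after input/output units A; 8a to C; 8c of FIG. 1 have performed fault isolation as described above is explained using the flowchart of FIG. 7.
[0086]
FIG. 7 shows the operation of redundancy management units 3a to 3c, realized as software on processor elements A; 1a to C; 1c of FIG. 1, in response to the result of the operation shown in FIG. 6. The definitions of N, N+1, and N+2 are again the same as in FIG. 5.
[0087]
In FIG. 7, if a fault isolation result is received from N+1 (Yes in step S301), N recognizes that the processor element of N+1 has been isolated (step S302), sends the output data intended for the device connected to N+1 to that device (step S303), and continues its other processing (step S304).
[0088]
If a fault isolation result is received from N+2 (No in step S301, Yes in step S305), N performs the same operation for N+2 (steps S306, S307, S308).
[0089]
If no fault isolation result has been received from either N+1 or N+2 (No in step S301, No in step S305), N continues processing without changing any state (step S309).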
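The branch structure of FIG. 7 is a short decision chain. In the sketch below, `outputs` holds the output data that N has already calculated locally for its neighbours' devices; the function name, the dictionary keys `"N+1"` and `"N+2"`, and the returned tuple are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def after_isolation_notice(
    n1_isolated: bool,
    n2_isolated: bool,
    outputs: Dict[str, int],
) -> Optional[Tuple[str, int]]:
    """Return (target device position, data to send), or None when nothing changes."""
    if n1_isolated:                      # S301 Yes -> S302, S303
        return ("N+1", outputs["N+1"])
    if n2_isolated:                      # S305 Yes -> S306, S307
        return ("N+2", outputs["N+2"])
    return None                          # S309: continue unchanged


# Processor element A (N) with B as N+1 and C as N+2; B has been isolated.
local_outputs = {"N+1": 20, "N+2": 30}
action = after_isolation_notice(True, False, local_outputs)
```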
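[0090]
As an example of the above operation, FIG. 8 shows data output to the devices when processor element B; 1b has been isolated. In FIG. 8, processor element B; 1b is isolated and stopped, and in its place output data B; 22 is transferred to device B; 15b from the output value of task B; 5a running on processor element A; 1a. For device A; 15a, operation is the same as before the failure: output data A; 21 is transferred from the output value of task A; 4a running on processor element A; 1a. For device C; 15c as well, operation is the same as before the failure: output data C; 23 is transferred from the output value of task C; 6c running on processor element C; 1c.
[0091]
As described above, according to this example, even if one processor element in the distributed processing system fails, the failure is detected and isolated, and the device connected to that processor element continues to perform the same functions as before the failure without interrupting its processing.
[0092]
This example provides fault-tolerant capability for failures in processor elements A; 1a to C; 1c of FIG. 1, but measures against failures in input/output units A; 8a to C; 8c may also be needed separately. In general, however, comparing the basic computer functions such as a processor element with the interface part, the processor element is orders of magnitude larger in circuit scale and operational complexity. Fault countermeasures for the processor element are therefore more important, and the impact of making it redundant is also orders of magnitude larger. Fault countermeasures for the interface part can easily be realized by long-established means such as redundancy and the use of high-reliability components, so for simplicity the example has been limited to failures in the processor element part.
[0093]
The present invention is not limited to the above example. For instance, if the example of FIG. 1 has means by which each processor element can diagnose its own faults, there is no need to execute tasks redundantly within a single processor element and take a majority vote. In that case, when a failure occurs in one processor element and self-fault diagnosis isolates the failed processor element, making the subsequent operation the same as in the example of FIG. 1 allows normal processing to continue without interrupting the operation of devices A; 15a to C; 15c. The effect, that the system as a whole maintains its functions even when a processor element fails, is the same. This is described later as the third example.
[0094]
The example of FIG. 1 is a fault-tolerance method that assumes a single failure in one of the processor elements. When a system is used in a special environment such as outer space, however, the environmental conditions often cause transient failures that occur repeatedly and clear each time the element is restarted. For systems used in such environments, it is effective to add a method in which the processor element isolated in the example of FIG. 1 is reset and restarted, and is returned to the system if its initial self-check result is normal. A fault-tolerant system that continues normal processing non-stop even when transient failures occur repeatedly can then be realized.
[0095]
Methods for self-fault diagnosis, and methods for restarting an element and returning it to the system after a transient failure, can easily be realized by applying well-known methods, so their description is omitted here.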
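[0096]
Next, the second example is described. FIG. 10 is a schematic diagram showing the configuration of the second example, and FIG. 11 is a schematic diagram showing the relationship between a CPU (for example, processor element A) and a device (for example, device A). The second example applies the distributed processing system of the first example to a robot arm.
[0097]
FIG. 10 shows an example of the configuration of a robot arm. Referring to FIG. 10, the robot arm includes handle portion 31, arms 32 to 35, and actuators 41 to 45. Actuator 41 controls the motion of handle portion 31, and actuators 42 to 44 control the motion of arms 32 to 35. Actuator 45 is the actuator at the origin of arms 32 to 35. These actuators 41 to 45 thus correspond to the joints of the robot arm.
[0098]
Each joint of the robot arm (in this paragraph, actuators 41 to 45 are called joints 41 to 45 for convenience) operates in coordination with its adjacent joints. For example, to move handle portion 31 to a given position, not only arm 32 but also each of arms 33 to 35 must be moved. Joints 41 to 45 must therefore operate in coordination, and the distributed processing according to the present invention can be applied to the control of joints 41 to 45.
[0099]
Referring to FIG. 11, each of actuators 41 to 45 includes sensor 61, motor 62, CPU (central processing unit) 52, which controls motor 62 based on the output of sensor 61 and on the processing results of other CPUs needed for coordinated operation (received via the network), and interface 53, which is connected to the network. The motion of motor 62 controls the motion of handle portion 31 and arms 32 to 35. Sensor 61 and motor 62 are included in device 51.
[0100]
CPU 52 corresponds to processor elements A to C and input/output units A to C of FIG. 1, interface 53 to network control units 12a to 12c of FIG. 1, and device 51 to devices A to C of FIG. 1.
[0101]
Now suppose that the distributed processing system according to the present invention is applied to actuators 42 to 44 among actuators 41 to 45. That is, actuator 42 consists of processor element A, input/output unit A, and device A of FIG. 1; actuator 43 consists of processor element B, input/output unit B, and device B of FIG. 1; and actuator 44 consists of processor element C, input/output unit C, and device C of FIG. 1.
[0102]
Input data 71, input from sensor 61 to CPU 52 (strictly, to input/output units A to C of FIG. 1), corresponds to input data A to C of FIG. 1, and output data 72, output from CPU 52 (strictly, from input/output units A to C of FIG. 1) to motor 62, corresponds to output data A to C of FIG. 1. CPU 52 performs calculations on input data 71 and outputs the result as output data 72.
[0103]
Accordingly, actuators 42 to 44 perform redundant processing by majority voting together with distributed processing. If actuator 43 (corresponding to processor element B of FIG. 1) is judged to have failed, actuator 43 is isolated, and output data 72 for motor 62 of device 51B of actuator 43 is output from CPU 52 of actuator 42 (corresponding to processor element A of FIG. 1) or actuator 44 (corresponding to processor element C of FIG. 1).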
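[0104]
Next, the third example is described. FIG. 12 is a schematic diagram showing the configuration of the third example, and FIG. 13 is a schematic diagram showing data transfer control when a processor element fails. Like the second example, the third example applies the distributed processing system to a robot arm, but it differs from the second example in that the actuators do not perform redundant processing.
[0105]
That is, actuator 42 receives only input data 71 from its own device 51A and outputs the calculated output data 72 only to its own device 51A. Similarly, actuator 43 receives only input data 71 from its own device 51B and outputs the calculated output data 72 only to its own device 51B, and actuator 44 receives only input data 71 from its own device 51C and outputs the calculated output data 72 only to its own device 51C.
[0106]
Furthermore, each of actuators 42 to 44 does not determine that it has failed from fault notifications sent by other actuators. Instead, each actuator contains a fault detection unit that detects its own failure.
[0107]
Referring to FIG. 12, camera 46 is connected to actuator 41, and actuator 41 is configured to control its internal motor 62 based on image information from camera 46.
[0108]
Actuator 41 also has the function of taking over processing if any of actuators 42 to 44 fails. For example, if actuator 43 fails, CPU 52 in actuator 43 is isolated, and CPU 52 in actuator 41 instead performs the processing that actuator 43 should perform and outputs the output data to motor 62 of actuator 43. In doing so, actuator 41 calculates the data to be output based on input data 71 obtained from device 51B via network 16. Actuator 45 at the origin may perform this substitute processing instead.
[0109]
FIG. 13 shows the operation of the third example. In the first and second examples, when processor element B failed, output data B was output to device B from processor element A. In the third example, output data B is output to device B from actuator 41 (processor element D).
[0110]
In terms of the flowchart of FIG. 6, the operation after processor element B is judged to have failed is the same as steps S203 to S205 of that figure.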
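[0111]
[Effects of the invention]
According to the first invention, a distributed processing system includes a plurality of processors, a plurality of devices each provided for a corresponding one of the processors, and input/output means for controlling data input/output between the processors and the devices, with the plurality of processors performing, in a distributed manner, data input from the plurality of devices and data output to the plurality of devices. Because the system includes fault-tolerance processing means for causing the plurality of processors and the plurality of input/output means, which exist for the distributed processing, to also perform fault-tolerance processing, the amount of hardware such as computer modules can be reduced.
[0112]
The second and third inventions provide the same effect as the first invention.
[0113]
Specifically, according to the present invention, in a multiprocessor system that performs distributed processing, the multiple processor resources that exist for distributed processing are put to effective use for fault-tolerant processing, failures are detected and isolated when they occur, and the same functions as before the failure continue to be performed without interrupting the operation of the devices subject to distributed processing.
[0114]
Although individual components such as processors have no fault-tolerant capability, the system as a whole is given fault-tolerant capability by adding functions that use the structural characteristics of a distributed processing system for a purpose quite different from the original one. Because fault-tolerant capability is achieved without making hardware redundant for the sake of fault tolerance, the invention overcomes the weakness of conventional fault-tolerant systems, namely a large increase in hardware (usually threefold or more, because triple majority voting is used).
[0115]
When the present invention is actually implemented, its application should be examined so that the means for realizing it do not become a disadvantage. Because the invention relies heavily on redundant software operation and on the exchange of shared data among processors, high processor speed and high network transfer speed are desirable. Given recent trends in computer technology, however, processor speeds and network transfer speeds have improved dramatically, so in many cases the means for realizing the invention are not expected to be a disadvantage.
[0116]
The present invention is most effective when applied to a system that is a distributed processing system for reasons of system configuration. For example, in an embedded distributed processing system in which each joint of a robot arm is controlled by an independent processor, there are already as many processors as joints. If there are three or more, the present invention can be applied and a high-reliability system built with only a small amount of additional hardware in the interface part. Embedded systems of this kind in particular are required to keep hardware small and to ensure high reliability at the same time, so the benefit of applying the present invention is considered extremely large.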
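[Brief description of the drawings]
[FIG. 1] A configuration diagram of the best mode of the distributed processing system according to the present invention.
[FIG. 2] A diagram showing the shared data management table.
[FIG. 3] A diagram showing an example of fault notification when processor element B; 1b has failed.
[FIG. 4] A diagram showing the fault determination logic in the fault determination unit.
[FIG. 5] A flowchart showing the operation up to fault notification.
[FIG. 6] A flowchart showing the operation after a fault notification is received.
[FIG. 7] A flowchart showing the operation executed in response to the result of the operation shown in FIG. 6.
[FIG. 8] A flowchart showing data output to the devices when processor element B; 1b has been isolated.
[FIG. 9] A system configuration diagram showing an overview of the present invention.
[FIG. 10] A schematic diagram showing the configuration of the second example.
[FIG. 11] A schematic diagram showing the relationship between the CPU and the device.
[FIG. 12] A schematic diagram showing the configuration of the third example.
[FIG. 13] A schematic diagram showing data transfer control when a processor element fails.
[FIG. 14] A configuration diagram of an example of a conventional distributed processing system.
[FIG. 15] A configuration diagram of the system described in Prior Art Document 1.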
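[Explanation of reference numerals]
1a to 1c: Processor element
2a to 2c: Data sharing unit
3a to 3c: Redundancy management unit
4a, 5a, 6a: Task
7a: Processor element interface
8a to 8c: Input/output unit
9a to 9c: Fault isolation unit
10a to 10c: Fault determination unit
11a to 11c: Data transfer control unit
12a to 12c: Network control unit
15a to 15c: Device
31: Handle portion
32 to 35: Arm
41 to 45: Actuator
51: Device
52: CPU
53: Interface
61: Sensor
62: Motor
[0001]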
BACKGROUND OF THE INVENTION
The present invention relates to a distributed processing system, a distributed processing method, and a distributed processing control program, and more particularly to a distributed processing system, distributed processing method, and distributed processing control program having fault-tolerant capability.
[0002]
[Prior art]
FIG. 14 is a configuration diagram of an example of a conventional distributed processing system. In FIG. 14, processor element A; 401a is connected to device A; 403a by input/output unit A; 402a. From input data A; 405a supplied by device A; 403a, task A; 408a in processor element A; 401a calculates output data A; 404a, which is output to device A; 403a.
[0003]
Further, the processing in processor element B; 401b and in processor element C; 401c is similarly executed by task B; 408b and task C; 408c, and the results are output to device B; 403b and device C; 403c. Furthermore, the data exchange necessary for executing task A; 408a, task B; 408b, and task C; 408c is performed via network 406.
[0004]
Such a system originally has no fault-tolerant capability, but by providing self-fault-diagnosis units A; 409a to C; 409c in the respective processor elements, a failed processor element can be separated from the system. However, self-diagnosis cannot detect and isolate 100% of faults. Furthermore, when a failed processor element is isolated, the device connected to it is isolated as well, so the functions available before the failure are impaired. To prevent this, spare processor elements must be installed for standby redundancy, but the amount of hardware increases accordingly.
[0005]
As described above, the conventional distributed processing system has a drawback that the fault-tolerant capability against the occurrence of a failure is not sufficient.
[0006]
To solve the above-mentioned drawback, it can easily be conceived, as an extension of the conventional technology, to make each of processor element A; 401a, processor element B; 401b, and processor element C; 401c in FIG. 14 triple redundant and to provide reliable fault-tolerant capability by majority voting. In that case, however, the amount of hardware increases at least threefold because of the triple redundancy, so the amount of hardware becomes very large.
[0007]
In other words, because the conventional technology basically handled distributed processing technology and fault-tolerant technology as independent technologies, any attempt to compensate for the weakness of one with the other was inefficient.
[0008]
Techniques that efficiently combine distributed processing technology and fault-tolerant technology (redundant processing technology) to achieve a good balance between improved processing performance and improved reliability for the system as a whole are extremely rare, but an existing invention does exist, and an example of its embodiment is shown below.
[0009]
As an invention for improving processing capability and reliability by making effective use of redundant resources, Japanese Patent Application Laid-Open No. 7-114520 (title: Redundant resource management method and distributed fault-tolerant computer system using the same) (hereinafter referred to as Prior Art Document 1) has been disclosed. In the technology described in Prior Art Document 1, each computer module constituting the redundant system collects fault occurrence information for each task, sets an evaluation function based on that information to estimate the reliability of each task, determines which task's redundant configuration the module itself should join, and executes the task it should join.
[0010]
To clarify the difference from the present invention, the following shows an operation example in which the system described in Prior Art Document 1 is applied to the same functions as the configuration of FIG. 1, which is an embodiment of the present invention. That is, the example assumes three devices with which the computer modules exchange data as control targets, a separate control task for each device, and triple redundancy for each task with fault tolerance by triple majority voting.
[0011]
FIG. 15 is a configuration diagram of the system described in Prior Art Document 1. The figure shows an operation example when a failure occurs in one of its constituent computer modules 1; 501 to 9; 509. It is simplified so that the differences are clear in a comparison with FIG. 1, which is an embodiment of the present invention.
[0012]
In FIG. 15(A), computer modules 1; 501 to 9; 509 each correspond to a processor element of the present invention and perform data input/output in order to control devices A; 510 to C; 512. Computer modules 1 to 9 can transfer data to each other via data bus 513. Task A; 514, which controls device A, operates as a triple-redundant task on the three computer modules 1; 501 to 3; 503, performs fault detection/isolation/reconfiguration by triple majority voting, and transfers the resulting output data A; 517 to device A via data bus 513. Similarly, task B; 515, which controls device B, operates as a triple-redundant task on the three computer modules 4; 504 to 6; 506, performs fault detection/isolation/reconfiguration by triple majority voting, and transfers the resulting output data B; 518 to device B. Similarly, task C; 516, which controls device C, operates as a triple-redundant task on the three computer modules 7; 507 to 9; 509, performs fault detection/isolation/reconfiguration by triple majority voting, and transfers the resulting output data 519 to device C.
[0013]
Suppose that the importance and reliability requirements of task A; 514 are high and triple redundancy is always required, while those of task B; 515 are lower than those of task A; 514. When computer module 2; 502 fails, the operation changes to that shown in FIG. 15(B). That is, the information that computer module 2; 502 has failed is reflected in the evaluation function, and to maintain the triple redundancy of task A, computer module 4; 504 stops executing task B; 515 and starts executing task A; 514. In other words, when the usable computer module resources are reduced by a failure, each computer module always selects and executes the task it is most appropriate to run, so that the reliability of tasks with high importance and reliability requirements, such as task A; 514, is not lowered.
[0014]
As described above, the system of Prior Art Document 1 describes means for controlling task execution optimally for its purpose, but uses general means for fault detection/isolation. Other examples of this type of system are described in JP-A-8-329025 (hereinafter referred to as Prior Art Document 2), JP-A-9-16535 (hereinafter referred to as Prior Art Document 3), JP-A-8-221285 (hereinafter referred to as Prior Art Document 4), and JP-A-8-95935 (hereinafter referred to as Prior Art Document 5).
[0015]
[Problems to be solved by the invention]
However, in the system described in Prior Art Document 1, three computer modules are required for each task in order to perform triple majority voting and achieve 100% fault detection/isolation. Therefore, when there are three types of devices to be controlled, nine computer modules are required, as shown in FIG. 15. Means for solving this problem are not described in Prior Art Documents 2 to 5 either.
[0016]
Accordingly, an object of the present invention is to provide a distributed processing system, a distributed processing method, and a distributed processing control program capable of reducing the amount of hardware, such as computer modules, compared with the prior art.
[0017]
[Means for Solving the Problems]
In order to solve the above problems, a first invention according to the present invention is a distributed processing system that includes a plurality of processors, a plurality of devices provided corresponding to the processors, and input/output means for controlling data input/output between the processors and the devices, and in which the plurality of processors perform, in a distributed manner, data input from the plurality of devices and data output to the plurality of devices. The system includes fault tolerance processing means for causing the plurality of processors and the plurality of input/output means, which exist for the purpose of the distributed processing, to also perform fault tolerance processing. The fault tolerance processing means includes: provided in each of the processors, data sharing means for inputting data from all the devices and outputting data to all the devices, and redundancy management means for comparing output data addressed to a predetermined device calculated by the own processor with output data addressed to the predetermined device calculated by another processor and notifying the other processor of the comparison result; and, provided in the input/output means, failure determination means for determining whether or not the own processor is faulty based on the comparison results notified from the other processors. The fault tolerance processing means further includes fault isolation means for fault-isolating the own processor when the failure determination means determines that the own processor is faulty, and data transfer means for causing, when the own processor is fault-isolated, the data output to the device corresponding to the own processor to be performed from another processor. The redundancy management means sends a failure notification to the other processor that calculated the output data when the output data calculated by the two processors do not match, and sends a normal notification when they match.
[0018]
A second invention according to the present invention is a distributed processing method for a system that includes a plurality of processors, a plurality of devices provided corresponding to the processors, and input/output means for controlling data input/output between the processors and the devices, in which the plurality of processors perform, in a distributed manner, data input from the plurality of devices and data output to the plurality of devices. The method includes a fault tolerance processing step of causing the plurality of processors and the plurality of input/output means, which exist for the purpose of the distributed processing, to also perform fault tolerance processing. The fault tolerance processing step includes: performed in each of the processors, a data sharing step of inputting data from all the devices and outputting data to all the devices, and a redundancy management step of comparing output data addressed to a predetermined device calculated by the own processor with output data addressed to the predetermined device calculated by another processor and notifying the other processor of the comparison result; and, performed in the input/output means, a failure determination step of determining whether or not the own processor is faulty based on the comparison results notified from the other processors. The fault tolerance processing step further includes a fault isolation step of fault-isolating the own processor when the failure determination step determines that the own processor is faulty, and a data transfer step of causing, when the own processor is fault-isolated, the data output to the device corresponding to the own processor to be performed from another processor. In the redundancy management step, a failure notification is sent to the other processor that calculated the output data when the output data calculated by the two processors do not match, and a normal notification is sent when they match.
[0019]
A third invention according to the present invention is a distributed processing control program for a system that includes a plurality of processors, a plurality of devices provided corresponding to the processors, and input/output means for controlling data input/output between the processors and the devices, in which the plurality of processors perform, in a distributed manner, data input from the plurality of devices and data output to the plurality of devices. The program includes a fault tolerance processing step of causing the plurality of processors and the plurality of input/output means, which exist for the purpose of the distributed processing, to also perform fault tolerance processing. The fault tolerance processing step includes: performed in each of the processors, a data sharing step of inputting data from all the devices and outputting data to all the devices, and a redundancy management step of comparing output data addressed to a predetermined device calculated by the own processor with output data addressed to the predetermined device calculated by another processor and notifying the other processor of the comparison result; and, performed in the input/output means, a failure determination step of determining whether or not the own processor is faulty based on the comparison results notified from the other processors. The fault tolerance processing step further includes a fault isolation step of fault-isolating the own processor when the failure determination step determines that the own processor is faulty, and a data transfer step of causing, when the own processor is fault-isolated, the data output to the device corresponding to the own processor to be performed from another processor. In the redundancy management step, a failure notification is sent to the other processor that calculated the output data when the output data calculated by the two processors do not match, and a normal notification is sent when they match.
[0020]
According to the first to third aspects of the present invention, when a failure occurs in a multiprocessor system that performs distributed processing, the plurality of processor resources that exist for the purpose of distributed processing are effectively used for fault-tolerant processing. Therefore, the amount of hardware such as computer modules can be reduced as compared with the prior art.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
First, an outline of the present invention will be described. FIG. 9 is a system configuration diagram showing an outline of the present invention. In FIG. 9A, there are three devices to be controlled, namely devices A; 604 to C; 606, and three processor elements (corresponding to the computer modules in FIG. 15) that perform the control tasks, namely processor elements A; 601 to C; 603. Task A; 608, task B; 609, and task C; 610 are executed on each of the processor elements A; 601 to C; 603, so that redundant processing by triple majority is carried out by the system as a whole.
[0022]
If the processor element B; 602 fails, as shown in FIG. 9B, the output data B; 612, which was transferred to the device B; 605 from task B; 609 on the processor element B; 602 while that element was normal, is instead transferred from task B; 609 on the processor element A; 601. In this way failure detection and isolation are performed and, at the same time, normal processing continues on the remaining normal processor elements.
[0023]
In other words, under the same condition that 100% failure detection and isolation must be possible for a failure of a processor element (a computer module in FIG. 15), the system described in Prior Art Document 1 (FIG. 15) requires nine computer modules, whereas the present invention (FIG. 9) can be realized with three processor elements.
[0024]
The present invention can also maintain a triple majority after fault isolation. In that case one spare processor element must be added, but even then the number of processor elements (redundant resources) is less than half (4/9) of that required by the system described in Prior Art Document 1.
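The counts quoted above can be checked with a few lines of arithmetic. The following is only a sketch; the function names are hypothetical and do not appear in the specification, and it assumes one processor element per controlled device as in FIG. 9.

```python
from fractions import Fraction


def modules_per_task_redundancy(num_devices: int, redundancy: int = 3) -> int:
    """Modules needed when every control task gets its own redundant set."""
    return num_devices * redundancy


def elements_shared_redundancy(num_devices: int, spares: int = 0) -> int:
    """Elements needed when every element runs every task (one element per device)."""
    return num_devices + spares


devices = 3
prior = modules_per_task_redundancy(devices)
shared = elements_shared_redundancy(devices)
shared_with_spare = elements_shared_redundancy(devices, spares=1)

print(prior, shared, shared_with_spare)    # 9 3 4
print(Fraction(shared, prior))             # 1/3
print(Fraction(shared_with_spare, prior))  # 4/9
```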
[0025]
Note that FIG. 20 of Prior Art Document 1 is a table showing the relationship between the number of computer modules and the number of executable tasks. According to that table, when the number of computer modules is three, only one double-redundant task and one non-redundant task can be executed. In other words, with only three redundant resources, the execution of three types of triple-redundant tasks, which the present invention realizes, is shown to be impossible, and the difference from the present invention is clear.
[0026]
This difference arises because, in the present invention, as shown in the embodiment of FIG. 1, the redundancy management units 3a to 3c, the failure determination units 10a to 10c, the network control units 12a to 12c, the fault isolation units 9a to 9c, and the data transfer control units 11a to 11c enable failure detection and isolation through triple-majority redundant processing using only the three processor elements that exist for distributed processing.
[0027]
The above differences of the present invention from the system described in Prior Art Document 1 are summarized as follows. First, the system described in Prior Art Document 1 will be described.
[0028]
(1) When there are three control targets, the number of processor elements necessary for implementing redundancy management by triple majority is nine.
[0029]
(2) The problem to be solved by the invention (the most important one) is as follows. When redundant computer module (corresponding to processor element) resources are used to combine distributed processing with redundant processing so as to improve processing performance and reliability at the same time, the failure of a computer module could reduce the reliability of the most important task. That is, when a failure occurs, it is desirable to stop less important tasks first and maintain the reliability of highly important tasks. Conventionally, however, the only option was to increase the redundancy, and there was no means for efficient degeneration.
[0030]
(3) The greatest effect of the invention is that, in the event of a failure, the reliability, function, and performance of important tasks can be maintained by stopping less important tasks first and reducing their function and performance. That is, the remaining resources can be allocated to the most important purpose. The invention is therefore highly effective when redundant resources are used to improve processing performance and reliability at the same time and when tasks differ in importance. However, it has no effect unless a large number of redundant resources such as computer modules are available. The distributed processing here is limited to distributed processing for improving processing performance.
[0031]
Next, the present invention will be described.
[0032]
(1) When there are three control objects, three processor elements are sufficient to implement redundancy management by triple majority.
[0033]
(2) The problem to be solved by the invention (the most important one) is that the processor element resources existing for the purpose of distributed processing could not be used for redundant processing to perform failure detection and isolation. For this reason, when reliability is to be increased by redundant processing, each distributed resource must be made redundant individually, so the amount of hardware added to improve reliability becomes enormous.
[0034]
(3) The greatest effect of the invention is that the processor element resources existing for the purpose of distributed processing can simultaneously be used for redundant processing so that failure detection and isolation are realized, and distributed processing and improved reliability can be achieved at the same time with a minimum number of redundant resources. The invention is more effective when applied to distributed processing systems that must be physically distributed for mission reasons, such as indirect cooperative control in robot arms, than when applied to distributed processing for improving processing performance.
[0035]
That is, the present invention is superior to the system described in Prior Art Document 1 in that the hardware amount of redundant resources can be reduced to about 1/3.
[0036]
Note that the present invention and the system described in Prior Art Document 1 are similar in the broad sense that both efficiently realize distributed processing and fault tolerance processing. However, as described above, they differ greatly in the problem to be solved and in the effect of the invention. The system described in Prior Art Document 1 provides means for performing efficient degeneration according to the failure state (stopping low-importance tasks first to reduce the influence on high-importance tasks) when a large number of redundant resources are used to perform distributed processing for improving processing performance and redundant processing for improving reliability. In other words, since it actively reduces unimportant functions, it can only be applied to systems that tolerate a reduction in function.
[0037]
The present invention, on the other hand, provides means for effectively using the redundant resources that originally exist for the purpose of distributed processing for fault tolerance as well, so that distributed processing and improved reliability are achieved simultaneously with the minimum number of redundant resources. The content of the invention is therefore greatly different, and it does not overlap with the system described in Prior Art Document 1.
[0038]
As described above, according to the present invention, in a multiprocessor system that performs distributed processing, the plurality of processor resources existing for the purpose of distributed processing are effectively used for fault-tolerant processing. Failure detection and isolation are thus performed when a failure occurs with the minimum necessary redundant resources (about 1/3 of the conventional technology), and the same functions as before the failure can be continuously executed without interrupting the operation of the devices subject to distributed processing.
[0039]
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In a multiprocessor system that performs distributed processing, the present invention manages the input data from each device subject to distributed processing as information shared between the processors and uses that data to perform redundancy management, such as majority processing, between different processors. The system as a whole thereby acquires fault-tolerant capability, and its reliability when a failure occurs is increased.
[0040]
FIG. 1 is a block diagram of the best mode of a distributed processing system according to the present invention. Referring to the figure, the distributed processing system includes: a processor element A; 1a, an input/output unit A; 8a connected to the processor element A; 1a via a processor element interface 7a, and a device A; 15a connected to the input/output unit A; 8a via a transmission line A; a processor element B; 1b, an input/output unit B; 8b connected to the processor element B; 1b via a processor element interface 7b, and a device B; 15b connected to the input/output unit B; 8b via a transmission line B; and a processor element C; 1c, an input/output unit C; 8c connected to the processor element C; 1c via a processor element interface 7c, and a device C; 15c connected to the input/output unit C; 8c via a transmission line C. The input/output units A; 8a, B; 8b, and C; 8c are each connected to the network 16.
[0041]
Further, the processor element A; 1a includes a data sharing unit 2a, a redundancy management unit 3a, and tasks A; 4a to C; 6a. Similarly, the processor element B; 1b includes a data sharing unit 2b, a redundancy management unit 3b, and tasks A; 4b to C; 6b, and the processor element C; 1c includes a data sharing unit 2c, a redundancy management unit 3c, and tasks A; 4c to C; 6c.
[0042]
The input/output unit A; 8a includes a fault isolation unit 9a, a failure determination unit 10a, a data transfer control unit 11a, and a network control unit 12a. Similarly, the input/output unit B; 8b includes a fault isolation unit 9b, a failure determination unit 10b, a data transfer control unit 11b, and a network control unit 12b, and the input/output unit C; 8c includes a fault isolation unit 9c, a failure determination unit 10c, a data transfer control unit 11c, and a network control unit 12c.
[0043]
In the figure, the input data A; 14a to C; 14c from the devices A; 15a to C; 15c subject to distributed processing are transferred over the network 16 and managed as shared data by each of the processor elements A; 1a to C; 1c. Using the shared data, tasks A; 4a to 4c, which compute the data output to the device A, tasks B; 5a to 5c, which compute the data output to the device B, and tasks C; 6a to 6c, which compute the data output to the device C, are executed in each processor element.
[0044]
For the above tasks, failure determination based on the majority rule is performed among the processor elements A; 1a to C; 1c using the redundancy management units 3a to 3c and the failure determination units 10a to 10c, and a processor element that has failed is isolated by the corresponding one of the fault isolation units 9a to 9c. At that time, according to the results of the fault isolation units 9a to 9c, the data transfer control units 11a to 11c place the device connected to the isolated processor element in a state where it can exchange data with the other, normal processor elements. The same function as before the occurrence of the failure is thereby continuously executed without interrupting the processing of the device.
[0045]
In this way, according to the present invention, when a failure occurs in a multiprocessor system that performs distributed processing, a plurality of processor resources for the purpose of distributed processing are effectively used for fault-tolerant processing. In addition, the same function as before the occurrence of the failure can be continuously executed without interrupting the operation of the device to be distributed.
[0046]
[Example]
Examples of the present invention will be described below. First, the first embodiment will be described, again with reference to FIG. 1. FIG. 1 shows an embodiment of a system that performs distributed processing on three independent devices A to C. In the figure, the processor element A; 1a is connected to the device A; 15a. Basically, the input data A; 14a from the device A; 15a is passed through the transmission line A and the input/output unit A; 8a to the processor element A; 1a, where it is processed by the task A; 4a, and the result is output to the device A; 15a as the output data A; 13a via the input/output unit A; 8a and the transmission line A. The input/output unit A; 8a connected between the device A; 15a and the processor element A; 1a performs this data transfer.
[0047]
Here, the processor elements A; 1a to C; 1c need only have the functions of a general computer. A computer with only the most basic functions, such as data input/output, arithmetic, and data storage, as in an embedded computer, is sufficient.
[0048]
The configurations of the device B; 15b, the processor element B; 1b, and the input/output unit B; 8b, and of the device C; 15c, the processor element C; 1c, and the input/output unit C; 8c, are the same as in the case of the device A; 15a.
[0049]
Further, these are connected to the network 16 through the input/output unit A; 8a, the input/output unit B; 8b, and the input/output unit C; 8c, so that data can be transferred among them. Data transfer over the network 16 is performed by the network control units 12a to 12c, which need only provide general network functions.
[0050]
In the basic distributed processing system configuration described above, a plurality of processors are used for the purpose of distributed processing. In the present invention, in order to use these processor resources effectively, redundant processing based on triple majority voting can be performed simultaneously with the distributed processing.
[0051]
In FIG. 1, the input data A; 14a to C; 14c from the devices A; 15a to C; 15c subject to distributed processing are transferred over the network 16 and managed as shared data by each of the processor elements A; 1a to C; 1c. Using the shared data, tasks A; 4a to 4c, which compute the data output to the device A, tasks B; 5a to 5c, which compute the data output to the device B, and tasks C; 6a to 6c, which compute the data output to the device C, are executed by each processor element. This data is managed in each processor element by the data sharing units 2a to 2c.
[0052]
FIG. 2 shows an example of a shared data management table used by each of the data sharing units 2a to 2c for data management. This shared data management table is provided, for example, in a storage unit (not shown) in each of the data sharing units 2a to 2c. Referring to the figure, the shared data management table stores nine output values: for the input data from device A, the output values of task A computed by processor element A, by processor element B, and by processor element C; for the input data from device B, the output values of task B computed by processor element A, by processor element B, and by processor element C; and for the input data from device C, the output values of task C computed by processor element A, by processor element B, and by processor element C.
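One possible in-memory layout for such a table is sketched below. This is an illustration only: the names `SharedTable`, `empty_table`, `record`, and `results_for_task` are hypothetical, integer output values are assumed for simplicity, and the specification does not prescribe any particular data structure.

```python
from typing import Dict, Optional, Tuple

ELEMENTS = ("A", "B", "C")  # processor elements
TASKS = ("A", "B", "C")     # task X computes the output for device X

# Key: (task, element that computed it). Value: output value, or None if not yet received.
SharedTable = Dict[Tuple[str, str], Optional[int]]


def empty_table() -> SharedTable:
    """Create a table with one empty slot per (task, element) pair."""
    return {(task, element): None for task in TASKS for element in ELEMENTS}


def record(table: SharedTable, task: str, element: str, value: int) -> None:
    """Store the output value of `task` as computed by `element`."""
    table[(task, element)] = value


def results_for_task(table: SharedTable, task: str) -> Dict[str, Optional[int]]:
    """Return the three candidate outputs for one task, keyed by element."""
    return {element: table[(task, element)] for element in ELEMENTS}


table = empty_table()
for element, value in (("A", 10), ("B", 10), ("C", 10)):
    record(table, "A", element, value)
print(len(table))                    # 9
print(results_for_task(table, "A"))  # {'A': 10, 'B': 10, 'C': 10}
```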
[0053]
The redundancy managers 3a to 3c perform redundancy management based on a general majority rule using the shared data management table of FIG. For example, in the case of processor element A; 1a, a majority decision is made on the output value of task A, and if it is normal, the value determined to be normal by the majority decision is output to apparatus A as output data A; 13a.
[0054]
In the case of processor element B; 1b, a majority decision is made on the output value of task B, and if it is normal, the value determined to be normal by the majority decision is output to device B as output data B; 13b. In the case of the processor element C; 1c, a majority decision is made on the output value of the task C, and if it is normal, the value determined to be normal by the majority decision is output to the device C as output data C; 13c.
[0055]
If the majority decision reveals a discrepancy, each of the redundancy management units 3a to 3c uses the network 16 to send a failure notification, based on the general majority principle, to any processor element whose result differs from its own. For example, in FIG. 1, when the processor element B; 1b fails, failure notification is performed with respect to task A; 4b, task B; 5b, and task C; 6b executed by the processor element B; 1b.
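The voting and notification behaviour described above can be sketched as follows. The function names and the string labels "failure" and "normal" are hypothetical, and the sketch assumes that outputs can be compared for exact equality; a real system handling floating-point outputs would likely need a tolerance, which the specification does not discuss.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple


def majority_vote(results: Dict[str, int]) -> Tuple[Optional[int], Set[str]]:
    """Return (majority value, elements that disagree with it).

    Returns (None, set()) when no value has a strict majority.
    """
    value, votes = Counter(results.values()).most_common(1)[0]
    if votes * 2 <= len(results):
        return None, set()
    return value, {e for e, v in results.items() if v != value}


def compare_and_notify(own: str, results: Dict[str, int]) -> Dict[str, str]:
    """Notification that element `own` sends to each other element."""
    return {
        other: ("normal" if value == results[own] else "failure")
        for other, value in results.items()
        if other != own
    }


# Task A outputs as seen in the shared table; element B is faulty.
results = {"A": 10, "B": 99, "C": 10}
print(majority_vote(results))            # (10, {'B'})
print(compare_and_notify("A", results))  # {'B': 'failure', 'C': 'normal'}
print(compare_and_notify("B", results))  # {'A': 'failure', 'C': 'failure'}
```

In this example the faulty element B also sends failure notifications to A and C, so each healthy element receives exactly one failure notification while B receives two.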
[0056]
FIG. 3 is an explanatory diagram showing an example of failure notification when the processor element B; 1b fails. Referring to the figure, the failure determination unit 10b connected to the processor element B; 1b receives both the failure notification AB 17 (from processor element A to processor element B) and the failure notification CB 20 (from processor element C to processor element B), so the failure determination unit 10b can determine, based on the majority rule, that its own processor element is faulty. On the other hand, the failure determination units 10a and 10c connected to the other processor elements A and C each receive only one failure notification (from the processor element B), so they can determine that their own processor elements are normal.
[0057]
FIG. 4 is a diagram illustrating the failure determination logic in the failure determination units 10a to 10c. Here, the identifier (ID) of the own processor element is N, the ID of the processor element immediately to the right of N in FIG. 1 is N + 1, and the ID of the processor element immediately to the right of N + 1 is N + 2. However, the processor element A; 1a is treated as the element to the right of the processor element C; 1c.
[0058]
Referring to FIG. 4, when the processor element N receives a normal notification from at least one of the processor elements N + 1 and N + 2, it determines that its own processor element is normal. In other words, the processor element N determines that its own processor element is faulty only when it receives failure notifications from both of the processor elements N + 1 and N + 2.
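The logic of FIG. 4 as described here amounts to a two-input AND over the received failure notifications. A minimal sketch, with the hypothetical function name `own_element_is_faulty` and boolean inputs standing in for the failure and normal notifications:

```python
from itertools import product


def own_element_is_faulty(failure_from_n1: bool, failure_from_n2: bool) -> bool:
    """Element N is judged faulty only if both N+1 and N+2 report a failure."""
    return failure_from_n1 and failure_from_n2


# Print the full truth table of the determination logic.
for n1, n2 in product((False, True), repeat=2):
    verdict = "faulty" if own_element_is_faulty(n1, n2) else "normal"
    print(f"failure from N+1={n1!s:5} failure from N+2={n2!s:5} -> {verdict}")
```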
[0059]
The failure determination units 10a to 10c notify their failure determination results to the failure isolation units 9a to 9c and the data transfer control units 11a to 11c according to the logic shown in FIG. The determination results of the failure determination units 10a to 10c and the operations of the failure isolation units 9a to 9c and the data transfer control units 11a to 11c according to the determination results are summarized as follows.
[0060]
First, a case where the failure determination unit determines that it is normal will be described.
[0061]
(1) The fault isolation unit does nothing.
[0062]
(2a) The data transfer control unit makes the input data buffer from the device, the output data buffer to the device, and the port to which failure notification data for the failure determination unit is input each recognizable as an independent address on the network.
[0063]
(2b) All processor elements can read input data from all devices via a network.
[0064]
(2c) All processor elements are allowed to write failure notification data to a port to which failure notification data of the failure determination unit is input.
[0065]
(2d) Data transfer to the output data buffer to the device is possible only from its own processor element.
[0066]
Next, a case where the failure determination unit determines that it is a failure will be described.
[0067]
(1) The fault isolation unit blocks data sent from its own processor element and functionally separates its own processor element from the input/output unit. It then resets its own processor element.
[0068]
(2b) to (2c) are the same as when the above-described failure determination unit determines that it is normal.
[0069]
(2d) Data transfer to the output data buffer to the device is enabled from the other processor elements via the data bus. At this time, transfer from its own processor element is not possible.
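The access rules (2b) to (2d) in the normal and failure cases can be summarized as a small permission model. This is a sketch under assumptions: the class `IOUnit` and its method names are hypothetical, and the read and failure-port rules are modelled exactly as stated in (2b) and (2c) without further restrictions.

```python
from dataclasses import dataclass


@dataclass
class IOUnit:
    owner: str                  # processor element attached to this unit
    owner_failed: bool = False  # verdict of the failure determination unit

    def may_read_input(self, requester: str) -> bool:
        # (2b) every processor element may read the device input data.
        return True

    def may_write_failure_port(self, requester: str) -> bool:
        # (2c) every processor element may write failure notification data.
        return True

    def may_write_output_buffer(self, requester: str) -> bool:
        # (2d) normal: only the owner may write; failed: only the others may.
        if self.owner_failed:
            return requester != self.owner
        return requester == self.owner


unit_b = IOUnit(owner="B")
print(unit_b.may_write_output_buffer("B"), unit_b.may_write_output_buffer("A"))  # True False
unit_b.owner_failed = True
print(unit_b.may_write_output_buffer("B"), unit_b.may_write_output_buffer("A"))  # False True
```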
[0070]
That is, when any processor element fails, the failed processor element is isolated by the corresponding one of the fault isolation units 9a to 9c. At that time, according to the results of the fault isolation units 9a to 9c, the data transfer control units 11a to 11c place the device connected to the isolated processor element in a state where it can exchange data with the other, normal processor elements.
[0071]
In this state, the redundancy management units 3a to 3c in FIG. 1 of the processor elements that have been determined to be normal by the method described so far transfer the output data for the device connected to the fault-isolated processor element to that device, so that the normal operation of the device can continue.
[0072]
Specifically, as shown in FIG. 3, when it is determined that the processor element B; 1b is faulty, the processor element B; 1b in FIG. 1 is separated from the input/output unit B; 8b, and the output data B; 13b for the device B; 15b is taken from the result of task B; 5a on the processor element A; 1a or from the result of task B; 5c on the processor element C; 1c.
[0073]
Here, whether to use the data of the processor element A; 1a or the data of the processor element C; 1c is not an important problem, and may be either. For example, logic such as using the data of the processor element on the right side of the processor element separated by failure may be determined, and can be easily realized by a generally well-known method.
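The "right-hand neighbour" rule mentioned as an example can be sketched as follows. The ring order A, B, C with C wrapping around to A is an assumption of this sketch, as are the names `RING`, `right_neighbor`, and `substitute_source`; only a single failure is considered, in line with the simplification adopted in this embodiment.

```python
from typing import Tuple

RING: Tuple[str, ...] = ("A", "B", "C")  # assumed left-to-right order, wrapping around


def right_neighbor(element: str, ring: Tuple[str, ...] = RING) -> str:
    """Element immediately to the right of `element`, wrapping at the end."""
    return ring[(ring.index(element) + 1) % len(ring)]


def substitute_source(failed: str, ring: Tuple[str, ...] = RING) -> str:
    """Element whose task result replaces that of the isolated element."""
    return right_neighbor(failed, ring)


print(substitute_source("B"))  # C
print(substitute_source("C"))  # A
```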
[0074]
As described above, even if a failure occurs in any of the processor elements A; 1a to C; 1c, the same function as before the failure occurs is continuously executed without interrupting the processing as the device. can do.
[0075]
The failure determination units 10a to 10c, the fault isolation units 9a to 9c, and the data transfer control units 11a to 11c in the input/output units A; 8a to C; 8c may be realized by software or by hardware, as long as they satisfy the operations described above for the cases where the failure determination unit determines that the processor element is normal or faulty. Either can easily be realized by generally well-known methods.
[0076]
Next, the operation of the present embodiment shown in FIG. 1 will be described using flowcharts. FIG. 5 is a flowchart of the operation in FIG. 1 from when the processor elements A; 1a to C; 1c input the input data A; 14a to C; 14c from the devices A; 15a to C; 15c until the redundancy management units 3a to 3c send failure notifications to the failure determination units 10a to 10c.
[0077]
In FIG. 5, to keep the explanation general, the operation is shown as the software operation of the processor element at position N. Here, N + 1 indicates the position to the right of N, and N + 2 indicates the position to the right of N + 1. In FIG. 1, the processor element A; 1a is treated as being to the right of the processor element C; 1c. For example, when N is A, N + 1 is B and N + 2 is C.
[0078]
In FIG. 5, the processor element at position N inputs the input data of the devices N, N + 1, N + 2, and uses these data to calculate the output data to the devices N, N + 1, N + 2 (step S101). , S102, S103).
[0079]
To illustrate this with the configuration of FIG. 1: when N is A, the flowchart describes the software operation of the processor element A; 1a. The input data of the device N is the input data A; 14a, the input data of the device N + 1 is the input data B; 14b, and the input data of the device N + 2 is the input data C; 14c. Using these, the output data to the device A; 15a is calculated by task A; 4a, the output data to the device B; 15b is calculated by task B; 5a, and the output data to the device C; 15c is calculated by task C; 6a.
[0080]
Next, in FIG. 5, the results calculated by the processor element at position N are sent to the processor elements at positions N + 1 and N + 2, and at the same time the calculation results of the processor elements at positions N + 1 and N + 2 are received (step S104). These operations are realized by the data sharing units 2a to 2c in FIG. 1.
[0081]
Next, redundancy management based on the majority rule is performed by the redundancy management units 3a to 3c in FIG. 1. In FIG. 5, a majority decision is taken over the calculation results of N, N + 1, and N + 2, and each calculation result is judged for failure (steps S105, S106, and S109). When the calculation result of N is determined to be normal, normal processing is continued (step S107). If N + 1 is determined to have failed, a failure notification is sent to N + 1 (step S110), and if N + 2 is determined to have failed, a failure notification is sent to N + 2 (step S108). If N determines that it has itself failed, it stops its operation (step S111).
[0082]
In an actual system, the operation described above also requires data exchange for distributed processing, synchronization processing, and the like. Since these are outside the scope of the present invention, their description is omitted. In addition, because the handling of multiple simultaneous failures is complicated to explain, this embodiment assumes that a single failure has occurred in order to simplify the explanation.
[0083]
Next, the operation of the input/output units A; 8a to C; 8c in FIG. 1 will be described using a flowchart. FIG. 6 is a flowchart showing the operations of the fault isolation units 9a to 9c and the data transfer control units 11a to 11c after the failure determination units 10a to 10c receive the failure notification described above. Note that the definitions of N, N + 1, and N + 2 here are the same as those defined for FIG. 5.
[0084]
In FIG. 6, N performs failure determination based on the majority rule on the failure notifications from N + 1 and N + 2 (steps S201 and S202). This is performed by the failure determination units 10a to 10c in FIG. 1. That is, if a failure notification is received from N + 1 (Yes in step S201) and a failure notification is also received from N + 2 (Yes in step S202), N is determined to be faulty (step S203), the processor element of N is fault-isolated (step S204), and N + 1 and N + 2 are notified of the result of the fault isolation (step S205). Further, so that the device connected to N can continue operating while the processor element of N is fault-isolated, data transfer is enabled so that the output to the device connected to N is supplied from N + 1 or N + 2 (step S206). These operations are performed by the fault isolation units 9a to 9c and the data transfer control units 11a to 11c in FIG. 1. If N is determined to be normal (No in either step S201 or S202), processing continues as it is (step S207).
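The flow of FIG. 6 can be sketched as a small state holder for one input/output unit. This is an illustrative sketch under stated assumptions: the class `IOUnit`, its fields, and the log strings are hypothetical, and the notification to N + 1 and N + 2 (step S205) is only recorded in a log rather than sent over a network.

```python
from dataclasses import dataclass, field


@dataclass
class IOUnit:
    """Hypothetical model of one input/output unit (8a to 8c)."""
    name: str
    isolated: bool = False           # set by the fault isolation unit
    transfer_enabled: bool = False   # set by the data transfer control unit
    log: list = field(default_factory=list)

    def on_notifications(self, from_next: bool, from_next2: bool) -> None:
        """Apply steps S201 to S207 to the failure notifications received."""
        if from_next and from_next2:              # Yes in S201 and in S202
            self.log.append("S203: own processor element judged faulty")
            self.isolated = True                  # S204
            self.log.append("S205: notify N+1 and N+2 of the isolation")
            self.transfer_enabled = True          # S206
        else:
            self.log.append("S207: continue")
```

A single failure notification does not isolate the processor element; only agreement between both neighbours does, which is the majority rule described above.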
[0085]
Next, the operations of the redundancy management units 3a to 3c after fault isolation and the like have been performed by the input/output units A; 8a to C; 8c in FIG. 1, as described above, will be described.
[0086]
FIG. 7 shows the operation of the redundancy management units 3a to 3c, realized as the software operation of the processor elements A; 1a to C; 1c in FIG. 1, in response to the operation result shown in FIG. 6. The definitions of N, N + 1, and N + 2 here are the same as those defined for FIG. 5.
[0087]
In FIG. 7, when the fault isolation result is received from N + 1 (Yes in step S301), N recognizes that the processor element of N + 1 has been fault-isolated (step S302), transmits the output data for the device connected to N + 1 to that device (step S303), and continues other processing (step S304).
[0088]
Also, when a fault isolation result is received from N + 2 (No in step S301, Yes in step S305), the same operation is performed for N + 2 (steps S306, S307, and S308).
[0089]
Further, if no fault isolation result has been received from N + 1 or N + 2 (No in step S301, No in step S305), the process is continued without changing any state (step S309).
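The branching of FIG. 7 can be sketched as a function that lists the devices for which processor element N transmits output data. This is an illustrative sketch only: the function name `outputs_to_send` and its parameters are hypothetical, and the selection of which transmitted stream actually reaches the device (done by the data transfer control unit) is not modelled.

```python
def outputs_to_send(own_device, next_device, next2_device,
                    isolated_next: bool, isolated_next2: bool) -> list:
    """Return the devices that processor element N sends output data to."""
    targets = [own_device]              # N always serves its own device
    if isolated_next:                   # Yes in S301 -> S302, S303
        targets.append(next_device)
    elif isolated_next2:                # No in S301, Yes in S305 -> S306, S307
        targets.append(next2_device)
    return targets                      # then S304 / S308 / S309: continue
```

With N = A, N + 1 = B and N + 2 = C, a fault isolation result from B makes A transmit the output data for both device A and device B, which corresponds to the situation of FIG. 8.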
[0090]
As an example of the above operation, FIG. 8 shows the data output operation to the devices when the processor element B; 1b is fault-isolated. In FIG. 8, the processor element B; 1b is fault-isolated and in an operation-stopped state. Instead, output data B; 22 is transferred to the device B; 15b from the output value of the task B; 5a operating on the processor element A; 1a. At this time, output data A; 21 is transferred to the device A; 15a from the output value of the task A; 4a operating on the processor element A; 1a. Likewise, for the device C; 15c, output data C; 23 is transferred from the output value of the task C; 6c operating on the processor element C; 1c.
[0091]
As described above, according to this embodiment, even when one processor element in the distributed processing system fails, the failure is detected and isolated, and the same function as before the failure can be continued without interrupting the processing of the device connected to that processor element.
[0092]
In the present embodiment, fault-tolerant capability is provided for failures occurring in the processor elements A; 1a to C; 1c in FIG. 1, but in some cases it is also necessary to deal with failures occurring in the input/output units A; 8a to C; 8c. In general, however, when a basic computer function part such as the processor element is compared with the interface part, the processor element is far larger in circuit scale and operational complexity, so countermeasures against its failure are more important and the impact of making it redundant is significantly greater. Further, failures of the interface part can easily be countered by conventionally used means such as redundancy and the use of high-reliability parts. For these reasons, and to simplify the explanation, the embodiment of the present invention has been described only for the case where a failure occurs in the processor element part.
[0093]
The present invention is not limited to the above embodiment. For example, in the embodiment of FIG. 1, if each processor element has a means for diagnosing its own failure, there is no need to run tasks redundantly within one processor element and perform a majority vote. In that case, when a failure occurs in one processor element and the failed processor element is fault-isolated by the self-failure diagnosis, the subsequent operation is the same as that of the embodiment of FIG. 1, and normal processing can be continued without interrupting the operation of the devices A; 15a to C; 15c. The effect of maintaining the function of the entire system even if a failure occurs in a processor element is also the same. This will be described later as a third embodiment.
[0094]
The embodiment of FIG. 1 is an example of a fault-tolerant method that assumes a failure occurs once in any one of the processor elements. However, when the system is used in a special environment such as outer space, the peculiarities of the environmental conditions often cause temporary (transient) failures that occur many times and return to normal each time the processor element is restarted. When the invention is applied to a system used in such an environment, it is effective to add a method in which the processor element isolated in the embodiment of FIG. 1 is reset and restarted and, if the initial self-check result is normal, returned to the system. In that case, it is possible to realize a fault-tolerant system that can continue normal processing without stopping even if temporary (transient) failures occur many times.
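The reset, restart, and return sequence described above can be sketched as follows. The original description leaves the concrete method to well-known techniques, so everything here is an assumption for illustration: the function name `restart_and_rejoin`, the `max_attempts` parameter, the state strings, and the representation of the initial self-check as a callable returning a boolean. The hardware reset itself is not modelled.

```python
from typing import Callable


def restart_and_rejoin(self_check: Callable[[], bool], max_attempts: int = 1) -> str:
    """Return "active" if a restarted processor element passes its self-check."""
    for _ in range(max_attempts):
        # A hardware-specific reset and restart would take place here.
        if self_check():
            return "active"      # self-check normal: return to the system
    return "isolated"            # self-check keeps failing: stay isolated
```

A transient failure corresponds to a self-check that fails once and then succeeds; a permanent failure leaves the processor element isolated and the other processor elements keep serving its device.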
[0095]
Well-known methods are applied for the self-failure diagnosis described above and for restarting and returning to the system after a temporary (transient) failure, so their description is omitted here.
[0096]
Next, a second embodiment will be described. FIG. 10 is a schematic diagram showing a configuration of the second embodiment, and FIG. 11 is a schematic diagram showing a relationship between a CPU (for example, processor element A) and a device (for example, device A). In the second embodiment, the distributed processing system in the first embodiment is applied to a robot arm.
[0097]
FIG. 10 shows an example of the configuration of the robot arm. Referring to FIG. 10, the robot arm includes a handle portion 31, arms 32 to 35, and actuators 41 to 45. The actuator 41 controls the operation of the handle portion 31, and the actuators 42 to 44 control the operation of the arms 32 to 35. The actuator 45 is the actuator at the origin of the arms 32 to 35. That is, the actuators 41 to 45 correspond to the joints of the robot arm.
[0098]
The joints of the robot arm (in this paragraph, the actuators 41 to 45 are referred to as joints 41 to 45 for convenience) operate cooperatively with the adjacent joints. For example, in order to move the handle portion 31 to a predetermined position, not only the arm 32 but also each of the arms 33 to 35 must be moved. For that purpose, the joints 41 to 45 need to operate in cooperation. Therefore, the distributed processing according to the present invention can be applied to the control of the joints 41 to 45.
[0099]
Referring to FIG. 11, each of the actuators 41 to 45 includes a sensor 61, a motor 62, a CPU (central processing unit) 52 that controls the motor 62 based on the output from the sensor 61 and the processing results of other CPUs required for cooperative operation (received via the network), and an interface 53 connected to the network. The movement of the motor 62 controls the operations of the handle portion 31 and the arms 32 to 35. The sensor 61 and the motor 62 are included in the device 51.
[0100]
The CPU 52 corresponds to the processor elements A to C and the input/output units A to C in FIG. 1, the interface 53 corresponds to the network control units 12a to 12c in FIG. 1, and the device 51 corresponds to the devices A to C in FIG. 1.
[0101]
Now, the distributed processing system according to the present invention is applied to the actuators 42 to 44 among the actuators 41 to 45. That is, the actuator 42 is constituted by the processor element A, the input/output unit A, and the device A of FIG. 1, the actuator 43 is constituted by the processor element B, the input/output unit B, and the device B of FIG. 1, and the actuator 44 is constituted by the processor element C, the input/output unit C, and the device C of FIG. 1.
[0102]
That is, the input data 71 input from the sensor 61 to the CPU 52 (more precisely, the input/output units A to C in FIG. 1) corresponds to the input data A to C in FIG. 1, and the output data 72 output from the CPU 52 (more precisely, the input/output units A to C in FIG. 1) to the motor 62 corresponds to the output data A to C in FIG. 1. The CPU 52 performs calculation on the input data 71 and outputs the output data 72 as the calculation result.
[0103]
Therefore, each of the actuators 42 to 44 performs redundant processing by majority decision as well as distributed processing. If the actuator 43 (corresponding to processor element B in FIG. 1) is determined to be faulty, the actuator 43 is fault-isolated, and the output data 72 for the motor 62 of the device 51B of the actuator 43 is output from the CPU 52 of the actuator 42 (corresponding to processor element A in FIG. 1) or the actuator 44 (corresponding to processor element C in FIG. 1).
[0104]
Next, a third embodiment will be described. FIG. 12 is a schematic diagram showing the configuration of the third embodiment, and FIG. 13 is a schematic diagram showing data transfer control when a processor element fails. Similarly to the second embodiment, the third embodiment is an example in which the distributed processing system is applied to the robot arm. However, the difference from the second embodiment is that each actuator does not perform redundant processing.
[0105]
That is, the actuator 42 inputs only the input data 71 from its own device 51A, and outputs the output data 72 of the calculation result only to its own device 51A. Similarly, the actuator 43 inputs only the input data 71 from its own device 51B, outputs the output data 72 of the operation result only to its own device 51B, and the actuator 44 inputs from its own device 51C. Only the data 71 is input, and the output data 72 of the calculation result is output only to its own device 51C.
[0106]
Further, each of the actuators 42 to 44 does not determine that it has failed from failure notifications sent by the other actuators. Instead, each actuator includes a failure detection unit that detects its own failure internally.
[0107]
Referring to FIG. 12, a camera 46 is connected to the actuator 41, and the actuator 41 is configured to control an internal motor 62 based on image information from the camera 46.
[0108]
On the other hand, the actuator 41 also has a function of performing substitute processing when any of the actuators 42 to 44 fails. For example, when the actuator 43 fails, the CPU 52 in the actuator 43 is fault-isolated. Instead, the CPU 52 in the actuator 41 performs the processing that the actuator 43 should perform and outputs the output data to the motor 62 of the actuator 43. At this time, the actuator 41 calculates the data to be output based on the input data 71 obtained from the device 51B via the network 16. The origin actuator 45 may perform this substitute processing instead.
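The substitute processing of the third embodiment can be sketched as follows. This is an illustration under stated assumptions: `control_law` is a hypothetical placeholder for the calculation performed by the CPU 52 (the actual control calculation is not given in the description), and `drive_motors`, its parameters, and the use of actuator numbers as dictionary keys are likewise hypothetical. Network transfer of the input data 71 is not modelled.

```python
def control_law(sensor_value: float) -> float:
    """Hypothetical placeholder for the calculation done by the CPU 52."""
    return -0.5 * sensor_value


def drive_motors(inputs: dict, failed: set, proxy: str = "41") -> dict:
    """Return {actuator: (computed_by, output_data)} for each device.

    inputs maps an actuator number to the input data 71 from its device.
    failed holds the actuators whose own failure detection unit has fired.
    """
    result = {}
    for actuator, sensor_value in inputs.items():
        # A fault-isolated actuator is served by the proxy (actuator 41).
        computed_by = proxy if actuator in failed else actuator
        result[actuator] = (computed_by, control_law(sensor_value))
    return result
```

For instance, with `failed={"43"}` the motor of the actuator 43 still receives output data, but it is computed by the actuator 41, while the actuators 42 and 44 continue to serve only their own devices.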
[0109]
FIG. 13 shows the operation of the third embodiment. In the first and second embodiments, when the processor element B fails, output data B is output from the processor element A to the device B. In the third embodiment, output data B is output from the actuator 41 (processor element D) to the device B.
[0110]
In terms of the flowchart of FIG. 6, the operation after the processor element B is determined to be faulty is the same as steps S203 to S205 of that figure.
[0111]
[Effects of the invention]
According to a first aspect of the present invention, there is provided a distributed processing system that includes a plurality of processors, a plurality of devices provided corresponding to each of the processors, and input/output means for controlling data input/output between the processors and the devices, and in which the plurality of processors perform data input from the plurality of devices and data output to the plurality of devices in a distributed manner. Because the system includes fault tolerance processing means for causing the plurality of processors for the distributed processing and the plurality of input/output means to perform fault tolerance processing together, the amount of hardware such as computer modules can be reduced.
[0112]
The second and third inventions according to the present invention also have the same effects as the first invention.
[0113]
More specifically, according to the present invention, in a multiprocessor system that performs distributed processing, the plurality of processor resources provided for the purpose of distributed processing are effectively utilized to detect and isolate a failure when one occurs, and the same function as before the failure can be continued without interrupting the operation of the devices subject to distributed processing.
[0114]
This is because the system configuration characteristic of a distributed processing system is used in a dimension different from its original purpose, so that the system as a whole has fault-tolerant capability even though individual components such as processors do not. In other words, fault-tolerant capability is realized without making the hardware redundant for the purpose of fault tolerance, which solves the weak point of conventional fault-tolerant systems, namely the large amount of hardware (a triple-majority configuration usually increases the amount of hardware to three times or more).
[0115]
When actually implementing the present invention, its application must also be considered so that the means for realizing it do not become a disadvantage. That is, because software redundant operation and exchange of shared data between processors are performed actively, it is desirable that the processor processing speed and the network transfer speed be high. However, given recent trends in computer-related technologies, processor processing speed and network transfer speed have improved dramatically, so in many cases the means for realizing the present invention are not expected to be a disadvantage.
[0116]
Further, the present invention is most effective when applied to a system that is already a distributed processing system because of its system configuration. For example, in an embedded distributed processing system in which each joint of a robot arm is controlled by an independent processor, there are originally as many processors as joints, so if there are three or more, a highly reliable system can be constructed by applying the present invention and adding only a small amount of hardware in the interface part. In particular, such an embedded system is required to reduce the amount of hardware and ensure high reliability at the same time, so the effect of applying the present invention is considered to be extremely large.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a best mode of a distributed processing system according to the present invention.
FIG. 2 is a diagram showing a shared data management table.
FIG. 3 is a diagram showing an example of a failure notification when a processor element B; 1b fails.
FIG. 4 is a diagram illustrating a failure determination logic in a failure determination unit.
FIG. 5 is a flowchart showing an operation until a failure is notified.
FIG. 6 is a flowchart showing an operation after receiving a failure notification.
FIG. 7 is a flowchart showing an operation executed in response to the operation result shown in FIG. 6.
FIG. 8 is a flowchart showing an operation of outputting data to the apparatus when the processor element B; 1b is fault-isolated.
FIG. 9 is a system configuration diagram showing an outline of the present invention.
FIG. 10 is a schematic diagram showing a configuration of a second embodiment.
FIG. 11 is a schematic diagram showing a relationship between a CPU and a device.
FIG. 12 is a schematic diagram showing a configuration of a third embodiment.
FIG. 13 is a schematic diagram showing data transfer control when a processor element fails.
FIG. 14 is a configuration diagram of an example of a conventional distributed processing system.
FIG. 15 is a configuration diagram of a system described in Prior Art Document 1.
[Explanation of symbols]
1a to 1c Processor element
2a to 2c Data sharing unit
3a to 3c Redundancy management unit
4a, 5a, 6a Task
7a Processor element interface
8a to 8c Input/output unit
9a to 9c Fault isolation unit
10a to 10c Failure determination unit
11a to 11c Data transfer control unit
12a to 12c Network control unit
15a to 15c Device
31 Handle portion
32 to 35 Arm
41 to 45 Actuator
51 Device
52 CPU
53 Interface
61 Sensor
62 Motor

Claims

1. A distributed processing system including a plurality of processors, a plurality of devices provided corresponding to each of the processors, and input/output means for controlling data input/output between the processors and the devices, in which the plurality of processors perform data input from the plurality of devices and data output to the plurality of devices in a distributed manner,
the system including fault tolerance processing means for causing the plurality of processors for the distributed processing and the plurality of input/output means to perform fault tolerance processing together,
wherein the fault tolerance processing means includes: provided in each of the processors, data sharing means for inputting data from all the devices and outputting data to all the devices, and redundancy management means for comparing output data addressed to a predetermined device calculated by the own processor with output data addressed to the predetermined device calculated by another processor and notifying the other processor of the comparison result;
failure determination means, provided in the input/output means, for determining whether or not the own processor is faulty based on the comparison results notified from the other processors;
failure isolation means for fault-isolating the own processor when the failure determination means determines that the own processor is faulty; and
data transfer means for causing the data output to the device corresponding to the own processor to be output from another processor when the own processor is fault-isolated by the failure isolation means,
and wherein the redundancy management means sends a failure notification to the other processor that calculated the output data when the output data calculated by the two processors do not match, and sends a normal notification when they match.

2. The distributed processing system according to claim 1, wherein the distributed processing is used for joint control of a robot arm.

3. The distributed processing system according to claim 1, wherein the fault tolerance processing means further comprises failure determination means, provided corresponding to each of the processors, for independently determining whether or not the own processor is faulty.

4. The distributed processing system according to claim 3, wherein the distributed processing is used for joint control of a robot arm.

5. A distributed processing method for a distributed processing system including a plurality of processors, a plurality of devices provided corresponding to each of the processors, and input/output means for controlling data input/output between the processors and the devices, in which the plurality of processors perform data input and data output to the plurality of devices in a distributed manner,
the method including a fault tolerance processing step of causing the plurality of processors for the distributed processing and the plurality of input/output means to perform fault tolerance processing together,
wherein the fault tolerance processing step includes: performed in each of the processors, a data sharing step of inputting data from all the devices and outputting data to all the devices, and a redundancy management step of comparing output data addressed to a predetermined device calculated by the own processor with output data addressed to the predetermined device calculated by another processor and notifying the other processor of the comparison result;
a failure determination step, performed in the input/output means, of determining whether or not the own processor is faulty based on the comparison results notified from the other processors;
a failure isolation step of fault-isolating the own processor when the failure determination step determines that the own processor is faulty; and
a data transfer step of causing the data output to the device corresponding to the own processor to be output from another processor when the own processor is fault-isolated by the failure isolation step,
and wherein, in the redundancy management step, a failure notification is sent to the other processor that calculated the output data when the output data calculated by the two processors do not match, and a normal notification is sent when they match.

6. The distributed processing method according to claim 5, wherein the distributed processing is used for joint control of a robot arm.

7. The distributed processing method according to claim 5, wherein the fault tolerance processing step further comprises: a failure determination step, performed corresponding to each of the processors, of independently determining whether or not the own processor is faulty; and
a proxy processing step of performing proxy processing for the processor determined to be faulty.

8. The distributed processing method according to claim 7, wherein the distributed processing is used for joint control of a robot arm.

9. A distributed processing control program for a distributed processing system including a plurality of processors, a plurality of devices provided corresponding to each of the processors, and input/output means for controlling data input/output between the processors and the devices, in which the plurality of processors perform data input and data output to the plurality of devices in a distributed manner,
the program including a fault tolerance processing step of causing the plurality of processors for the distributed processing and the plurality of input/output means to perform fault tolerance processing together,
wherein the fault tolerance processing step includes: performed in each of the processors, a data sharing step of inputting data from all the devices and outputting data to all the devices, and a redundancy management step of comparing output data addressed to a predetermined device calculated by the own processor with output data addressed to the predetermined device calculated by another processor and notifying the other processor of the comparison result;
a failure determination step, performed in the input/output means, of determining whether or not the own processor is faulty based on the comparison results notified from the other processors;
a failure isolation step of fault-isolating the own processor when the failure determination step determines that the own processor is faulty; and
a data transfer step of causing the data output to the device corresponding to the own processor to be output from another processor when the own processor is fault-isolated by the failure isolation step,
and wherein, in the redundancy management step, a failure notification is sent to the other processor that calculated the output data when the output data calculated by the two processors do not match, and a normal notification is sent when they match.

10. The distributed processing control program according to claim 9, wherein the distributed processing is used for joint control of a robot arm.

11. The distributed processing control program according to claim 9, wherein the fault tolerance processing step further comprises: a failure determination step, performed corresponding to each of the processors, of independently determining whether or not the own processor is faulty; and
a proxy processing step of performing proxy processing for the processor determined to be faulty.

12. The distributed processing control program according to claim 11, wherein the distributed processing is used for joint control of a robot arm.